How to Find Median in Python?

The median is a statistical measure that represents the middle value of a sorted dataset. It is a useful statistic for numerical data because, unlike the mean, it is not pulled toward extreme values, so it often gives a more reliable indication of central tendency when outliers are present.
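To illustrate the difference between the two measures, the short sketch below compares the mean and the median of a small sample that contains one outlier. The sample values are made up for illustration, and the sketch relies on the standard-library statistics module.

```python
import statistics

# Hypothetical sample: four typical values and one extreme outlier.
values = [1, 2, 3, 4, 100]

mean_value = statistics.mean(values)      # pulled toward the outlier
median_value = statistics.median(values)  # stays at the middle value

print("Mean:", mean_value)
print("Median:", median_value)
```

Here the single value 100 drags the mean up to 22, while the median remains 3, the middle of the five values.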

Finding the median of a list or an array can be done in Python using a variety of methods. In this article, we will examine various approaches and show how to use them.

Method 1: Data Sorting

Sorting the data and then locating the middle element(s) is a simple way to determine the median. Assume that the dataset is stored in a list called data. The sorted() function returns a new list in ascending order, and we can then index into that list to retrieve the median.

data = [5, 2, 9, 1, 7, 6, 3, 8, 4]
sorted_data = sorted(data)
length = len(sorted_data)
if length % 2 == 0:
    median = (sorted_data[length // 2 - 1] + sorted_data[length // 2]) / 2
else:
    median = sorted_data[length // 2]
print("Median:", median)

In this snippet, the data list is first sorted with the sorted() function, and the length of the sorted list is then calculated. If the length is odd, we take the middle element directly; if it is even, we take the average of the two middle elements.
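The same logic can be wrapped in a reusable function. The sketch below is one possible way to do that; the name median_by_sorting and the decision to raise ValueError on empty input are assumptions made for this example, not part of the original snippet.

```python
def median_by_sorting(values):
    """Return the median of a non-empty collection of numbers by sorting it."""
    ordered = sorted(values)
    if not ordered:
        # Assumed behaviour for this sketch: reject empty input explicitly.
        raise ValueError("median_by_sorting() requires at least one value")
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: a single middle element exists.
        return ordered[mid]
    # Even length: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median_by_sorting([5, 2, 9, 1, 7, 6, 3, 8, 4])  # nine values
even_result = median_by_sorting([5, 2, 9, 1, 7, 6])          # six values
print("Odd length:", odd_result)
print("Even length:", even_result)
```

The nine-value list yields 5, while the six-value list sorts to [1, 2, 5, 6, 7, 9] and yields 5.5, the average of 5 and 6.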

Method 2: Utilising the Statistics Module

Python's built-in statistics module provides a number of statistical functions, including one for the median. Import the module, then call statistics.median() with the dataset as its argument.

import statistics
data = [5, 2, 9, 1, 7, 6, 3, 8, 4]
median = statistics.median(data)
print("Median:", median)

The median() function handles both the sorting and the median computation internally. It works well even on large datasets and offers a clear, readable solution.
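The statistics module also offers median_low() and median_high(), which matter when the dataset has an even number of values. The sketch below, using a made-up four-value list, shows how the three functions differ and what happens with empty input.

```python
import statistics

# Hypothetical even-length dataset with middle values 3 and 5.
even_data = [1, 3, 5, 7]

middle = statistics.median(even_data)       # average of the two middle values
lower = statistics.median_low(even_data)    # smaller of the two middle values
upper = statistics.median_high(even_data)   # larger of the two middle values

print("median:", middle)
print("median_low:", lower)
print("median_high:", upper)

# An empty dataset has no median, so the module raises StatisticsError.
try:
    statistics.median([])
except statistics.StatisticsError as exc:
    print("Empty input:", exc)
```

median_low() and median_high() always return a value that is actually present in the data, which can be useful when an averaged result would not make sense.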

Method 3: Utilising NumPy

NumPy is a great option if you need to work with arrays or prefer to use a robust numerical computing library. Its numpy.median() function calculates the median value of an array.

import numpy as np
data = np.array([5, 2, 9, 1, 7, 6, 3, 8, 4])
median = np.median(data)
print("Median:", median)

By converting the list into a NumPy array, we can take advantage of NumPy's array-oriented operations and optimised functions, producing code that is both efficient and concise.
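NumPy's array orientation becomes more visible with multi-dimensional data. The sketch below assumes NumPy is installed and uses a made-up 2x3 array to show the axis parameter of np.median(), along with np.nanmedian() for data that contains missing values.

```python
import numpy as np

# Hypothetical 2x3 table of values.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_medians = np.median(table, axis=0)  # one median per column
row_medians = np.median(table, axis=1)  # one median per row
print("Column medians:", col_medians)
print("Row medians:", row_medians)

# np.median propagates NaN, while np.nanmedian ignores it.
with_gap = np.array([1.0, np.nan, 3.0])
plain = np.median(with_gap)
ignoring_nan = np.nanmedian(with_gap)
print("np.median with NaN:", plain)
print("np.nanmedian:", ignoring_nan)
```

Without the axis argument, np.median() flattens the array and returns a single median for all values.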

Method 4: Custom Implementation

You can write your own function if you would rather have a custom implementation of median finding. A popular strategy is to use the "Quickselect" algorithm, which efficiently finds the kth smallest element in an unordered list.

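def find_median(data):
    # k is the index of the upper middle element in sorted order
    k = len(data) // 2
    if len(data) % 2 == 0:
        # Even length: average the two middle elements
        return (quickselect(data, k - 1) + quickselect(data, k)) / 2
    else:
        # Odd length: the single middle element is the median
        return quickselect(data, k)

def quickselect(data, k):
    # Return the kth smallest element of data (k is 0-based)
    if len(data) == 1:
        return data[0]
    pivot = data[len(data) // 2]
    lows = [x for x in data if x < pivot]
    highs = [x for x in data if x > pivot]
    pivots = [x for x in data if x == pivot]
    if k < len(lows):
        # The kth element lies among the values below the pivot
        return quickselect(lows, k)
    elif k < len(lows) + len(pivots):
        # The kth element is equal to the pivot
        return pivot
    else:
        # Skip the lows and pivots, then search the values above the pivot
        return quickselect(highs, k - len(lows) - len(pivots))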

A custom Quickselect-based median function may be useful when the dataset is very large or when the technique needs to be adapted to specific requirements.

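Quickselect has an average time complexity of O(n), where n is the dataset length, compared with O(n log n) for a full sort. Asymptotically it therefore does less work than sorting on large datasets, although its worst case is O(n^2) when the pivots are chosen poorly, and a pure-Python implementation may not outrun the built-in sorted() function in practice.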
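The average-case behaviour depends on how the pivot is chosen. As a variation on the approach described here, the sketch below picks the pivot at random, which makes a run of bad pivots unlikely whatever the input order. The function name quickselect_random and the iterative structure are assumptions for this example, and the result is cross-checked against statistics.median() on random test data.

```python
import random
import statistics


def quickselect_random(values, k):
    """Return the k-th smallest item (0-based) of values using random pivots."""
    items = list(values)  # work on a copy so the caller's data is untouched
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        if k < len(lows):
            # The target lies among the values below the pivot.
            items = lows
        elif k < len(lows) + len(pivots):
            # The target is equal to the pivot.
            return pivot
        else:
            # Skip the lows and pivots, then continue among the larger values.
            k -= len(lows) + len(pivots)
            items = [x for x in items if x > pivot]


# Odd-length random sample, so the median is a single element.
sample = [random.randint(0, 1000) for _ in range(1001)]
selected = quickselect_random(sample, len(sample) // 2)
print("Quickselect median:", selected)
print("Matches statistics.median:", selected == statistics.median(sample))
```

Each pass discards the part of the list that cannot contain the target, which is why the expected total work is proportional to n rather than n log n.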

The find_median() function first checks whether the dataset length n is even or odd. If it is odd, it uses Quickselect to find the single middle element at index k = n // 2. If it is even, it finds the two middle elements at indices k - 1 and k and returns their average.

The recursive quickselect() function picks a pivot element and divides the dataset into two subsets: one with values below the pivot and one with values above it. It then compares k with the sizes of these subsets and calls itself on the subset that must contain the kth element, repeating until that element is found.
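A single partitioning step can be sketched in isolation as follows. The helper name partition_once is hypothetical, and collecting values equal to the pivot into a separate pivots list is an assumption made here so that duplicates are handled cleanly.

```python
def partition_once(values):
    """Split values around the middle element, as one Quickselect step would."""
    pivot = values[len(values) // 2]
    lows = [x for x in values if x < pivot]     # values below the pivot
    pivots = [x for x in values if x == pivot]  # the pivot and any duplicates
    highs = [x for x in values if x > pivot]    # values above the pivot
    return pivot, lows, pivots, highs


pivot, lows, pivots, highs = partition_once([5, 2, 9, 1, 7, 6, 3, 8, 4])
print("pivot:", pivot)
print("lows:", lows)
print("pivots:", pivots)
print("highs:", highs)
```

For this list the pivot is 7, with six values below it and two above. The median of nine values sits at index k = 4 in sorted order, and since 4 is smaller than the six items in lows, the search would continue in lows only.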

The following example shows how to use the find_median() function:

data = [5, 2, 9, 1, 7, 6, 3, 8, 4, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
median = find_median(data)
print("Median:", median)