The median problem

The median of a set of numbers is widely used in various industries, and as such it's an everyday problem. Calculating the middle value is tightly related to sorting. In this post I will show a solution to the median problem.

Problem statement

You decide to buy a house in a fancy suburb: Big homes, white fence and large trees. While preparing to home opens you want to know how much homes cost in the suburb. You did some research and learned that the last nine homes were sold at the following prices (in thousand dollars):

435, 850, 394, 465, 443, 490, 450, 425, 524

Calculate the median of the given home prices.

Background

If you are familiar with the concept of various averages including the median, feel free to skip the following section.

The mean

One can use different types of averages to describe the centre of a set of numbers. The most well-known of these methods is the mean, which is equal to the sum of the scores divided by the number of scores. Most people associate the average with the mean.

However, mean has a disadvantage that cannot be neglected: Outliers can skew the result. If the set of numbers contains an extremely high (or extremely low) value, the mean won’t represent the set properly anymore because the extreme value will increase the average. This is the reason why the largest and the smallest scores given by the judges are ignored in certain sports.

The median

Some industries (like real estate) use the median instead, which can better represent the overall characteristics of the values.

The median is the middle value in an ordered set of numbers, i.e. the same number of values should occur both on the left and the right of the median in the set.

Finding the median involves two steps: 1) Order the numbers. 2) Find the element in the middle.

If we have five numbers in ascending order, we will need the third one from the left and this value will be the median. Out of nine values the median will be the 5th one.

But what if we have to find the median of a set of even numbers? There won’t be any middle value!

This time we will take middle two numbers. In a set of six numbers we will need to take the 3rd and the 4th values and calculate their mean (i.e. add them together and divide the result by 2).

This way we can calculate the median of a set of any numbers.

Breaking it down

As always, it’s a good idea to break down the problem to building blocks.

Calculating the median involves two important steps. First, we need to order the values from smallest to greatest. Then we need to pick the middle element.

The first part seems to be easy but step back and think for a moment. Do we really need to sort every number? No, we don’t. If we could somehow divide the set into two partitions, we could see which partition contains the required element (in our example it’s the 5th smallest value). If we have this piece of information, we can decide which partition can be abandoned and which should be kept and worked with.

Of course, it’s not a big deal for nine numbers but it can be so if millions of numbers are sorted, so we will follow the abovementioned technique and will only sort the relevant part of the set.

Sorting

It turned out that finding the median is a sorting problem. I will apply the principles of the quicksort algorithm to create a solution.

Again, we could simply sort the numbers and then get the index of the middle value but it involves quite a bit of unnecessary operations.

JavaScript code

Let’s see how each step can be assigned some JavaScript code.

We can use an array to store home prices.

We will need the k-th smallest number in the array where k is the middle value. We will use quicksort, which was covered in one of the past articles.

The solution will be very similar to the one described in the linked post. We will need to find a pivot element and partition the array into three parts (left, equal and right). But then we will decide which partition to work with based on the length of those.

If the length of the left partition is greater than or equal to k (the position or index of the median), the median must be in the left partition. Similarly, if the length of the right partition is greater than or equal to k, the median value must be in the right block. If we are lucky enough to get both left and right partitions to be of equal length, the median will be the pivot value.

We need to repeat this process until we get one value (the median), which means that some recursion will be involved in the algorithm, similar to quicksort.

Solution

Let’s start with the prices array and the declaration of our function:

const prices = [435, 850, 394, 465, 443, 490, 450, 425, 524];
const medianPosition = Math.floor(prices.length / 2) + 1;

function getMedian(arr, med) {

}

We also define the position (index) of the median.

First we need to find a pivot element. We will use the first element in the array as it’s easy to work with. We also declare the partitions and start grouping the elements of the array into the partitions based on their value compared to the pivot.

This is the exact same process as described in the quicksort algorithm:

if (arr.length === 1) return arr[0];

let k = med;

const pivot = arr[0];
const left = [];
const right = [];
const equal = [pivot];

for (let i = 1; i < arr.length; i++) {
  if (arr[i] < pivot) {
    left.push(arr[i]);
  } else if (arr[i] > pivot) {
    right.push(arr[i]);
  } else {
    equal.push(arr[i]);
  }
}

The first line is important as this will end the recursion. When we have only one element left in any of the partitions, we will stop calling getMedian and return the element.

We also create the k variable, which is initially equal to the median position calculated above.

From this point on the solution will slightly differ from the quicksort algorithm. We need to find the k-th smallest element, so k needs to be compared to the length of the partitions.

Let’s see the code first:

const leftLength = left.length;
const equalLength = equal.length;

if (leftLength + equalLength === k) {
  return pivot;
} else if (leftLength + equalLength > k) {
  if (leftLength < k) return pivot;
  return getMedian(left, k);
} else {
  k = k - (leftLength + equalLength);
  return getMedian(right, k);
}

Because we need to find the k-th smallest element, we are only interested in the left and equal partitions as numbers increase from left to right. Based on their combined length and the value of k, we can observe three cases.

If leftLength + equalLength === k, i.e. the sum of the lengths of the left and equal partitions is exactly k, we have found the median and it will be the pivot element. Why? Because each element in left is less than the pivot and we will have k - 1 element that are less than the pivot, so the k-th smallest value will be pivot itself.

We get lucky here. We don’t even need to sort either left or right as we have found the median immediately. This case won’t occur very often though, so don’t bet the house on it.

What happens when the combined length of left and equal is greater than k? It means that the median will be in either in left or in equal. equal contains the pivot element, so if the length of left is less than k (the position of the median), the median must be in equal that is we need to return the pivot element again (same values can occur multiple times in the set). Otherwise the median must be in left and all we need to do is repeat the partitioning process on left.

The third scenario is when the median is in the right partition. In this case we need to redefine k as we are not interested in the k-th smallest number anymore on the new right array. In right the k-th smallest value has no sense, so the line

k = k - (leftLength + equalLength);

is supposed to make this correction. We subtract the combined length of left and equal from k and this will be the element we will be looking for in the right partition when we call getMedian again. It’s like starting over the partitioning process with right being the array of values and the new k being the index whose value we are interested in.

The full code looks like this:

function getMedian(arr, med) {
  if (arr.length === 1) return arr[0];

  let k = med;

  const pivot = arr[0];
  const left = [];
  const right = [];
  const equal = [pivot];

  for (let i = 1; i < arr.length; i++) {
    if (arr[i] < pivot) {
      left.push(arr[i]);
    } else if (arr[i] > pivot) {
      right.push(arr[i]);
    } else {
      equal.push(arr[i]);
    }
  }

  const leftLength = left.length;
  const equalLength = equal.length;

  if (leftLength + equalLength === k) {
    return pivot;
  } else if (leftLength + equalLength > k) {
    if (leftLength < k) return pivot;
    return getMedian(left, k);
  } else {
    k = k - (leftLength + equalLength);
    return getMedian(right, k);
  }
}

Let’s call getMedian with prices and medianPosition:

console.log(getMedian(prices, medianPosition)); // 450

We get that the median price is $450.000.

Considerations

What if we have even number of values?

One solution is to use the function we have just created and apply it on the two middle elements one by one.

We can check first if the length of the array is odd. If so, getMedian can be applied as is.

If the length is an even number, we will take the middle two values and call getMedian on each and calculate the mean of the results.

Thanks for reading and I hope that you liked this post. If so, see you next time.