Use SQL to Find the MEDIAN

·

·

,

Problem

You need use SQL to calculate the Median of a result from SQL Server.

Background

Suppose you need to calculate the Median using SQL.

You can loosely define the median as the “middle” value of your data set.  If you were calculating the median by hand, you would use the following rules to do so:

1. When there is an odd number of rows, you can easily find the middle:

How to find the Median when there are odd number of values.

2. It’s a bit more complicated where there are even rows, as you need to computer the middle:

How to find the median when there are an even number of values.

There isn’t a built-in function to calculate the median, as there is average.  So instead, we we’ll resort to using a windows analytic function.

We’ll calculate the median Total Sales form our PizzaDB sample database.  Here is the base query we’ll use to get the sample results:

select OrderDate, sum(FinalOrderPrice) TotalSales
from CustomerOrderSummary
group by OrderDate

Solution

To find the median we’ll use the PERCENT_CONT() window function.  By calculating the 50 percent rank, we arrive at the median.  Let’s see how you can do this using the PizzaDB sample database:

Calculate SQL Median

select OrderDate, TotalSales,
    percentile_cont(.5)  within group (order by TotalSales) over() MedianSales
from (
    select OrderDate, sum(FinalOrderPrice) TotalSales
    from CustomerOrderSummary
    group by OrderDate
) d

Our sample has eight rows, so falls into the “even number of rows” case for calculating the median. You’ll see that the MedianSales is the average row 4 and 5 TotalSales: (255.53 _ 262.96) / 2 = 258.745, which is also the result percentil_cont(.5) achieves.

Discussion

Though there isn’t a built-in function to handle MEDIAN, we can use PERCENTILE_CONT().  There are some restrictions to the data types PERCENTILE_CONT() handles.  Avoid floating point.  Instead, use Numeric, Decimal, or Money types.

PERCENTILE_CONT() is preferred over PERCENTILE_RANK() since it calculates the ranking based on a continuous distribution. This allows for results not observed, such as the case where we have an even number of observations and need to interpolate the result.

Also, PERCENTILE_CONT() function’s OVER() clause is restricted.  Most importantly, you can’t use rows or range clauses within in it, meaning, it becomes very difficult to calculate a moving Median as we can do for averages.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

More from the blog


MySQL PostgreSQL SQLite SqlServer