Calculate Moving Median in SQL

·

·

Recently a student asked me how he could calculate a Moving Median.  It’s not as easy as you think, as SQL doesn’t provide a built-in MEDIAN() function, nor an easy way calculate the “moving” aspect.

In this article we’ll calculate a Moving median using SQL Server.  Let’s use the PizzaDB sample database for example data.

SQL Moving MEDIAN

Like a moving average, a moving median is based on a window of rows with “moves” or slide down the result set.  As you look at the diagram below, you’ll see we’re calculating a five-row moving median.

SQL Moving Median Illustration

In row Six we’re able to calculate the median using rows 6 through 2.  Given the Median, returns the “middle” value of a sample, the moving five row median at row six is 3.25.23.

Read More:  SQL MEDIAN in SQL Server

Plan of attack

Since there is no built-in function to calculate median we need to come up with another way.  Luckily, aw can use PERCENTILE_CONT() to do so.  We’ll use it to calculate the 50th percentile rank, which, happens to be the median.

You can see it work here.

Ok, so now that we know what function to use, the next big issue is how to create a “moving” median.  Unlike average, AVG(), we there isn’t a directly way to use ROWS PRECEEDING to create the moving window.

We’ll have to do this part by hand:

  1. Associate Row Numbers to each observation
  2. For each unique OrderDate, create a range of row numbers defining the window
  3. Combine the Observations with ranges.
  4. Calculate Mediate within each range.

In the end we’ll end up with a single query using several derived tables, it a lot to digest at once, so let break it down into steps.  By the end you’ll understand all the pieces and have a fully working query to use!

Step 1 –  Associate Row Numbers to each observation

To create windows, we’ll first create a row number for each observation.  In this query you can see how this is done.  Just use the row_number() window function.  Also, notice order the row numbers by OrderDate.

  select o.*, --observations
        row_number() over(order by OrderDate) RowNum
    from
    (
        select OrderDate, sum(FinalOrderPrice) TotalSales
        from CustomerOrderSummary
        group by OrderDate
    ) o
Observations with Row Numbers in SQL

Another part of the puzzle is to come up with “windows” or ranges of rows for each observation.  For instance, in our very first example our window started at 6 and we back to row 2.  In this follow query we create these windows.

select r.*  --moving ranges
    from
    (
        select d.*,
            row_number() over(order by OrderDate) - 4 RowStart, 
            row_number() over(order by OrderDate) RowEnd -- end of moving range
        from
        (
            select distinct OrderDate
            from CustomerOrderSummary
        ) d
    ) r
    where RowStart  > 0 -- only keep ranges that make sense
Ranges of Values in SQL

Now that we have these two pieces, we can combine them together to make our solution the idea we use the row ranges to “select” rows to calculate the median.   We’ll do this by using a non-equi join.

The idea is to join on Rownumber where the value is between the RowStart and RowEnd values.  Doing so gives us relevant observations to calculate the median.

The join looks a bit strange since we’re joining two derived tables!

The tables are aliased r and o for range and observation respectively.

If this seems a bit hard for you to get, just keep in mind the derived tables return results.  So think of use joining those results, think of them as being tables, if it makes it a bit easier for you.

Check out the following query to see how we get the data setup to start the calculations.

--data prepared
select r.*, o.*
from (
    select d.*,
        row_number() over(order by OrderDate) - 4 RowStart, -- start of moving range
        row_number() over(order by OrderDate) RowEnd        -- end of moving range
    from (
        select distinct OrderDate
        from CustomerOrderSummary
    ) d
) r
    inner join (
        select *, row_number() over(order by OrderDate) RowNum --observations
        from (
            select OrderDate, sum(FinalOrderPrice) TotalSales
            from CustomerOrderSummary
            group by OrderDate
        ) d
    ) o on o.RowNum between r.RowStart and r.RowEnd
where RowStart  > 0 -- only keep ranges that make sense

Notice how the observations repeat between RowStart and RowEnd ranges?

Ranges and Observations Ready for Moving Median in SQL

Let’s calculate the Mean.  To do this we’ll use the percentile_cont() window function.

I cover how this works in this cook book  article.

All the hard work is done.  All that remain is to setup percentile_cont() and partition it on r.OrderDate.

Recall that r.OrderDate stands for the “moving windows” we manually created.  By partitioning on these, we calculate the median

select distinct r.OrderDate,
percentile_cont(.5)  within group (order by TotalSales)
                                   over(partition by r.OrderDate) MedianSales
from (
    select d.*,
        row_number() over(order by OrderDate) - 4 RowStart, -- start of moving range
        row_number() over(order by OrderDate) RowEnd        -- end of moving range
    from (
        select distinct OrderDate
        from CustomerOrderSummary
    ) d
) r
    inner join (
        select *, row_number() over(order by OrderDate) RowNum --observations
        from (
            select OrderDate, sum(FinalOrderPrice) TotalSales
            from CustomerOrderSummary
            group by OrderDate
        ) d
    ) o on o.RowNum between r.RowStart and r.RowEnd
where RowStart  > 0 -- only keep ranges that make sense
SQL Moving Median Result

Related Articles:

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

More from the blog


MySQL PostgreSQL SQLite SqlServer