The purpose of this article is to introduce you to SQL window functions and some of the high-level concepts. In SQL, a window function refers to a function, such as sum or average, which acts upon a result set’s rows relative to the current row.
There are a lot of details to cover, but you’ll see we cover those in later articles. For now, we’ll dig into the how window functions work and really get to know more about frames.
All the examples for this lesson are based on Microsoft SQL Server Management Studio and the AdventureWorks database. You can get started using these free tools using my Guide Getting Started Using SQL Server.
Introduction to SQL Window Functions
Window functions are, for the most part, familiar. What makes them special is they are set up to operate on a set of rows relative to the query processor’s current row. This set is called the window frame and it is set up using the OVER clause.
The OVER clause can get tricky, as you introduce ORDER BY and PARTITION BY parameters, but we’ll do this one at a time. By the end of this article you’ll see how each of these affects the window function calculation, and be ready to tackle other analytic functions and partition options.
The following diagram pulls together all the concepts we’ll go over in this article. As you read subsequent sections, I encourage you to come back to this illustration, so you understand how it fits into the overall picture.
SQL OVER Clause
As we mentioned before, it is the OVER clause that makes an aggregate a SQL window function. Here you’ll see a simple window function to calculate to total SalesYTD for all sales persons.
SELECT BusinessEntityID ,TerritoryID ,SalesYTD ,SUM(SalesYTD) OVER() as TotalSalesYTD FROM Sales.SalesPerson S
Which when run outputs:
Notice the TotalSalesYTD is the same value as if we had run the query
SELECT SUM(SalesYTD) FROM Sales.SalesPerson
In fact, if we wished, we could have created this same result using this subquery:
SELECT BusinessEntityID ,TerritoryID ,SalesYTD ,(SELECT SUM(SalesYTD) FROM Sales.SalesPerson) as TotalSalesYTD FROM Sales.SalesPerson S
However, you’ll see as we delve into SQL window functions, their capabilities for aggregating data quickly outpace what’s achieved using subqueries alone.
SQL OVER with PARTITION BY
The SQL PARTITION BY divides the rows into groups by which the window function calculates its result. The window function is computed across these rows.
It is similar to how the GROUP BY clause works; however, the result of the GROUP BY clause is a summary row. With a window function, the original row remains intact. The partition is logical, and used only for calculation purposes.
Here is a continuation of our previous example. In this case, we’ve partitioned the result by TerritoryID.
SELECT BusinessEntityID ,TerritoryID ,SalesYTD ,SUM(SalesYTD) OVER(PARTITION BY TerritoryID) as TotalSalesYTD FROM Sales.SalesPerson
Notice that TotalSalesYTD now represents a TotalSalesYTD for each Terrritory.
Could we have done this same query using a subquery? Sure! We could have used a correlated sub query similar to the one below
SELECT BusinessEntityID ,TerritoryID ,SalesYTD ,(SELECT SUM(SalesYTD) FROM Sales.SalesPerson WHERE COALESCE(TerritoryID, -1) = COALESCE(S.TerritoryID, -1) ) AS TotalSalesYTD FROM Sales.SalesPerson AS S ORDER BY TerritoryID;
But as you can see the subquery is becoming more cumbersome to write; whereas, the SQL window function is more elegant.
You may have figured this out, but if no partition is specified, then the entire result is considered the partition. This is important to know as we move on to the next topic, which has to do with how we order rows within the partition for calculation.
SQL OVER with ORDER BY
When the ORDER BY clause is used within the window, it sets up the order rows are processed within the partition. Keep in mind, that if no PARTITION BY is specified, the partition includes every row.
All rows from the start of the partition to the current row or rows equal to the current rows ORDER BY value are considered part of a window frame. Many window functions, such as SUM, operate within this frame, meaning, it becomes straight forward to setup running totals, averages, etc. according to the window’s ORDER BY.
Let’s look at an example, where we’ll calculate a running total or TotalSalesYTD by SalesPerson. We’ll do this by ordering the window by the sales person’s BusinessEntityID:
SELECT BusinessEntityID ,TerritoryID ,SalesYTD ,SUM(SalesYTD) OVER(ORDER BY BusinessEntityID) as TotalSalesYTD FROM Sales.SalesPerson
Here is the result
Notice the the TotalSalesYTD gets progressively larger and eventually equals the TotalSalesYTD previously calculated.
Sorting Window Function Results
So now you may be wondering what happens if we try to order the results by something different than BusinessEntityID?
Will ordering by TerritoryID ruin the running total calculation?
Actually it won’t! The reason being the SQL window function’s calculations occur during the SELECT phase of order of execution, which is before the ordering of the result. If you’re unsure of this, then check out this side-by-side comparison:
Notice, that though the results on the left are sorted differently than those on the right, the TotalSalesYTD corresponding to each BusinessEntityID remain the same. You can see where I point this out for two cases (see highlighted rows).
The key take away is that when SQL window functions perform their calculations they are doing so on a logical ordering of rows, which is independent of the final result. These rows are referred to as a frame, as they are a subset of the window.
Using Two Window Functions
Let’s continue the example by combining our window functions together into a single query. Below you’ll see where you have two window functions: one which computer the grand total, and another the running total:
SELECT BusinessEntityID ,TerritoryID ,SalesYTD ,SUM(SalesYTD) OVER(ORDER BY BusinessEntityID) as RunningTotalSalesYTD ,SUM(SalesYTD) OVER() as GrandTotalSalesYTD FROM Sales.SalesPerson ORDER BY BusinessEntityID
This illustrates that you can have two frames within the same query.
The RunningTotalSalesYTD’s frames is defined by the ordering of the current row via BusinessEntityID. The GrandTotalSalesYTD frame, since there is no ORDER BY, covers the entire partition.
Now that we have a better understanding of SQL window function concepts, let’s see how PARTITION BY and ORDER BY work together within the OVER clause.
SQL OVER with PARTITION BY and ORDER BY
Using PARTION BY and ORDER BY in concert within the OVER clause provides a powerful combination. Doing so, allows us to create frames to perform calculations such as, running totals, within groups.
Consider the query used to calculate a running total, by sales person, within each sales territory.
SELECT BusinessEntityID ,TerritoryID ,SalesYTD ,SUM(SalesYTD) OVER(PARTITION BY TerritoryID ORDER BY BusinessEntityID) as RunningTotalSalesYTDTerritory FROM Sales.SalesPerson
In the results you can see the running total resets when a new TerritoryID (partition) is encountered.
Let’s wrap up by going back over some of the key concepts.
SQL Window Function Key Concepts
Here are some of the key concepts we went over in this article. I know there was a lot covered, but it is important you understand the concepts and jargon before you move on.
OVER — This is your signal the preceding function is a window function. Items contained within the OVER clause, such as ORDER BY or PARTITION BY dictate how the window is framed. Meaning, how is the window function computed.
ORDER BY — it’s what makes a function become a running total. It determines the order the rows are processed to the function within the frame.
PARTITION – It is what makes the groupings. The partion can be one or more columns. One a new partition ends the window function result is reset.
Some observations:
If you don’t have an ORDER BY then you get a total not a running total.
You can have different windows in the same query. This makes it really easy to include running totals by day, month and year for example!
One key point to remember is the frame defined by the order by and partition by are completely separated from the sort order. It has nothing to do with the windows you define within your window functions. They don’t affect the results, as the result are first calculated.
Leave a Reply