Calculate a Running Total in SQL with Joins or Window Functions
There are several ways to calculate a running total in SQL. In this article, we will cover two methods: Joins, and Window Functions.
We’ll first look at how to calculate the running total using an INNER JOIN. By doing so, you’ll not only learn more about join conditions, but see how to take the result and summarize it, to get the running total.
Once you’ve seen how to do it “old school,” we’ll use the OVER clause to calculate running totals using a window function. This method is newer and more concise to use.
All the examples for this lesson are based on Microsoft SQL Server Management Studio and the WideWorldImporters database. You can get started using these free tools using my Guide Get Started Using SQL Server 2016.
What is a Running Total?
Our goal is to calculate a running total that resets whenever the TransactionDate changes. We’ll total the TransactionAmount. For each subsequent invoice within the transaction date, the RunningTotal should equal the prior InvoiceID’s running total plus the current TransactionAmount.
You can see this in action in the following example. The Running Total for Invoice 3, is the prior RunningTotal of 3110.75 plus the Invoice 3’s Transaction Amount of 103.50
Calculate A Running Total in SQL using an INNER JOIN
We’ll first calculate the running total using INNER JOINS. This method reveals more of the mechanics of calculating a running total than using PARTITION. As such, it gives you another opportunity to understand INNER JOINS and apply those concepts to another use case.
We solve this problem in three steps:
- Get rows for the running total
- Setup details for the running total using inner joins
- Calculate the running total by summarizing data.
Let’s get started!
Step 1 – Get Rows for Running Total
To calculate the running total, we’ll query the CustomerTransactions table. We’ll include the InvoiceID, TransactionDate, and TransactionAmount in our result. Of course, the running total is calculated from the TransactionAmount.
Here is the query to get the basic data.
SELECT InvoiceID ,TransactionDate ,TransactionAmount FROM Sales.CustomerTransactions WHERE TransactionTypeID = 1 ORDER BY TransactionDate
Here is the data we’ll be working with.
This step really is meant to get you acquainted with the basic information. You don’t really need to do it, but I sometimes like to run the basic query just to see the data and make sure there aren’t any anomalies or special situations I need to accommodate.
Step 2 – Setup Details for Running Total using Inner Joins
In this step, we’ll get the details setup so we can calculate the running total. We’ll do this by getting, for each InvoiceID, the transaction amount and all transaction amounts before it.
To do this we’ll join the CustomerTransactions table to itself.
If we do this with no join condition we would get every combination of transactions, this is not what we want.
To ensure we get the proper combination of rows from each table, we’ll add two join conditions. One to get each invoice and those prior to it (Green).
The second ensures we only include invoices on the same transaction date (Red)
SELECT T1.InvoiceID ,T2.InvoiceID ,T1.TransactionDate ,T1.TransactionAmount ,T2.TransactionAmount FROM Sales.CustomerTransactions T1 INNER JOIN Sales.CustomerTransactions T2 ON T1.InvoiceID >= T2.InvoiceID AND T1.TransactionDate = T2.TransactionDate WHERE T1.TransactionTypeID = 1 ORDER BY T1.InvoiceID, T1.TransactionAmount
Let’s see how this operates.
The easiest condition to understand is where we match TransactionDate. This ensures the invoices match have a common transaction date. If this was the only join we did we would be calculating a sub total for all transactions within a date.
Since we want to calculate the running total, we need to somehow obtain for each InvoiceID the TransactionAmount for the invoice and all invoices before it. In other words, return all matching rows where the invoice is greater than or equal to the corresponding invoices we are trying to total.
If you look at the result above, you’ll see that for each invoice listed in the first column (T1.InvoiceID), It is greater than or equal to InvoiceID’s in the second column (T2.InvoiceID).
This is a result of the join condition T1.InvoiceID >= T2.InvoiceID.
The result of this join and the join conditions is that we now have the raw materials to calculate the running total.
Notice how the first, third, and fourth columns repeat. We can use this to our advantage to summarize the result to arrive at the running total.
Step 3 – Calculate Running Total by Summarizing Rows
With the detailed information at hand, the final step is to summarize the rows. Doing so allows us to calculate the running totals.
Here is the query we use to perform the summary:
SELECT T1.InvoiceID ,T1.TransactionDate ,T1.TransactionAmount ,Sum(T2.TransactionAmount) RunningTotal FROM Sales.CustomerTransactions T1 INNER JOIN Sales.CustomerTransactions T2 ON T1.InvoiceID >= T2.InvoiceID AND T1.TransactionDate = T2.TransactionDate WHERE T1.TransactionTypeID = 1 GROUP BY T1.InvoiceID ,T1.TransactionDate ,T1.TransactionAmount ORDER BY T1.InvoiceID ,T1.TransactionAmount
Notice how we group by T1.InvoiceID, T1.TransactionDate, and T1.TransactionAmount. These are the values which were repeated in our detailed data in step 2.
The Running Total is derived from T2.TransactionAmount. Recall these values are TransactionAmount from all Invoices prior to the invoice displayed. In other words, the invoice displayed is greater than or equal to them.
This allows us to build up a Running total.
Each subsequent invoice in the list, is calculating it’s RunningTotal value by summing up all TransactionAmount from its Invoice and those prior to it.
Now that you have seen a traditional way to arrive at the running total, and perhaps gained a greater appreciation of how to use joins and join conditions to solve it, let’s look at one of the newer features of SQL, partitions, and see how they can be used to achieve the same result.
Calculate A Running Total in SQL using an OVER Clause
The OVER clause if a very powerful statement. It allows you to define a set of rows, within a result set that an operation affects.
Much like OFFSET and FETCH allow us to retrieve a specific range of rows from a result set, the OVER clause allows us to do a similar operation, relative to the current row, for a specific column.
Using OVER, we can define a window over a specified set of rows, to which we can apply functions, such as sum.
For you to understand the concept we’ll break this down into two steps:
- Partition data using the OVER clause.
- Order Partitions with Order.
Step 1 – Partition data using OVER Clause
When we say we want to we want to create a running total for all Invoices within a TransactionDate, we want to partition our data by TransactionDate. To partition the data, we can use the over clause.
In the following statement notice we SUM the TransactionAmount and after the SUM there is an OVER clause.
Also notice there is no GROUP BY clause. This is surprising, typically aggregate functions, such as SUM, require a GROUP BY clause; why is this the case?
Since we are using the OVER clause, the SUM is considered a window function – it operates upon any rows defined in the OVER clause.
Here is the window function we’ll use:
SUM(TransactionAmount) OVER(PARTITION BY TransactionDate) RunningTotal
What make this a windows function is the OVER clause. checkout the part PARTITION BY TransactionDate. This means that the SUM operates on all rows with the same TransactionDate. This defines the window of rows the SUM function affects.
Here is the query so far.
SELECT InvoiceID ,TransactionDate ,TransactionAmount ,SUM(TransactionAmount) OVER(PARTITION BY TransactionDate) RunningTotal FROM Sales.CustomerTransactions T1 WHERE TransactionTypeID = 1 ORDER BY InvoiceID ,TransactionAmount
Step 2 – Order Partitions with Order BY
Up to this point we have partitioned the data and are able calculate a subtotal for all TransactionAmount values within a TransactionDate. The next step is to now calculate the subtotal.
To do this we can use ORDER BY within the OVER clause to define the “scope” of the window function. The ORDER BY specified the logical order the window function operates.
Here is the window function we’ll use:
SUM(TransactionAmount) OVER(PARTITION BY TransactionDate ORDER BY InvoiceID) RunningTotal
The difference between this window function and that from the first step, is ORDER BY InvoiceID. This specifies the logical order to process within the partition.
Without the ORDER BY the logical order is to wait until we are at the end of the window to calculate the sum. With the ORDER BY specified, the logical order is to calculate a sum for each row including previous TransactionAmount values within the window.
SELECT InvoiceID ,TransactionDate ,TransactionAmount ,SUM(TransactionAmount) OVER(PARTITION BY TransactionDate ORDER BY InvoiceID) RunningTotal FROM Sales.CustomerTransactions T1 WHERE TransactionTypeID = 1 ORDER BY InvoiceID ,TransactionAmount
Here is the result from running the query.
When you ran this query did you notice how much faster it ran than the one using INNER JOINS? I was surprised. I know the INNER JOIN operation consume a lot resources as the combinations or rows become large, but I would have thought it would have the same case for the solution using OVER.
I would encourage you to look at each query’s query plan. You’ll start to learn quite a bit about SQL when you start to do this.