Understanding SQL GROUP BY: A Guide to Data Summarization

The SQL Group BY clause is used to summarize your SQL results. Rather than displaying each row in detail, when GROUP BY is added to your query, one row per grouping is returned. Using groups comes in handy when you want to calculate totals.

For this tutorial we’ll use the CutomerOrderDetail view, which is shown below

CustomerOrder Table used for GROUP BY SQL examples

Suppose we want to know the total purchases by Customer. To get this result we can use the following query:

select CustomerName, sum(PurchaseAmount) TotalPurchases from CustomerOrderDetail group by CustomerName
/* Answer */
select CustomerName, sum(PurchaseAmount) TotalPurchases
from CustomerOrderDetail
group by CustomerName

For this query all rows within the CustomerOrderDetail view are processed; however, only one row per customer is returned. As rows are processed the PurchaseAmount is totaled. As new customer groups are processed to total is reset.

Note: A view is a pre-built query we can treat as a table with our queries. Read up on them in our article What is a View?

SQL GROUP BY Syntax

Here is a simplified SQL syntax incorporating GROUP BY into the SELECT statement.

SELECT Column1, Column2, ... AggregateFunction(Column3)
FROM TableName
WHERE condition
GROUP BY Column1
ORDER BY Column1

In this example

  • Column1, Column2, … represent the columns you which to group the result set. These columns must be part of the GROUP BY clause.
  • AggregateFunction – A function like COUNT(), SUM(), or AVG() which performs a calculation on a set of values.
  • TableName is the table you are querying
  • condition is used to filter rows in the result set
  • GROUP BY specifies the columns you which to group by.
  • ORDER BY allows you to sort the grouped fields.

Group by Two or More Columns

You can also group by one or more columns. Here we group on OrderDate and ProductName

select OrderDate, ProductName, Sum(PurchaseAmount) TotalPurchases from CustomerOrderDetail group by OrderDate, ProductName order by OrderDate, ProductName
/* Answer */
select OrderDate, ProductName, Sum(PurchaseAmount) TotalPurchases
from CustomerOrderDetail
group by OrderDate, ProductName
order by OrderDate, ProductName

There are couple of things to point out here. First notice how we group by OrderDate and ProductName. You see both columns specified in the SELECT and GROUP BY clauses. Second, we included an ORDER BY clause.

Using the GROUP By statement doesn’t guarantee order. Yes, you get unique values in the group, but to ensure your data is properly sorted, include the ORDER BY statement.

GROUP BY and SQL order of Execution

Now that we are starting to add more clauses to our SELECT statement, lets talk about how each clause is executed.

When a query is run, the SQL database does the following:

  1. Retrieve data for the table specified.
  2. Filter any rows per the where clause.
  3. Group the filtered result per the GROUP BY SQL clause
  4. Order the date per the ORDER BY clause.
  5. Display the selected columns.

I think it is important to understand this so you better understand how your queries work. For example understanding that data is filtered out prior to the group by statement lets you understand why Pan Pizza is missing from the following results:

select OrderDate, ProductName, Sum(PurchaseAmount) TotalPurchases, Count(CustomerOrderID) TotalOrders from CustomerOrderDetail where ProductName <> 'Pan Pizza' group by OrderDate, ProductName order by OrderDate, ProductName
/* Answer */
select OrderDate, ProductName, Sum(PurchaseAmount) TotalPurchases, Count(CustomerOrderID) TotalOrders
from CustomerOrderDetail
where ProductName <> 'Pan Pizza'
group by OrderDate, ProductName
order by OrderDate, ProductName

In this new lesson we’ll explore the various aggregate functions you can use along with GROUP BY.