Use the SQL GROUP BY Clause is to consolidate like values into a single row. The group is a set of columns. The group by returns a single row from one or more within the query having the same column values. Its main purpose is this work alongside functions, such as SUM or COUNT, and provide a means to summarize values.
Table of contents
What is a SQL Group?
In SQL, groups are unique column value combinations. When a query has a GROUP BY, rather than returning every row that meets the filter condition, values are first grouped together. The rows returned are the unique combinations within the columns.
The overall syntax for GROUP BY is:
SELECT colum1, column2, column3, .. FROM table1 GROUP BY column1, column2, column3,…
For the lesson, we’ll focus on the HumanResources.Employee table.
In this example I’ve ordered the table by JobTitle and Gender.
Here you can see the beginnings of the group. The final step is to add a SQL GROUP BY clause to the statement to create the grouping.
SELECT JobTitle, MaritalStatus, COUNT(*) NumberEmployees FROM HumanResources.Employee GROUP BY JobTitle, MaritalStatus ORDER BY JobTitle, MaritalStatus
Whose result you see here:
I want you to notice several things:
I removed columns that were part of the group or summarization, such as COUNT(*). Only columns listed in the GROUP BY are allowed in the select. In addition to these, you able to also use aggregation functions, such as SUM, COUNT, MIN, MAX and AVG.
I kept the ORDER BY in the statement since GROUP BY doesn’t guarantee the groups are sorted.
Now that you have a good grasp how it works, let’s look some more items related to GROUP BY.
Summarizing Results with SQL GROUP BY
You can use these functions to summarize data within a group:
- COUNT() – Returns number of rows within group.
- MIN() – Return the minimum value found within the group.
- MAX() – Returns the maximum value found within the group.
- SUM() – Returns the sum of the value found with the group.
- AVG() – Returns the average value found within the group (SUM() / COUNT() )
Let’s do some summarizations! For a change, we’ll summarize SalesOrderDetail records. For each Sales Order let’s calculate the The Order Total, the Average Sales Order Detail Line Total for a SalesOrder, and the number of SalesOrderDetail lines within an order.
We’ll use the SUM, AVG, and COUNT functions to help.
SELECT SalesOrderID ,SUM(LineTotal) AS OrderTotal ,AVG(LineTotal) AS AverageLineTotal ,COUNT(LineTotal) AS NumberOfLines FROM Sales.SalesOrderDetail GROUP BY SalesOrderIDORDER BY SalesOrderID;
Since there are one or more lines per Sales Order, we’ll group by the Sales Order ID. Check out the results:
Here are some tips to think about when using these functions. You can:
- Use more than one function in your SQL.
- Include the functions within Expressions.
- Alias the expression.
You can also create new expressions using aggregrate functions. To demonstrate this, let’s calculate the Average Line Total knowing the Average = the Sum of Line Totals / The Count.
Here is the query to make it so:
SELECT SalesOrderID ,SUM(LineTotal) AS OrderTotal ,COUNT(LineTotal) AS NumberOfLines ,SUM(LineTotal) / COUNT(LineTotal) AS AverageLineTotal ,AVG(LineTotal) AverageFunctionLineTotal FROM Sales.SalesOrderDetail GROUP BY SalesOrderID ORDER BY SalesOrderID;
And the results. Notice how the “calculated” AverageLineTotal equal the Function’s value.
Ordering Grouping Results
Groups aren’t returned in sorted order. In order to do this, you need to include the ORDER BY clause. Add the ORDER BY clause to the end of the SQL statement.
Modify this statement so it sorts the groups by SalesOrderID and CarrierTrackingNumber.
/* Answer */ SELECT SalesOrderID, CarrierTrackingNumber, COUNT(1) as NumberofLines FROM Sales.SalesOrderDetail GROUP BY SalesOrderID, CarrierTrackingNumber ORDER BY SalesOrderID, CarrierTrackingNumber
HAVING and GROUP BY
The HAVING clause filters groups created using GROUP BY. The HAVING clause determines whether a group is included within the result.
Unlike the WHERE clause which filters individual records, the HAVING clause filter groups; however, keep in mind that SQL containing both a WHERE and HAVING processes both!
In this case, the WHERE clause is first processed, and those records filtered. The HAVING is then applied to groups as specified in the GROUP BY.
Typically HAVING is used with GROUP BY, but if a GROUP BY clause isn’t included, then HAVING operates on the entire result set as one implicit group.
Consider our previous example. Here we calculate the OrderTotals.
SELECT SalesOrderID ,SUM(LineTotal) AS OrderTotal ,AVG(LineTotal) AS AverageLineTotal ,COUNT(LineTotal) AS NumberOfLines FROM Sales.SalesOrderDetail GROUP BY SalesOrderID ORDER BY SalesOrderID;
Suppose we need to show large orders, those totaling more than $100,000.00. How can we do so?
Use the HAVING clause to filter on the SUM(LineTotal).
Here is the query to show large orders:
SELECT SalesOrderID ,SUM(LineTotal) AS OrderTotal ,AVG(LineTotal) AS AverageLineTotal ,COUNT(LineTotal) AS NumberOfLines FROM Sales.SalesOrderDetail GROUP BY SalesOrderID HAVING SUM(LineTotal) > 100000.00 ORDER BY SalesOrderID;
What’s nice about HAVIN
HAVING clause’s key advantage is its ability to filter GROUPS using aggregate functions. This is something you cannot do withing a SELECT statement.
Hint: If you are having a hard time understand HAVING and WHERE’s role within SELECT, I recommend understanding the SELECT statement order of execution first.
Want to learn more? Check out our SQL HAVING Tutorial
GROUP BY with Multiple Tables
GROUP BY can also include columns from multiple tables. In the following query let’s calculate the Sales by ProductName. To do so, well join SalesOrderDetail to Product. We’ll join on product.
SELECT p.Name ProductName, SUM(s.OrderQTY) TotalQuantitySold, SUM(s.LineTotal) TotalSales, SUM(s.OrderQTY * p.StandardCost) TotalStandardCost, SUM(s.LineTotal) - SUM(s.OrderQTY * p.StandardCost) GrossMargin FROM Sales.SalesOrderDetail s INNER JOIN Production.Product p ON s.ProductID = p.ProductID GROUP BY p.Name ORDER BY p.Name
Check out how columns from the two tables participate within the TotalStandardCost calculation.
Sort by SUM()
Using the same example as above, notice I can sort by one of the totals. In this case we’ll sort by TotalQuantitySold.
SELECT p.Name ProductName, SUM(s.OrderQTY) TotalQuantitySold, SUM(s.LineTotal) TotalSales, SUM(s.OrderQTY * p.StandardCost) TotalStandardCost, SUM(s.LineTotal) - SUM(s.OrderQTY * p.StandardCost) GrossMargin FROM Sales.SalesOrderDetail s INNER JOIN Production.Product p ON s.ProductID = p.ProductID GROUP BY p.Name ORDER BY SUM(s.OrderQTY)
Since this column is aliased, let’s make the query easier to read, by sorting by the alias name.
GROUP BY with an Expression
You can group on an expression. Grouping is allowed as long as the same expression is within the SELECT list.
SELECT Year(OrderDate) OrderYear, Sum(TotalDue) YearlySales FROM Sales.SalesOrderHeader GROUP BY YEAR(OrderDate) ORDER BY OrderYear
In this example let’s group and summarize SalesOrder totals by year
And there is nothing stopping us from grouping one or more expressions! Let’s further group sales by Month.
SELECT Year(OrderDate) OrderYear, Month(OrderDate) OrderMonth, SUM (TotalDue) MonthlySales FROM Sales.SalesOrderHeader GROUP BY YEAR(OrderDate), MONTH(OrderDate) ORDER BY OrderYear, OrderMonth
Before I forget, check out how I sorted the results. I used the column alias to order rather than repeating the Expression. Both are valid, but it is sure nice to keep our expression to a minimum! I think it’s easier to read. In case you’re wondering, this is all possible due to SQL Order of Operations.
Here are the results. I colored coded the columns to better show you the multiple expressions. Like the previous example, each expression used within the GROUB BY is also listed as a column.
SQL GROUP BY versus DISTINCT
In some ways GROUP BY is similar to using the DISTINCT Clause.
SELECT DISTINCT JobTitle, MaritalStatus FROM HumanResources.Employee ORDER BY JobTitle, MaritalStatus
Returns the same results as this SQL using SQL GROUP BY:
SELECT JobTitle, MaritalStatus FROM HumanResources.Employee GROUP BY JobTitle, MaritalStatus ORDER BY JobTitle, MaritalStatus
Here, we included the COUNT() function to determine how many rows make up each group:
/* Answer */ SELECT SalesOrderID, CarrierTrackingNumber, COUNT(1) as NumberofLines FROM Sales.SalesOrderDetail GROUP BY SalesOrderID, CarrierTrackingNumber;
When using SQL GROUP BY, keep these points in mind:
- The SQL GROUP BY does not guarantee the groups are returned in sorted order.
- You can use aggregate functions, such as SUM and COUNT to summarize data.
- The non summarized columns in your SELECT list must match those listed in the GROUP BY.
If you’re looking to learn more about using the SQL GROUP BY examples, then I would recommend these resources: