SQL GROUP BY Guide

·

·

,
SQL GROUP BY

Use the SQL GROUP BY Clause is to consolidate like values into a single row.  The group is a set of columns. The group by returns a single row from one or more within the query having the same column values. Its main purpose is this work alongside functions, such as SUM or COUNT, and provide a means to summarize values.

What is a SQL Group?

In SQL, groups are unique column value combinations. When a query has a GROUP BY, rather than returning every row that meets the filter condition, values are first grouped together. The rows returned are the unique combinations within the columns.

The overall syntax for GROUP BY is:

SELECT   colum1, column2, column3, .. 
FROM     table1 
GROUP BY column1, column2, column3,… 

For the lesson, we’ll focus on the HumanResources.Employee table.

In this example I’ve ordered the table by JobTitle and Gender.

SQL GROUP BY Example Data

Here you can see the beginnings of the group. The final step is to add a SQL GROUP BY clause to the statement to create the grouping.

SELECT JobTitle, MaritalStatus, COUNT(*) NumberEmployees
FROM HumanResources.Employee
GROUP BY JobTitle, MaritalStatus
ORDER BY JobTitle, MaritalStatus

Whose result you see here:

Group By Statement Example with Count

I want you to notice several things:

I removed columns that were part of the group or summarization, such as COUNT(*). Only columns listed in the GROUP BY are allowed in the select. In addition to these, you able to also use aggregation functions, such as SUM, COUNT, MIN, MAX and AVG.

I kept the ORDER BY in the statement since GROUP BY doesn’t guarantee the groups are sorted.

Now that you have a good grasp how it works, let’s look some more items related to GROUP BY.

SQL GROUP BY with Sum

You can use these functions to summarize data within a group:

  • COUNT() – Returns number of rows within group.
  • MIN() – Return the minimum value found within the group.
  • MAX() – Returns the maximum value found within the group.
  • SUM() – Returns the sum of the value found with the group.
  • AVG() – Returns the average value found within the group (SUM() / COUNT() )

Let’s do some summarizations! For a change, we’ll summarize SalesOrderDetail records. For each Sales Order let’s calculate the The Order Total, the Average Sales Order Detail Line Total for a SalesOrder, and the number of SalesOrderDetail lines within an order.

We’ll use the a group by with SUM, AVG, and COUNT functions to help.

SELECT SalesOrderID     
     ,SUM(LineTotal) AS OrderTotal
     ,AVG(LineTotal) AS AverageLineTotal
     ,COUNT(LineTotal) AS NumberOfLines
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderIDORDER BY SalesOrderID;

Since there are one or more lines per Sales Order, we’ll group by the Sales Order ID. Check out the results:

Group by example with SUM, AVG, and COUNT

Here are some tips to think about when using these functions. You can:

  1. Use more than one function in your SQL.
  2. Include the functions within Expressions.
  3. Alias the expression.

You can also create new expressions using aggregrate functions. To demonstrate this, let’s calculate the Average Line Total knowing the Average = the Sum of Line Totals / The Count.

Here is the query to make it so:

SELECT SalesOrderID
     ,SUM(LineTotal) AS OrderTotal
     ,COUNT(LineTotal) AS NumberOfLines
     ,SUM(LineTotal) / COUNT(LineTotal) AS AverageLineTotal
     ,AVG(LineTotal) AverageFunctionLineTotal
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
ORDER BY SalesOrderID;

And the results. Notice how the “calculated” AverageLineTotal equal the Function’s value.

Group By Hand Calculate Average

Ordering Grouping Results

Groups aren’t returned in sorted order. In order to do this, you need to include the ORDER BY clause. Add the ORDER BY clause to the end of the SQL statement.

Modify this statement so it sorts the groups by SalesOrderID and CarrierTrackingNumber.

/*modify this statement*/ SELECT SalesOrderID, CarrierTrackingNumber, COUNT(1) as NumberofLines FROM Sales.SalesOrderDetail GROUP BY SalesOrderID, CarrierTrackingNumber;
/* Answer */
SELECT SalesOrderID, CarrierTrackingNumber, COUNT(1) as NumberofLines 
FROM Sales.SalesOrderDetail 
GROUP BY SalesOrderID, CarrierTrackingNumber 
ORDER BY SalesOrderID, CarrierTrackingNumber 

HAVING and GROUP BY

The HAVING clause filters groups created using GROUP BY.  The HAVING clause determines whether a group is included within the result.   

Unlike the WHERE clause which filters individual records, the HAVING clause filter groups; however, keep in mind that SQL containing both a WHERE and HAVING processes both! 

In this case, the WHERE clause is first processed, and those records filtered.  The HAVING is then applied to groups as specified in the GROUP BY. 

Typically HAVING is used with GROUP BY, but if a GROUP BY clause isn’t included, then HAVING operates on the entire result set as one implicit group. 

Consider our previous example. Here we calculate the OrderTotals.

SELECT SalesOrderID     
     ,SUM(LineTotal) AS OrderTotal
     ,AVG(LineTotal) AS AverageLineTotal
     ,COUNT(LineTotal) AS NumberOfLines
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
ORDER BY SalesOrderID;

Suppose we need to show large orders, those totaling more than $100,000.00. How can we do so?

Use the HAVING clause to filter on the SUM(LineTotal).

Here is the query to show large orders:

SELECT SalesOrderID     
     ,SUM(LineTotal) AS OrderTotal
     ,AVG(LineTotal) AS AverageLineTotal
     ,COUNT(LineTotal) AS NumberOfLines
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID
HAVING SUM(LineTotal) > 100000.00
ORDER BY SalesOrderID;

What’s nice about HAVIN

HAVING clause’s key advantage is its ability to filter GROUPS using aggregate functions.  This is something you cannot do withing a SELECT statement

Hint:  If you are having a hard time understand HAVING and WHERE’s role within SELECT, I recommend understanding the SELECT statement order of execution first.  

Want to learn more? Check out our SQL HAVING Tutorial

GROUP BY with Multiple Tables

GROUP BY can also include columns from multiple tables. In the following query let’s calculate the Sales by ProductName. To do so, well join SalesOrderDetail to Product. We’ll join on product.

SELECT p.Name ProductName,
    SUM(s.OrderQTY) TotalQuantitySold,
    SUM(s.LineTotal) TotalSales,
    SUM(s.OrderQTY * p.StandardCost) TotalStandardCost,
    SUM(s.LineTotal) - SUM(s.OrderQTY * p.StandardCost) GrossMargin
FROM Sales.SalesOrderDetail s
    INNER JOIN Production.Product p ON  s.ProductID = p.ProductID
GROUP BY p.Name
ORDER BY p.Name

Check out how columns from the two tables participate within the TotalStandardCost calculation.

Group By using Multiple Tables.

Sort by SUM()

Using the same example as above, notice I can sort by one of the totals. In this case we’ll sort by TotalQuantitySold.

SELECT p.Name ProductName,
    SUM(s.OrderQTY) TotalQuantitySold,
    SUM(s.LineTotal) TotalSales,
    SUM(s.OrderQTY * p.StandardCost) TotalStandardCost,
    SUM(s.LineTotal) - SUM(s.OrderQTY * p.StandardCost) GrossMargin
FROM Sales.SalesOrderDetail s
    INNER JOIN Production.Product p ON  s.ProductID = p.ProductID
GROUP BY p.Name
ORDER BY SUM(s.OrderQTY)

Since this column is aliased, let’s make the query easier to read, by sorting by the alias name.

GROUP BY Sort By SUM

GROUP BY with an Expression

You can group on an expression. Grouping is allowed as long as the same expression is within the SELECT list.

SELECT Year(OrderDate) OrderYear, Sum(TotalDue) YearlySales
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate)
ORDER BY OrderYear

In this example let’s group and summarize SalesOrder totals by year

Group by Expression

And there is nothing stopping us from grouping one or more expressions! Let’s further group sales by Month.

SELECT Year(OrderDate) OrderYear,
       Month(OrderDate) OrderMonth,
       SUM (TotalDue) MonthlySales
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate), MONTH(OrderDate)
ORDER BY OrderYear, OrderMonth

Before I forget, check out how I sorted the results. I used the column alias to order rather than repeating the Expression. Both are valid, but it is sure nice to keep our expression to a minimum! I think it’s easier to read. In case you’re wondering, this is all possible due to SQL Order of Operations.

Here are the results. I colored coded the columns to better show you the multiple expressions. Like the previous example, each expression used within the GROUB BY is also listed as a column.

SQL GROUP BY versus DISTINCT

In some ways GROUP BY is similar to using the DISTINCT Clause.

SELECT DISTINCT JobTitle, MaritalStatus
FROM HumanResources.Employee
ORDER BY JobTitle, MaritalStatus

Returns the same results as this SQL using SQL GROUP BY:

SELECT JobTitle, MaritalStatus
FROM HumanResources.Employee
GROUP BY JobTitle, MaritalStatus
ORDER BY JobTitle, MaritalStatus

The key difference between DISTINCT and GROUP BY is that when using GROUP BY, you can also summarize values.

Here, we included the COUNT() function to determine how many rows make up each group:

SELECT SalesOrderID, CarrierTrackingNumber, COUNT(1) as NumberofLines FROM Sales.SalesOrderDetail GROUP BY SalesOrderID, CarrierTrackingNumber;
/* Answer */
SELECT SalesOrderID, CarrierTrackingNumber, COUNT(1) as NumberofLines 
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID, CarrierTrackingNumber;

Conclusion

When using SQL GROUP BY, keep these points in mind:

  • The SQL GROUP BY does not guarantee the groups are returned in sorted order.
  • You can use aggregate functions, such as SUM and COUNT to summarize data.
  • The non summarized columns in your SELECT list must match those listed in the GROUP BY.

Additional Resources

If you’re looking to learn more about using the SQL GROUP BY examples, then I would recommend these resources:

More from the blog


MySQL PostgreSQL SQLite SqlServer