# How to Group and Summarize your Results (simple explanation)

In today’s lesson, you’re going to learn about grouping results returned from your queries using the GROUP BY clause.

The objectives of today’s lesson are to:

• Learn how to group results using GROUP BY
• Use aggregate functions to perform calculations
• Understand how to filter groups using the HAVING clause

## GROUP BY Clause

The SQL GROUP BY Clause is used to output a row across specified column values.  It is typically used in conjunction with aggregate functions such as SUM or Count to summarize values.  In SQL groups are unique combinations of fields.  Rather than returning every row in a table, when values are grouped, only the unique combinations are returned.

The GROUP BY Clause is added to the SQL Statement after the WHERE Clause.  Here is an example where we are listing OrderID, excluding quantities greater than 100.

```SELECT OrderID
FROM OrderDetails
WHERE Quantity <= 100
GROUP BY OrderID;```

There are a couple of things to note.  First, the columns we want to summarize are listed, separated by commas, in the GROUP BY clause.  Second, this same list of columns must be listed in the select statement; otherwise the statement fails.

When this statement is run, not every filtered row is returned.  Only unique combinations of OrderID are included in the result.  This statement is very similar to

```SELECT DISTINCT OrderID
FROM OrderDetails
WHERE Quantity <= 100;```

But there is a key difference.  The DISTINCT modifier stops at outputting a unique combination of rows, whereas, with the GROUP BY statement, we can calculate values based on the underlying filtered rows for each unique combination.

In other words, using our example, with the GROUP BY, we can calculate the number or OrderDetails per order as follows:

```SELECT OrderID, COUNT(OrderID) as NumOrderDetails
FROM OrderDetails
GROUP BY OrderID;```

COUNT is an example of an aggregate function, these are what really give the GROUP BY statement its special value.

## Aggregate Functions

Some functions, such as SUM, are used to perform calculations on a group of rows, these are called aggregate functions.  In most cases, these functions operate on a group of values that are defined using the GROUP BY clause.  When there isn’t a GROUP BY clause, it is generally understood the aggregate function applies to all filtered results.

Some of the most common aggregate functions include:

 AVG(expression) Calculate the average of the expression. COUNT(expression) Count occurrences of non-null values returned by the expression. COUNT(*) Count all rows in the specified table. MIN(expression) Finds the minimum expression value. MAX(expression) Finds the maximum expression value. SUM(expression) Calculate the sum of the expression.

These functions can be used on their own in conjunction with the GROUP BY clause.  On their own, they operate across the entire table; however, when used with GROUP BY, their calculations are “reset” each time the grouping changes.  In this manner, they act as subtotals.

### General Syntax of an Aggregate Function

When using the aggregate function you can either compute the result on all values or distinct values.  For instance, to count all OrderDetails records we could use the expression:

```SELECT COUNT(OrderID)
FROM OrderDetails;```

To count the distinct of orders making up the details we would use the following:

```SELECT COUNT(DISTINCT OrderID)
FROM OrderDetails;```

## Using Aggregate Functions with GROUP BY

### AVG and SUM

The SUM function totals up the values returned, in similar fashion AVG calculates the average.

Let’s see if we can calculate the total order amount from the OrderDetails.  From previous lessons, we know how to calculate the total amount for each detail as:

```SELECT OrderID, ProductID, UnitPrice * Quantity as TotalPrice
FROM OrderDetails;```

Since we can apply an aggregate function to expressions, we can set up a grouping on OrderID to calculate the total price per order as

```SELECT OrderID, SUM(UnitPrice * Quantity) as TotalPrice
FROM OrderDetails
GROUP BY OrderID;```

We can even sort by the total to get the top orders first

```SELECT OrderID, SUM(UnitPrice * Quantity) as TotalPrice
FROM OrderDetails
GROUP BY OrderID
ORDER BY TotalPrice DESC;```

In a similar fashion, we can calculate the average order detail amount as

```SELECT OrderID,
AVG(UnitPrice * Quantity) as AverageOrderAmount
FROM OrderDetails
GROUP BY OrderID;```

For the curious, since an average is calculated as the sum of the sample divided by the sample count, then using AVG in the above statement is the same as:

```SELECT OrderID,
SUM(UnitPrice * Quantity) / COUNT(OrderID)  as AverageOrderAmount
FROM OrderDetails
GROUP BY OrderID;```

We covered a lot in this section.  Here are some key points to remember:

1. An aggregate function can evaluate an expression such as SUM(A + B)
2. You should alias aggregate functions, so the column names are meaningful
3. When working with aggregate functions and GROUP BY, it is sometimes is easier to think about the details first, that is writing a simple SELECT statement, inspect the results, then add in the fancy stuff.

### COUNT

The COUNT function is used when you need to know how many records exist in a table or within a group.  COUNT(*) will count every record in the grouping; whereas COUNT(expression) counts every record where expression’s result isn’t null.  You can also use Distinct with COUNT to find the number of unique values within a group.

To find the number of OrderDetail Lines per order

```SELECT OrderID, COUNT(OrderDetailID)
FROM OrderDetails
GROUP BY OrderID;```

To find the number of unique orders per product

```SELECT ProductID, COUNT(DISTINCT OrderID)
FROM OrderDetails
GROUP BY ProductID;```

### MIN and MAX

Use MIN and MAX to find the smallest and largest values, respectively, within a table or group.

For example, to find the smallest and largest product quantities ordered within an order try

```SELECT OrderID,
MIN(Quantity) as MinQuantity,
MAX(Quantity) as MaxQuantity
FROM OrderDetails
GROUP BY OrderID;```

You can also find the MIN or MAX value of a calculation.  Here we find the highest product amount ordered within a product:

```SELECT OrderID,
MAX(UnitPrice * Quantity) as MaxAmount
FROM OrderDetails
GROUP BY OrderID;```

## HAVING Clause

The HAVING clause is used to filter groups according to the results of the aggregate functions.  This makes it possible to solve problems such as select all orders that have more than two order detail lines.

That example looks like

```SELECT OrderID, COUNT(OrderDetailID)
FROM OrderDetails
GROUP BY OrderID
HAVING COUNT(OrderDetailID) > 2;```

If we wanted to find all orders greater than \$1000 we would write

```SELECT OrderID,
SUM(UnitPrice * Quantity) as TotalPrice
FROM OrderDetails
GROUP BY OrderID
HAVING TotalPrice > 1000
ORDER BY TotalPrice DESC;```

This query is the same as one from the previous section with the addition of the HAVING clause.  We could have written the HAVING clause as

`HAVING SUM(UnitPrice * Quantity) > 1000`

To hammer home HAVING, I want to show one last example.  Here you’ll see the HAVING statement includes an aggregate function that isn’t in the SELECT list.

```SELECT OrderID,
SUM(UnitPrice * Quantity) as TotalPrice
FROM OrderDetails
GROUP BY OrderID
HAVING AVG(UnitPrice * Quantity) > 500
ORDER BY TotalPrice DESC;```

In the above query, we’re getting the total price for orders where the average OrderDetail amount is greater than \$500.00.

Though they perform a similar function, there is a key distinction between the WHERE clause and HAVING. The WHERE clause filter individual records; whereas, the HAVING clause filters on the groups.

To keep it straight in my head I like to think of the WHERE clause doing its work before any groupings take place, and then the HAVING clause taking over after the groups are formed.

### Exercises

It’s important to practice! Use the sample database to answer these questions.

1. What is the average quantity ordered in the OrderDetails table?
2. Display the Min, Max, and Average Quantity ordered for each product in OrderDetails.
3. Return total sales, by product for all orders, but only include products included on 7 or more OrderDetails.

Congratulations!  You just learned how to use the GROUP BY and HAVING clauses to summarize and filter on summarized information.  More tutorials are to follow! Remember!  I want to remind you all that if you have other questions you want to be answered, then post a comment or tweet me.

#### Kris Wenzel

Kris Wenzel has been working with databases over the past 28 years as a developer, analyst, and DBA. He has a BSE in Computer Engineering from the University of Michigan and a MBA from the University of Notre Dame. Kris has written hundreds of blog articles and many online courses. He loves helping others learn SQL.

• Hello, Kris! Your stuff is really awesome. But I have a question: If I have 2 tables, lets suppose: table1 (id, name, dateLog, course_id) and table2(id_course, courseName), how can I do the query to show all names with your dateLog, courseName, without repeat the names? This will look something like this:

Tableresult:
name | courseName | dateLog … <–header
John | Math | jan-01 | jan – 02 | jan-03 … jan-31
John | Chem | jan-01 | jan – 02 | jan-03 … jan-31
Mary | Math | jan-01 | jan – 02 | jan-03 … jan-31
Mary | Chem | jan-01 | jan – 02 | jan-03 … jan-31

if the table1 is filled with:

id | name | id_course | dateLog
|55| John | 1 | jan-01
|55| John | 1 | jan-02
|55| John |1 | jan-03
….
|55| John | 1| jan-31

|55| John | 2 | jan-01
|55| John | 2| jan-02
|55| John | 2 | jan-03
….
|55| John | 2 | jan-31

|56| Mary | 1 | jan-01
|56| Mary | 1 | jan-02
|56| Mary | 1 | jan-03
….
|56| Mary | 1 | jan-31

|56| Mary | 2 | jan-01
|56| Mary | 2| jan-02
|56| Mary | 2| jan-03
….
|56| Mary | 2 | jan-31

And Table2 is filled with:

id_course | courseName
| 1 | Math
| 2 | Chem

This is a example that I'm try solve. My real tables are bigger, but with this small example will be easy to understand.

Can you help me?

Thanks!

• Hi Priss,

What you are looking to do is called a crosstab. You’re aiming to take the various datelog values and “create” columns for them. You can use the PIVOT keyword to do this; however, you would have to specify each date as a column. This may be an issue.

Another way to approach the problem is to use a UDF (User Defined Function). You could create a UDF, that accepts ID and id_course as a parameters, then searches for all course datelogs and concatenates them into a varchar value. The result would be three fields:
Name, CourseName, DateLogs

Kris.

• Hello, Kris!
I will try this way. The first link is very elucidate.

In some research, I came in this behavior called Pivot. But I did not understand his logic and its construction, because people do many mixtures of concepts and codes. This left all confused. But this article is clean and I think I can adapt it to what I need.

My database is MySQL and the Program Language is PHP with Grocery CRUD [CodeIgniter].

Thanks for the fast answer. :)

• Haider says:

Hi Chris,
Can you answer the last question by putting the filtering in the where clause
ie

where count(productid) > = 7 instead of the having?

• Hi,

If you try that the statement will error. Since COUNT is an aggregate function, it is not allowed in the WHERE clause. Aggregate functions are only allowed in the HAVING clause. Here is a good way to keep it straight:
1. Use the WHERE clause to filter records. That is to filter row-by-row.
2. Use the HAVING clause to filer the result of a grouping. In our case we’re filtering to out any groups that don’t have more than 7 orderdetail lines.

• Hi Kris

I have the table below:

# bcolor fcolor
———————-
1 blue red
2 red red
3 red red
4 red null
5 null red
6 red green
7 red blue
8 green red
9 green blue
10 green green
11 blue red
12 blue green
13 blue blue
14 green red
15 orange red
18 orange pink
19 orange white
20 orange yellow
16 orange null
17 orange null
21 green red
22 orange red
23 orange null
24 orange null
25 orange pink
26 orange white
27 orange yellow
28 red brown
29 red yellow
30 red violet
31 red black
32 brown violet
33 brown red
34 brown black

I want to know the following:
1. a count of each distinct color in bcolor column (color group) and
2. a total of each fcolor in a bcolor group.

I can achieve No 1 above using the SQL statement:

SELECT
DISTINCT bcolor,
count(bcolor) as bcolor_count
FROM
t1
GROUP BY bcolor

which results in output below:

bcolor count
“red” ;9
“green” ;5
“blue” ;4
“” ;0
“orange” ;12
“brown” ;3

what SQL can I use to achieve the table below:

bcolor count red null green blue browm yellow violet black pink blank white
“red” ;9 2 1 1 1 1 1 1 1 0 0 0
“green” ;5 3 0 1 1 0 0 0 0 0 0 0
“blue” ;4 2 0 1 1 0 0 0 0 0 0 0
“” ;0
“orange” ;12 2 2 0 0 0 2 0 0 2 2 2
“brown” ;3 1 0 0 0 0 0 1 1 0 0 0

count column is the total on all columns on the right.

• Aayush Desai says:

Hello,
Following is the table
order_id cutomer_id order_date product_id units
01 C1 01-Jan-2000 p1 10
02 C2 01-Jan-2000 p2 15
03 C3 01-Jan-2000 p3 17
04 C4 01-Jan-2000 p1 20
05 C4 01-Jan-2000 p2 1
06 C1 01-Jan-2000 p5 7

Is it possible to write a single query per question for the 3 questions below?

1.) Which product has the least number of orders?
2.)Which product is sold the most and what is the quantity?
3.)Give information about products having the highest orders?

It is very easy using VIEW but I was wondering if there is a way we can get the answer in single query for each question i.e total 3 queries

• Philip van Gass says:

Hi Kris. I am using the following query to give me the average quantity per sales order and although it rounds up the values to 2 decimal places as I wanted, it also adds on a whole bunch of zeroes after the second decimal place. How do I get rid of them ?

SELECT COUNT(SalesOrderID) AS NumOrders, SUM(OrderQty) AS TotalOrderQty, ROUND((CAST(SUM(OrderQty) AS numeric(5,2)))/COUNT(SalesOrderID), 2) AS AverageQty
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID

• Nicholas Boyd says:

I can’t find this anywhere. How do I use the SUM function to add up all like results. Say my results are 1000 rows with 100 different items under the items column. Each row also has a quantity column and each row has a different quantity in it. Item 123 has 20 lines with all different quantities. I need the total quantity for Item 123.