Create a Cross Tab Query to Summarize Data in SQL Server

In this puzzle, we’re going to learn how to create a cross tab query using SQL Server.  Cross tabs are a great way to summarize data.  And given that Business Intelligence is a hot topic, knowing how to create one is key.

We’ll learn to create a cross tab query.  Before you read the entire article, try the puzzle.  Any work you do, if it you just get part of the answer, helps to reinforce the concepts you’ll learn.

Solving puzzles is a great way to learn SQL.  Nothing beats practicing what you’ve learned.  Once you have figured out the puzzle, post your answer in the comments so we all can learn from one another.  We also discuss puzzle and more in Essential SQL Learning Group on Facebook.  Be sure to find us there!

SQL Puzzle Question

The Adventure Works human resources director is starting a wage study.  As part of this study, she would like a summary of annual wages by Department Group name and Life Status.

For our purposes, Life status is defined the combination of marital status and gender (e.g. Married-Male, Married-Female).

Using the AdventureWorks database as the basis for your answer, what SQL would you write to product the following table?

Cross Tab Query Results

Note:  Here is a great resource if you need help getting started with SQL Server and the Adventure Works database.

Answer Using Cross Tab Query

Before we get into how to create the cross tab query, let’s first define the underlying query we’ll use as a source.

Once you study the adventure works database you’ll notice there are several key tables you’ll need to do your search.  To help understand the database relationships, let’s create a database diagram.

These diagrams really help you understand relationships, and come in handy when writing the queries.

Below is the one to use for this puzzle.  Notice how the Employee table is central to the query.  Related to this are two separate dimensions, the first being pay history, the second, employee department history.

Cross Tab Query Detail Rows

The first step in putting together the summarization is to construct the query used to pull in the detailed data.  After you study the diagram, you’ll see we’ll need several joins per the relationships shown.

To make it easier to read, I color coded the joins related to pay history in red, and those with department group name in blue.

SELECT E.BusinessEntityID,
       D.GroupName,
       H.EndDate,
       E.MaritalStatus,
       E.Gender,
       P.Rate,
       P.PayFrequency,
       P.RateChangeDate,
       P.Rate * 40 * 52 / P.PayFrequency as AnnualPay
FROM   HumanResources.Employee E
       INNER JOIN HumanResources.EmployeePayHistory P ON
                  E.BusinessEntityID = p.BusinessEntityID
       INNER JOIN HumanResources.EmployeeDepartmentHistory H ON
                  E.BusinessEntityID = H.BusinessEntityID
       INNER JOIN HumanResources.Department D ON
                  H.DepartmentID = D.DepartmentID
WHERE  E.CurrentFlag = 1;

There are several issues with this result.  First, the query is returning more than one row per employee.  To avoid double counting summary results, there should be a single row.

Cross Tab Query Detailed Results

This is due the query returning each employee’s complete pay and departmental history.  Multiple records, for one employee are included in the result – one for each pay raise or departmental move.

Removing Duplicates from Cross Tab Query Detail Results

The results should contain the latest pay and department for each employee.

The latest department, the once corresponding to the department history with a NULL end date.  Adding a filter to the join condition addresses this issue.

To only select the latest pay rate, we’ll partition the data by employee (BusinessEntityID) sorted by RateChangeDate.  With the data sorted in descending order, the latest pay is always ROW_NUMBER() = 1.

Wrapping this query in a CTE (Common Table Expression) allows us to query the detailed data and filter on the partitioned data.  This means we can select rows where ROW_NUMBER() = 1.

With cte_EmployeePay as
(
   SELECT E.BusinessEntityID,
          D.GroupName,
          H.EndDate,
          E.MaritalStatus,
          E.Gender,
          P.Rate,
          P.PayFrequency,
          P.RateChangeDate,
          P.Rate * 40 * 52 / P.PayFrequency as AnnualPay,
          ROW_NUMBER() OVER(PARTITION BY E.BusinessEntityID
                       ORDER BY P.RateChangeDate Desc) RowNumber
   FROM   HumanResources.Employee E
          INNER JOIN HumanResources.EmployeePayHistory P ON
                     E.BusinessEntityID = p.BusinessEntityID
          INNER JOIN HumanResources.EmployeeDepartmentHistory H ON
                     E.BusinessEntityID = H.BusinessEntityID
                     AND H.EndDate is NULL
          INNER JOIN HumanResources.Department D ON
                     H.DepartmentID = D.DepartmentID
   WHERE  E.CurrentFlag = 1
)
SELECT *
FROM   cte_EmployeePay
WHERE  RowNumber = 1;

Using a common table expression makes this easier to manage and we avoid having to create a view of the partitioned data.  Now our results show the latest pay and departmental assignments.

Cross Tab Query Detail Results with No Duplicates

Notice there is just a single row for BusinessEntityID 4.

The final step is to create a cross tab query.  I choose to create one using the PIVOT statement.

To prepare my data, I calculated the LifeStatus by combining MaritalStaus with Gender.

You see further on in the pivot statement I explicitly call out each combination.  This makes it a static pivot table.  In a later puzzle, we’ll learn how construct a dynamic pivot.

Using PIVOT to Create a Cross Tab Query

We’re now able to put together the summary data using a PIVOT.  To help you read the query, I’ve color coded the source query gray.

The PIVOT is color coded Blue.

With cte_EmployeePay as
(
   SELECT D.GroupName,
          IIF(E.MaritalStatus = 'M','Married','Single') + '-' + 
          IIF(E.Gender = 'M', 'Male', 'Female') as LifeStatus,
          P.Rate * 40 * 52 / P.PayFrequency as AnnualPay,
          ROW_NUMBER() OVER(PARTITION BY E.BusinessEntityID
                       ORDER BY P.RateChangeDate Desc) RowNumber
   FROM   HumanResources.Employee E
          INNER JOIN HumanResources.EmployeePayHistory P ON
                     E.BusinessEntityID = p.BusinessEntityID
          INNER JOIN HumanResources.EmployeeDepartmentHistory H ON
                     E.BusinessEntityID = H.BusinessEntityID
                     AND H.EndDate is NULL
          INNER JOIN HumanResources.Department D ON
                     H.DepartmentID = D.DepartmentID
   WHERE  E.CurrentFlag = 1
)
SELECT *
FROM   (SELECT GroupName,
               LifeStatus,
               AnnualPay
        FROM   cte_EmployeePay
        WHERE  RowNumber = 1) as SourceTable
PIVOT
(
   Avg(AnnualPay)
   FOR LifeStatus IN ([Married-Male],
                      [Married-Female],
                      [Single-Male],
                      [Single-Female])
) AS AvgSales
ORDER BY GroupName

 

Notice that the PIVOT operator takes the derived table, which is named SourceTable, as input and “rotates” the LifeStatus values from rows to columns.

 

 

 

 

Related Posts