In this puzzle, we’re going with with SQL Percentile Rank to work through a problem to identify the top and bottom student scores.
Solving puzzles is a great way to learn SQL. Nothing beats practicing what you’ve learned.
SQL Puzzle Question
In today’s puzzle assume you’ve been asked by the head of the Mathematics department to provide him three lists of students ranked by score:
- Students whose scores are in the top 25%
- Students whose scores are in the bottom 25%
- Students in the “middle.”
The table you’ll query is named TestScore and contains the following fields:
- StudentID (Primary KEy)
- StudentName (Varchar)
- Score
Practice the SQL percentile rank puzzle with this practice data. Rather than creating a table, I declared a table variable. You can add this into query window and run your queries from there:
-- TEST DATA DECLARE @TestScore TABLE ( StudentID int, StudentName Varchar(40), Score float ) Insert INTO @TestScore values (1, 'Han Solo', 98.0); Insert INTO @TestScore values (2, 'The Fly', 92.0); Insert INTO @TestScore values (3, 'Darth Vader', 83.0); Insert INTO @TestScore values (4, 'Luke Skywalker', 78.0); Insert INTO @TestScore values (5, 'Homer Simpson', 54.0); Insert INTO @TestScore values (6, 'Porky Pig', 65.0); Insert INTO @TestScore values (7, 'James T. Kirk', 91.0); Insert INTO @TestScore values (8, 'Spock', 93.0); Insert INTO @TestScore values (9, 'Batman', 85.0); Insert INTO @TestScore values (10, 'Robin', 87.0); Insert INTO @TestScore values (11, 'Superman', 94.0); Insert INTO @TestScore values (12, 'Road Runner', 74.0); Insert INTO @TestScore values (13, 'Wilie Coyote', 79.0); Insert INTO @TestScore values (14, 'Ant Man', 82.0); Insert INTO @TestScore values (15, 'Cool Hand Luke', 92.0); Insert INTO @TestScore values (16, 'C3PO', 88.0); Insert INTO @TestScore values (17, 'BB8', 92.0); Insert INTO @TestScore values (18, 'Flash Gordon', 60.0); Insert INTO @TestScore values (19, 'Bugs Bunny', 84.0); Insert INTO @TestScore values (20, 'Gordon Geko', 75.0);
Can you provide the queries to get the answers to questions 1-3?
BONUS! Combine the queries into a single result with the following columns:
- Ranking
- Student Name
- Score
Where ranking would be one of three values: Top25, Bottom25, Middle
Answer to Question 1
In order to answer the first question, take advantage of the TOP PERCENT clause. When used with ordered data, the TOP PERCENT can be used to select students whose scores fall within the top 25 percent of all scores.
SELECT TOP(25) PERCENT WITH TIES StudentName, Score FROM @TestScore ORDER BY Score DESC
By ordering the data in descending order, the highest scores are listed first. Also we use the WITH TIES clause to ensure that students with the same score as those in the top 25% aren’t left out. In my testing, I found that if I didn’t do this, then BB8 wasn’t included.
Below are the results I got:
Answer to Question 2
To obtain this answer I used the same query, but this time I ordered the scores in Ascending order.
SELECT TOP(25) PERCENT WITH TIES StudentName, Score FROM @TestScore ORDER BY Score ASC
This means the lowest scores are listed first and will be included in my TOP(25) PERCENT result.
Answer to Question 3
To get the middle, first combined the results from question 1 and 2. I then use a subquery to exclude these students from my result. What remained was the middle.
The combined results are colored coded green.
The subquery to exclude them is blue.
SELECT StudentName, Score FROM @TestScore WHERE StudentID NOT IN (SELECT TOP(25) PERCENT WITH TIES StudentID FROM @TestScore ORDER BY Score DESC UNION SELECT TOP(25) PERCENT WITH TIES StudentID FROM @TestScore ORDER BY Score ASC ) ORDER BY Score DESC;
I could have also used the EXCEPT operator instead of a subquery as follows:
WITH TopAndBottomScore (StudentID, StudentName, Score) AS ( SELECT TOP(25) PERCENT WITH TIES StudentID, StudentName, Score FROM @TestScore ORDER BY Score Desc UNION SELECT TOP(25) PERCENT WITH TIES StudentID, StudentName, Score FROM @TestScore ORDER BY Score Asc ) SELECT StudentName, Score FROM @TestScore EXCEPT SELECT StudentName, Score FROM TopAndBottomScore ORDER BY Score DESC;
I used a Common Table Expression to contain the UNION of the top and bottom scores.
Then these scores were removed from all the scores using the EXCEPT operator.
Here are the results I got:
Answer to the Bonus Question
To answer the bonus question, I took an altogether different approach. Rather than rely on TOP I took advantage of the CUME_DIST window function to calculate the relative position of scores.
The OVER clause specifies that the window is all students sorted by score. Notice there is no PARTITION BY clause, just the ORDER BY.
WITH CumScore (StudentName, Score, CumScore)
AS
(
SELECT StudentName,
Score,
CUME_DIST () OVER (ORDER BY SCORE)
FROM @TestScore
)
SELECT CASE
WHEN CumScore <= .25 THEN 'Bottom25'
WHEN CumScore >= .75 THEN 'TOP25'
ELSE 'Middle'
END as Ranking,
StudentName,
Score
FROM CumScore
ORDER BY Score;
To get the Ranking, we use a CASE statement to compare the returned CumScore.