SQL Percentile Rank Puzzle


In this puzzle, we’re going to use SQL percentile rank to identify the top and bottom student scores.

Solving puzzles is a great way to learn SQL.  Nothing beats practicing what you’ve learned.

SQL Puzzle Question

In today’s puzzle assume you’ve been asked by the head of the Mathematics department to provide him three lists of students ranked by score:

  1. Students whose scores are in the top 25%
  2. Students whose scores are in the bottom 25%
  3. Students in the “middle.”

The table you’ll query is named TestScore and contains the following fields:

  • StudentID (Primary Key)
  • StudentName (Varchar)
  • Score (Float)

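Work through the SQL percentile rank puzzle with this practice data.  Rather than creating a table, I declared a table variable.  You can paste this into a query window and run your queries from there.  A table variable only exists for the duration of the batch, so run the declaration and your queries together: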

-- TEST DATA
DECLARE @TestScore TABLE
(
    StudentID   int,
    StudentName varchar(40),
    Score       float
);
Insert INTO @TestScore values (1, 'Han Solo', 98.0);
Insert INTO @TestScore values (2, 'The Fly', 92.0);
Insert INTO @TestScore values (3, 'Darth Vader', 83.0);
Insert INTO @TestScore values (4, 'Luke Skywalker', 78.0);
Insert INTO @TestScore values (5, 'Homer Simpson', 54.0);
Insert INTO @TestScore values (6, 'Porky Pig', 65.0);
Insert INTO @TestScore values (7, 'James T. Kirk', 91.0);
Insert INTO @TestScore values (8, 'Spock', 93.0);
Insert INTO @TestScore values (9, 'Batman', 85.0);
Insert INTO @TestScore values (10, 'Robin', 87.0);
Insert INTO @TestScore values (11, 'Superman', 94.0);
Insert INTO @TestScore values (12, 'Road Runner', 74.0);
Insert INTO @TestScore values (13, 'Wilie Coyote', 79.0);
Insert INTO @TestScore values (14, 'Ant Man', 82.0);
Insert INTO @TestScore values (15, 'Cool Hand Luke', 92.0);
Insert INTO @TestScore values (16, 'C3PO', 88.0);
Insert INTO @TestScore values (17, 'BB8', 92.0);
Insert INTO @TestScore values (18, 'Flash Gordon', 60.0);
Insert INTO @TestScore values (19, 'Bugs Bunny', 84.0);
Insert INTO @TestScore values (20, 'Gordon Geko', 75.0);

Can you provide the queries to get the answers to questions 1-3?

BONUS!  Combine the queries into a single result with the following columns:

  • Ranking
  • Student Name
  • Score

Where ranking would be one of three values:  Top25, Bottom25, Middle

Answer to Question 1

To answer the first question, take advantage of the TOP ... PERCENT clause.  When used with ordered data, TOP (25) PERCENT returns the first 25 percent of the rows, which here are the students with the highest scores.

SELECT   TOP(25) PERCENT WITH TIES StudentName, Score
FROM     @TestScore
ORDER BY Score DESC

Ordering the data in descending order lists the highest scores first.  The WITH TIES clause ensures that students with the same score as the last row in the top 25% aren’t left out.  With 20 students, 25 percent is 5 rows, but three students tie at 92 for fourth through sixth place.  In my testing, I found that without WITH TIES, BB8 wasn’t included.

The query returns six students: Han Solo (98), Superman (94), Spock (93), and The Fly, Cool Hand Luke, and BB8 (92 each).
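
To see what WITH TIES changes, the two variants can be compared by row count.  This is a minimal sketch: it re-declares the practice data in compact form (a multi-row VALUES list, which assumes SQL Server 2008 or later) so the batch runs on its own, and the aliases NoTies and Ties are only illustrative.

```sql
-- Practice data from above, re-declared so this batch runs on its own
DECLARE @TestScore TABLE (StudentID int, StudentName varchar(40), Score float);
INSERT INTO @TestScore VALUES
    (1, 'Han Solo', 98),       (2, 'The Fly', 92),       (3, 'Darth Vader', 83),
    (4, 'Luke Skywalker', 78), (5, 'Homer Simpson', 54), (6, 'Porky Pig', 65),
    (7, 'James T. Kirk', 91),  (8, 'Spock', 93),         (9, 'Batman', 85),
    (10, 'Robin', 87),         (11, 'Superman', 94),     (12, 'Road Runner', 74),
    (13, 'Wilie Coyote', 79),  (14, 'Ant Man', 82),      (15, 'Cool Hand Luke', 92),
    (16, 'C3PO', 88),          (17, 'BB8', 92),          (18, 'Flash Gordon', 60),
    (19, 'Bugs Bunny', 84),    (20, 'Gordon Geko', 75);

-- Without WITH TIES: exactly 25 percent of 20 rows, so 5 rows
SELECT COUNT(*) AS RowsWithoutTies
FROM   (SELECT   TOP (25) PERCENT Score
        FROM     @TestScore
        ORDER BY Score DESC) AS NoTies;

-- With WITH TIES: the third student scoring 92 is kept as well, so 6 rows
SELECT COUNT(*) AS RowsWithTies
FROM   (SELECT   TOP (25) PERCENT WITH TIES Score
        FROM     @TestScore
        ORDER BY Score DESC) AS Ties;
```

Which of the three students at 92 gets dropped without WITH TIES is not guaranteed, because the ORDER BY does not break the tie.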

Answer to Question 2

To obtain this answer, I used the same query, but this time I ordered the scores in ascending order.

SELECT   TOP(25) PERCENT WITH TIES  StudentName, Score
FROM     @TestScore
ORDER BY Score ASC

This means the lowest scores are listed first and will be included in my TOP(25) PERCENT result.

Answer to Question 3

To get the middle, I first combined the results from questions 1 and 2.  I then used a subquery to exclude these students from my result.  What remained was the middle.

In the query below, the UNION combines the top and bottom results, and the NOT IN condition excludes those students.

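SELECT StudentName, Score
FROM   @TestScore
WHERE  StudentID NOT IN
       (-- T-SQL does not allow ORDER BY directly before UNION,
        -- so each TOP ... ORDER BY is wrapped in a derived table
        SELECT StudentID
        FROM   (SELECT   TOP(25) PERCENT WITH TIES StudentID
                FROM     @TestScore
                ORDER BY Score DESC) AS TopScore
        UNION
        SELECT StudentID
        FROM   (SELECT   TOP(25) PERCENT WITH TIES StudentID
                FROM     @TestScore
                ORDER BY Score ASC) AS BottomScore
       )
ORDER BY Score DESC;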

I could have also used the EXCEPT operator instead of a subquery as follows:

WITH TopAndBottomScore (StudentID, StudentName, Score)
AS
(
   -- Derived tables keep each TOP ... ORDER BY valid inside the UNION
   SELECT StudentID, StudentName, Score
   FROM   (SELECT   TOP(25) PERCENT WITH TIES StudentID, StudentName, Score
           FROM     @TestScore
           ORDER BY Score DESC) AS TopScore
   UNION
   SELECT StudentID, StudentName, Score
   FROM   (SELECT   TOP(25) PERCENT WITH TIES StudentID, StudentName, Score
           FROM     @TestScore
           ORDER BY Score ASC) AS BottomScore
)
SELECT StudentName, Score
FROM   @TestScore
EXCEPT
SELECT StudentName, Score
FROM   TopAndBottomScore
ORDER BY Score DESC;

I used a Common Table Expression to contain the UNION of the top and bottom scores.

Then these scores were removed from all the scores using the EXCEPT operator.

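The middle group contains nine students: James T. Kirk (91), C3PO (88), Robin (87), Batman (85), Bugs Bunny (84), Darth Vader (83), Ant Man (82), Wilie Coyote (79), and Luke Skywalker (78).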

Answer to the Bonus Question

To answer the bonus question, I took an altogether different approach.  Rather than rely on TOP I took advantage of the CUME_DIST window function to calculate the relative position of scores.

The OVER clause specifies that the window is all students sorted by score.  Notice there is no PARTITION BY clause, just the ORDER BY.

WITH CumScore (StudentName, Score, CumScore)
AS
(
SELECT StudentName,
       Score,
       CUME_DIST() OVER (ORDER BY Score)
FROM   @TestScore
)
SELECT   CASE
            WHEN CumScore <= .25 THEN 'Bottom25'
            WHEN CumScore >= .75 THEN 'Top25'
            ELSE 'Middle'
         END AS Ranking,
         StudentName,
         Score
FROM     CumScore
ORDER BY Score;

To get the Ranking, we use a CASE expression to compare the returned CumScore against the .25 and .75 thresholds.
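
CUME_DIST returns, for each row, the number of rows whose value is less than or equal to that row’s value, divided by the total number of rows.  As a rough sketch, the same figure can be computed by hand with COUNT and shown side by side.  The column names CumeDist and ManualCumeDist are just illustrative choices, and the practice data is re-declared in compact form (a multi-row VALUES list, assuming SQL Server 2008 or later) so the batch runs on its own.

```sql
-- Practice data from above, re-declared so this batch runs on its own
DECLARE @TestScore TABLE (StudentID int, StudentName varchar(40), Score float);
INSERT INTO @TestScore VALUES
    (1, 'Han Solo', 98),       (2, 'The Fly', 92),       (3, 'Darth Vader', 83),
    (4, 'Luke Skywalker', 78), (5, 'Homer Simpson', 54), (6, 'Porky Pig', 65),
    (7, 'James T. Kirk', 91),  (8, 'Spock', 93),         (9, 'Batman', 85),
    (10, 'Robin', 87),         (11, 'Superman', 94),     (12, 'Road Runner', 74),
    (13, 'Wilie Coyote', 79),  (14, 'Ant Man', 82),      (15, 'Cool Hand Luke', 92),
    (16, 'C3PO', 88),          (17, 'BB8', 92),          (18, 'Flash Gordon', 60),
    (19, 'Bugs Bunny', 84),    (20, 'Gordon Geko', 75);

SELECT   s.StudentName,
         s.Score,
         -- Built-in window function
         CUME_DIST() OVER (ORDER BY s.Score) AS CumeDist,
         -- Hand-rolled equivalent: rows at or below this score / all rows
         (SELECT COUNT(*) FROM @TestScore AS t WHERE t.Score <= s.Score) * 1.0
           / (SELECT COUNT(*) FROM @TestScore) AS ManualCumeDist
FROM     @TestScore AS s
ORDER BY s.Score;
```

The three students who scored 92 share a value of 17/20 = 0.85, so all of them clear the .75 threshold.  Tied scores always land in the same group, which is the role WITH TIES played in the earlier answers.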
