Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Slideshows
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Sep 26, 2008

SqlCredit - Part 19: Exploring SQL 2005's Ranking Functions - RANK() and DENSE_RANK()

By Rob Garrison

This is part 19 of a series. If you have not read part 18, you can find it here, but this article does not require any knowledge of the previous work in the series.

Please ... I Can Read Books Online

There are many articles and blogs that discuss SQL Server 2005’s ranking functions (RANK(), DENSE_RANK(), NTILE(), and ROW_NUMBER()). I will try to cover a lot more here than you will find in a quick read of Books Online.

Table and Data for Illustration

To illustrate these functions, I will be using a simple table that stores the results of a vegetable-growing contest. I don’t know if this kind of contest happens outside the US, but in the rural United States, the grower that has the biggest pumpkin at the county fair is a real celebrity. The biggest pumpkin here is 716 pounds, or 325 kg!

Here is the table:

    CREATE TABLE dbo.ContestResults (
        ColID       int             NOT NULL    IDENTITY,
        Category    varchar(10)     NOT NULL,
        Weight      int             NOT NULL,
        Entrant     nvarchar(20)    NOT NULL,
        CONSTRAINT PK_ContestResults PRIMARY KEY CLUSTERED (ColID)
    )

There are three categories: pumpkins, squash, and watermelon. Most of this is based on real data I found on the Internet, but I made up some of it to help explore these functions. (Do you recognize the last five watermelon growers?)

Here is the data:

ColID

Category

Weight

Entrant

1

Pumpkin

716

Chad Johnson

2

Pumpkin

679

George Kopsell

3

Pumpkin

679

Dan Gardner

4

Pumpkin

481

John Suydam

5

Pumpkin

452

Mark Bardin

6

Pumpkin

442

Bill Kallas

7

Pumpkin

428

Theresa Helmer

8

Pumpkin

426

Terry Helmer

9

Pumpkin

346

Gary Spiel

10

Pumpkin

331

Kevin Rabell

11

Pumpkin

289

Jan Spiel

12

Pumpkin

247

Harvey Zale

13

Pumpkin

229

Harvey Zale

14

Squash

462

Dan Gardner

15

Squash

462

Harvey Zale

16

Squash

435

Terry Helmer

17

Squash

405

Gary Spiel

18

Watermelon

146

Mark Bardin

19

Watermelon

139

Christine Daaé

20

Watermelon

139

Carlotta Giudicelli

21

Watermelon

132

Ubaldo Piangi

22

Watermelon

132

Meg Giry

23

Watermelon

129

Joseph Buquet

The code to build the table is here.

We have been asked to decide who gets first-, second-, and third-place prizes in this contest. We will use ranking functions to make these decisions.

RANK() and DENSE_RANK()

If we were asked to give out prizes, the RANK() and DENSE_RANK() functions would seem to be a reasonable choice. First, let’s find the simple ranking for the whole set.

RANK() and DENSE_RANK() with ORDER BY

Query:

   SELECT
        Category,
        Weight,
        Entrant,
        RANK()       OVER (
            ORDER BY Weight DESC
        ) AS [Rank],
        DENSE_RANK() OVER (
            ORDER BY Weight DESC
        ) AS DenseRank
    FROM dbo.ContestResults

Partial Results:

    Category   Weight Entrant              Rank DenseRank
    ---------- ------ -------------------- ---- ---------
    Pumpkin    716    Chad Johnson         1    1
    Pumpkin    679    George Kopsell       2    2
    Pumpkin    679    Dan Gardner          2    2
    Pumpkin    481    John Suydam          4    3
    Squash     462    Dan Gardner          5    4
    Squash     462    Harvey Zale          5    4
    ...
    Watermelon 139    Christine Daaé       19   17
    Watermelon 139    Carlotta Giudicelli  19   17
    Watermelon 132    Ubaldo Piangi        21   18
    Watermelon 132    Meg Giry             21   18
    Watermelon 129    Joseph Buquet        23   19

Notice that RANK() gives us a low of 1 and high of 23 (matching the number of records). If RANK() is over a column where the last value is duplicated, the last Rank number will not match number of records.

These results are interesting, but since we have multiple categories, this does not give us the answers we need.

RANK() and DENSE_RANK() with PARTITION BY and ORDER BY

Query:

    SELECT
        Category,
        Weight,
        Entrant,
        RANK()       OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS [Rank],
        DENSE_RANK() OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS DenseRank
    FROM dbo.ContestResults

Partial Results:

    Category   Weight      Entrant             Rank DenseRank
    ---------- ----------- ------------------- ---- ---------
    Pumpkin    716         Chad Johnson        1    1
    Pumpkin    679         George Kopsell      2    2
    Pumpkin    679         Dan Gardner         2    2
    Pumpkin    481         John Suydam         4    3
    ...
    Squash     462         Dan Gardner         1    1
    Squash     462         Harvey Zale         1    1
    Squash     435         Terry Helmer        3    2
    Squash     405         Gary Spiel          4    3
    Watermelon 146         Mark Bardin         1    1
    Watermelon 139         Christine Daaé      2    2
    Watermelon 139         Carlotta Giudicelli 2    2
    Watermelon 132         Ubaldo Piangi       4    3
    Watermelon 132         Meg Giry            4    3
    Watermelon 129         Joseph Buquet       6    4

Now we have useful results. If the requirement is that we give out exactly three prizes, then we can pull from the first three entries in each category using Rank. If the requirement is that we give out prizes for the top three weights, and we can give more than one prize for any ties, then we can pull from the entries in each category that have a DenseRank of 3 or less.

Using PARTITION BY and ORDER BY with a Common Table Expression

Since we need just the top entries in each category, it would be nice to do something like this (using DENSE_RANK):

    SELECT
        Category,
        Entrant,
        DENSE_RANK() OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS DenseRank
    FROM dbo.ContestResults
    WHERE DenseRank <= 3

Well, that doesn’t work (“Invalid column name 'DenseRank'.”).

I have seen people use table variables to store the results and then SELECT from the results. The other option is to use a common table expression. Here are the two versions of the query that do work:

Table Variable:

    DECLARE @t TABLE (
        Category    varchar(10),
        Entrant     nvarchar(20),
        DenseRank   bigint
    )
    INSERT INTO @t (
        Category,
        Entrant,
        DenseRank
    )
    SELECT
        Category,
        Entrant,
        DENSE_RANK() OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS DenseRank
    FROM dbo.ContestResults
    SELECT *
    FROM @t
    WHERE DenseRank <= 3
    ORDER BY Category, DenseRank, Entrant

Common Table Expression:

    WITH t
    AS (
        SELECT
            Category,
            Entrant,
            DENSE_RANK() OVER (
                PARTITION BY Category
                ORDER BY Weight DESC
            ) AS DenseRank
        FROM dbo.ContestResults
    )
    SELECT
        Category,
        Entrant,
        DenseRank
    FROM t
    WHERE DenseRank <= 3
    ORDER BY Category, DenseRank, Entrant

Results of Either Query:

    Category   Entrant             DenseRank
    ---------- ------------------- ---------
    Pumpkin    Chad Johnson        1
    Pumpkin    Dan Gardner         2
    Pumpkin    George Kopsell      2
    Pumpkin    John Suydam         3
    Squash     Dan Gardner         1
    Squash     Harvey Zale         1
    Squash     Terry Helmer        2
    Squash     Gary Spiel          3
    Watermelon Mark Bardin         1
    Watermelon Carlotta Giudicelli 2
    Watermelon Christine Daaé      2
    Watermelon Meg Giry            3
    Watermelon Ubaldo Piangi       3

I think the CTE code is much simpler to read. It is certainly more compact, though that does not always mean it’s easier to read or maintain.

Execution Plan Differences (Table Variable versus CTE)

What about performance? Yes, I know this is a trivial example, but the differences are interesting anyway.

The execution plans from these two queries show that the table variable version is 60% of the batch, where the CTE version is 40% of the batch. See the graphic plan below:

Using RANK() in the ORDER BY Clause

What if you just want to pull the top three entries in a single category? Most examples use RANK()in the SELECT clause, but you can also use RANK() [and DENSE_RANK()] in the ORDER BY clause.

Query:

    SELECT TOP 3
        Entrant,
        Weight
    FROM dbo.ContestResults
    WHERE Category = 'Watermelon'
    ORDER BY RANK() OVER (
        PARTITION BY Category
        ORDER BY Weight DESC
    )

Result:

    Entrant             Weight
    ------------------- -----------
    Mark Bardin         146
    Carlotta Giudicelli 139
    Christine Daaé      139

An important point here is that you would only want to use ranking functions in the ORDER BY clause of a query if you are using PARTITION BY. If you left out the PARTITION BY, you would be left with a needlessly complicated ORDER BY clause. These two queries illustrate this. They produce the exact same result.

    SELECT ...
    FROM dbo.ContestResults
    ORDER BY DENSE_RANK() OVER (ORDER BY Weight DESC)
    SELECT ...
    FROM dbo.ContestResults
    ORDER BY Weight DESC

See the queries here and the graphic plan below.

Using RANK() in the SELECT and ORDER BY Clauses

It is interesting to note that, based on the query plan, including the Rank in the SELECT list does not incur a penalty. The work is done once and used twice.

Query:

    SELECT TOP 3
        Entrant,
        Weight,
        RANK() OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS [Rank]
    FROM dbo.ContestResults
    WHERE Category = 'Watermelon'
    ORDER BY RANK() OVER (
        PARTITION BY Category
        ORDER BY Weight DESC
    )

Result:

    Entrant             Weight Rank
    ------------------- ------ ----
    Mark Bardin         146    1
    Carlotta Giudicelli 139    2
    Christine Daaé      139    2

See the queries here and the graphic plan below.

Conclusion

My plan was to cover all of the ranking functions in this article, but once I got started, there was just too much interesting stuff to cover. Next month, I will cover NTILE() and ROW_NUMBER().

A Challenge

It is possible to use RANK() with different PARTITION or ORDER BY in the SELECT clause and ORDER BY clauses, but I was unable to come up with something that actually produced worthwhile results. Using the sample table and data presented here, can you come up with a worthwhile SELECT statement that uses RANK() or DENSE_RANK() in both the SELECT and ORDER BY clauses where they don’t match? If you can, submit your query and an explanation through the forum.

» See All Articles by Columnist Rob Garrison



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM