dcsimg

SqlCredit - Part 19: Exploring SQL 2005's Ranking Functions - RANK() and DENSE_RANK() - Page 2

September 26, 2008

Using PARTITION BY and ORDER BY with a Common Table Expression

Since we need just the top entries in each category, it would be nice to do something like this (using DENSE_RANK):

    SELECT
        Category,
        Entrant,
        DENSE_RANK() OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS DenseRank
    FROM dbo.ContestResults
    WHERE DenseRank <= 3

Well, that doesn’t work (“Invalid column name 'DenseRank'.”).

I have seen people use table variables to store the results and then SELECT from the results. The other option is to use a common table expression. Here are the two versions of the query that do work:

Table Variable:

    DECLARE @t TABLE (
        Category    varchar(10),
        Entrant     nvarchar(20),
        DenseRank   bigint
    )
    INSERT INTO @t (
        Category,
        Entrant,
        DenseRank
    )
    SELECT
        Category,
        Entrant,
        DENSE_RANK() OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS DenseRank
    FROM dbo.ContestResults
    SELECT *
    FROM @t
    WHERE DenseRank <= 3
    ORDER BY Category, DenseRank, Entrant

Common Table Expression:

    WITH t
    AS (
        SELECT
            Category,
            Entrant,
            DENSE_RANK() OVER (
                PARTITION BY Category
                ORDER BY Weight DESC
            ) AS DenseRank
        FROM dbo.ContestResults
    )
    SELECT
        Category,
        Entrant,
        DenseRank
    FROM t
    WHERE DenseRank <= 3
    ORDER BY Category, DenseRank, Entrant

Results of Either Query:

    Category   Entrant             DenseRank
    ---------- ------------------- ---------
    Pumpkin    Chad Johnson        1
    Pumpkin    Dan Gardner         2
    Pumpkin    George Kopsell      2
    Pumpkin    John Suydam         3
    Squash     Dan Gardner         1
    Squash     Harvey Zale         1
    Squash     Terry Helmer        2
    Squash     Gary Spiel          3
    Watermelon Mark Bardin         1
    Watermelon Carlotta Giudicelli 2
    Watermelon Christine Daaé      2
    Watermelon Meg Giry            3
    Watermelon Ubaldo Piangi       3

I think the CTE code is much simpler to read. It is certainly more compact, though that does not always mean it’s easier to read or maintain.

Execution Plan Differences (Table Variable versus CTE)

What about performance? Yes, I know this is a trivial example, but the differences are interesting anyway.

The execution plans from these two queries show that the table variable version is 60% of the batch, where the CTE version is 40% of the batch. See the graphic plan below:

Click for larger image
The execution plans from these two queries

Using RANK() in the ORDER BY Clause

What if you just want to pull the top three entries in a single category? Most examples use RANK()in the SELECT clause, but you can also use RANK() [and DENSE_RANK()] in the ORDER BY clause.

Query:

    SELECT TOP 3
        Entrant,
        Weight
    FROM dbo.ContestResults
    WHERE Category = 'Watermelon'
    ORDER BY RANK() OVER (
        PARTITION BY Category
        ORDER BY Weight DESC
    )

Result:

    Entrant             Weight
    ------------------- -----------
    Mark Bardin         146
    Carlotta Giudicelli 139
    Christine Daaé      139

An important point here is that you would only want to use ranking functions in the ORDER BY clause of a query if you are using PARTITION BY. If you left out the PARTITION BY, you would be left with a needlessly complicated ORDER BY clause. These two queries illustrate this. They produce the exact same result.

    SELECT ...
    FROM dbo.ContestResults
    ORDER BY DENSE_RANK() OVER (ORDER BY Weight DESC)
    SELECT ...
    FROM dbo.ContestResults
    ORDER BY Weight DESC

See the queries here and the graphic plan below.

Click for larger image
DenseRankWithoutPartition graphic plan

Using RANK() in the SELECT and ORDER BY Clauses

It is interesting to note that, based on the query plan, including the Rank in the SELECT list does not incur a penalty. The work is done once and used twice.

Query:

    SELECT TOP 3
        Entrant,
        Weight,
        RANK() OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS [Rank]
    FROM dbo.ContestResults
    WHERE Category = 'Watermelon'
    ORDER BY RANK() OVER (
        PARTITION BY Category
        ORDER BY Weight DESC
    )

Result:

    Entrant             Weight Rank
    ------------------- ------ ----
    Mark Bardin         146    1
    Carlotta Giudicelli 139    2
    Christine Daaé      139    2

See the queries here and the graphic plan below.

Click for larger image
RankInSelectAndOrderBy graphic plan

Conclusion

My plan was to cover all of the ranking functions in this article, but once I got started, there was just too much interesting stuff to cover. Next month, I will cover NTILE() and ROW_NUMBER().

A Challenge

It is possible to use RANK() with different PARTITION or ORDER BY in the SELECT clause and ORDER BY clauses, but I was unable to come up with something that actually produced worthwhile results. Using the sample table and data presented here, can you come up with a worthwhile SELECT statement that uses RANK() or DENSE_RANK() in both the SELECT and ORDER BY clauses where they don’t match? If you can, submit your query and an explanation through the forum.

» See All Articles by Columnist Rob Garrison

SqlCredit - Developing a Complete SQL Server OLTP Database Project








The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers