Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Oct 24, 2008

Exploring SQL 2005's Ranking Functions - NTILE() and ROW_NUMBER()

By Rob Garrison

This is part 20 of a series. If you have not read part 19, I suggest starting here. That article is the first of this two-part coverage of ranking functions.

Table and Data for Illustration

This article will use the same table and data as Part 19, so I won’t cover that again. The code to build and populate the table is here.

NTILE()

NTILE() breaks the results into groups. In the simplest case, if we pass the integer 2 to NTILE() using Weight, it will rank each entry as in either the first or second half based on Weight.

NTILE() with ORDER BY Only

Query:

    SELECT
        Category,
        Weight,
        Entrant,
        NTILE(2) OVER (
            ORDER BY Weight DESC
        ) AS [Ntile]
    FROM dbo.ContestResults

Partial Results:

    Category   Weight Entrant         Ntile
    ---------- ------ --------------- -----
    Pumpkin    716    Chad Johnson    1
    Pumpkin    679    George Kopsell  1
    ...
    Watermelon 132    Meg Giry        2
    Watermelon 129    Joseph Buquet   2

This is valid, but it’s not terribly interesting or valuable. Let’s add a PARTITION BY clause so we get top- and bottom-half entries for each category.

NTILE() with PARTITION BY and ORDER BY

Query:

    SELECT
        Category,
        Weight,
        Entrant,
        NTILE(2) OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS [Ntile]
    FROM dbo.ContestResults

Partial Results:

    Category   Weight Entrant         Ntile
    ---------- ------ --------------- --------------------
    Pumpkin    716    Chad Johnson    1
    ...
    Pumpkin    229    Harvey Zale     2
    Squash     462    Dan Gardner     1
    ...
    Squash     405    Gary Spiel      2
    Watermelon 146    Mark Bardin     1
    ...
    Watermelon 129    Joseph Buquet   2

Now we have something useful.

Passing Different int Values to NTILE()

Passing 1 to NTILE() is worthless; it gives us 1 for each entry. But what if we pass a larger number to NTILE() than the number of result set rows? In our test table, we have four entries in the squash category. Let’s try passing different values to NTILE(), starting with 2.

Query:

    SELECT
        ...
        NTILE(2) OVER (
            ORDER BY Weight DESC
        ) AS [Ntile]
    FROM dbo.ContestResults
    WHERE Category = 'Squash'

Results:

    Category Weight Entrant      Ntile
    -------- ------ ------------ -----
    Squash   462    Dan Gardner  1
    Squash   462    Harvey Zale  1
    Squash   435    Terry Helmer 2
    Squash   405    Gary Spiel   2

Sure, this splits the entries into two halves. Now what about 3?

    Category Weight Entrant      Ntile
    -------- ------ ------------ -----
    Squash   462    Dan Gardner  1
    Squash   462    Harvey Zale  1
    Squash   435    Terry Helmer 2
    Squash   405    Gary Spiel   3

Notice that the first two entries get tile 1 and have the same weight. This is interesting, but it’s not consistent. I will explain the uneven distribution shortly.

If we switch to 4, we get one tile per entry as expected. Any integer above 4 will return the same results for this set as 4.

    Category Weight Entrant      Ntile
    -------- ------ ------------ -----
    Squash   462    Dan Gardner  1
    Squash   462    Harvey Zale  2
    Squash   435    Terry Helmer 3
    Squash   405    Gary Spiel   4

The tiling will split records that have the same weight into different tiles. Here, we are using 7 as the tile integer and pulling the thirteen pumpkin entries.

    Category Weight Entrant        Ntile
    -------- ------ -------------- -----
    Pumpkin  716    Chad Johnson   1
    Pumpkin  679    George Kopsell 1
    Pumpkin  679    Dan Gardner    2
    Pumpkin  481    John Suydam    2
    ....
    Pumpkin  229    Harvey Zale    7

Note that two entries have the exact same weight (679) but are split between tile 1 and 2. I tried rebuilding the table and putting Dan Gardner’s entry in first and, as you might expect, the tiles were swapped. So, for entries with equal weight, the order of the base records determines the tiling.

Another pattern is that if you have n records in your result set and n-1 as your ntile integer, the first two entries always have a tile of 1.

Dealing Cards

The distribution is actually very simple; it’s just like dealing cards. Think of the tiles like card players. If you dealt five cards to four players, the first player would have an extra card. If you dealt one more card, it would go to the second player. If you think of the distribution of tiles like that, it will match the results.

Using a Variable for the NTILE() int Value

You can use a variable for the NTILE int value. This has clear value.

Query:

    DECLARE @i  int
    SET @i = 5
 
    SELECT
        ...
        NTILE(@i) OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS [Ntile]
    FROM dbo.ContestResults

ROW_NUMBER()

ROW_NUMBER() numbers the rows of the result set. Here is the simplest case.

ROW_NUMBER() with ORDER BY Only

Query:

    SELECT
        ...
        ROW_NUMBER() OVER (
            ORDER BY Weight DESC
        ) AS [RowNumber]
    FROM dbo.ContestResults

Partial Results:

    Category   Weight Entrant       RowNumber
    ---------- ------ ------------- ---------
    Pumpkin    716    Chad Johnson  1
    Pumpkin    679    Dan Gardner   2
    ...
    Watermelon 132    Meg Giry      22
    Watermelon 129    Joseph Buquet 23

Valid, yes, but fairly worthless. Adding a PARTITION BY clause starts to show the value.

ROW_NUMBER() with PARTITION BY and ORDER BY

Query:

    SELECT
        ...
        ROW_NUMBER() OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS [RowNumber]
    FROM dbo.ContestResults

Partial Results:

    Category   Weight Entrant        RowNumber
    ---------- ------ -------------- --------------------
    Pumpkin    716    Chad Johnson   1
    Pumpkin    679    Dan Gardner    2
    ...
    Pumpkin    247    Harvey Zale    12
    Pumpkin    229    Harvey Zale    13
    Squash     462    Dan Gardner    1
    Squash     462    Harvey Zale    2
    Squash     435    Terry Helmer   3
    Squash     405    Gary Spiel     4
    Watermelon 146    Mark Bardin    1
    Watermelon 139    Christine Daaé 2
    ...
    Watermelon 132    Meg Giry       5
    Watermelon 129    Joseph Buquet  6

Now we have row numbers starting at 1 for each category. This will allow us, for instance, to pull the top three entries from each category. This is really the best choice to fulfill our original task of picking who receives prizes.

Using ROW_NUMBER() with PARTITION BY and ORDER BY with a Common Table Expression

Just like RANK() and DENSE_RANK(), you can’t do this:

    SELECT
        ...
        ROW_NUMBER() OVER (
            PARTITION BY Category
            ORDER BY Weight DESC
        ) AS RowNumber
    FROM dbo.ContestResults
    WHERE RowNumber <= 3


You get “Invalid column name 'RowNumber'.” So, we turn again to Common Table Expressions (CTEs) or table variables. I think the CTEs are cleaner, so I will only illustrate that and not table variables.

Common Table Expression:

    WITH t
    AS (
        SELECT
            ...
            ROW_NUMBER() OVER (
                PARTITION BY Category
                ORDER BY Weight DESC
            ) AS RowNumber
        FROM dbo.ContestResults
    )
    SELECT
        ...
        RowNumber
    FROM t
    WHERE RowNumber <= 3
    ORDER BY Category, RowNumber

Results:

    Category   Weight Entrant             RowNumber
    ---------- ------ ------------------- ---------
    Pumpkin    716    Chad Johnson        1
    Pumpkin    679    Dan Gardner         2
    Pumpkin    679    George Kopsell      3
    Squash     462    Dan Gardner         1
    Squash     462    Harvey Zale         2
    Squash     435    Terry Helmer        3
    Watermelon 146    Mark Bardin         1
    Watermelon 139    Christine Daaé      2
    Watermelon 139    Carlotta Giudicelli 3

A Real-World ROW_NUMBER() Example: Most Recent Entry by Person

I have had to use ROW_NUMBER() multiple times recently for a specific requirement. We have a user action that happens multiple times, and we have to pull only the most recent entry for each person. ROW_NUMBER() is a huge winner here.

Say we’re storing data for a pediatrician and need to pull information for the most recent visits by patients. This is a simple example, but you can see the power of using ROW_NUMBER() here.

Query:

    DECLARE @records TABLE (
        Person      int,
        VisitDate   datetime,
        Purpose     varchar(10)
    )
 
    INSERT INTO @records (Person, VisitDate, Purpose) VALUES ...
        (100, '1992-12-31', 'Birth')
        (100, '1993-01-12', 'CheckUp')
        (100, '1993-03-22', 'Shots')
        (100, '1993-04-19', 'EarInf')
 
        (101, '2003-01-29', 'Birth')
        (101, '2003-02-03', 'CheckUp')
 
        (102, '2008-10-13', 'Birth')
 
    SELECT * FROM @records
 
    ;WITH t
    AS (
        SELECT
            Person,
            VisitDate,
            Purpose,
            ROW_NUMBER() OVER (
                PARTITION BY Person
                ORDER BY VisitDate DESC
            ) AS RowNumber
        FROM @records
    )
    SELECT
        Person,
        VisitDate,
        Purpose
    FROM t
    WHERE RowNumber = 1
    ORDER BY Person

Result:

    Person VisitDate Purpose
    ------ ---------- -------
    100    1992-12-31 Birth
    100    1993-01-12 CheckUp
    100    1993-03-22 Shots
    100    1993-04-19 EarInf
    101    2003-01-29 Birth
    101    2003-02-03 CheckUp
    102    2008-10-13 Birth
 
    Person VisitDate Purpose
    ------ ---------- -------
    100    1993-04-19 EarInf
    101    2003-02-03 CheckUp
    102    2008-10-13 Birth

This gives us exactly what we want with very readable code.

Conclusion

NTILE() may have a “sweet-spot” in some applications, but I don’t see it being as useful as ROW_NUMBER(). I think ROW_NUMBER() is very useful.

If you are a big fan of NTILE() and can explain where it is useful, please drop a comment in the forum. You can also add comments there about ROW_NUMBER() or any of the ranking functions.

So far, no one has answered the challenge I posed in Part 19. Check it out and see if you can be the one that answers it.

» See All Articles by Columnist Rob Garrison

SqlCredit - Developing a Complete SQL Server OLTP Database Project



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM


















Thanks for your registration, follow us on our social networks to keep up-to-date