This is part 20 of a series. If you have not read part 19, I

suggest starting here.

That article is the first of this two-part coverage of ranking functions.

#### Table and Data for Illustration

This article will use the same table and data as Part 19, so

I won’t cover that again. The code to build and populate the table is here.

### NTILE()

NTILE() breaks the results into groups. In the simplest

case, if we pass the integer 2 to NTILE() using Weight, it will rank each entry

as in either the first or second half based on Weight.

#### NTILE() with ORDER BY Only

**Query:**

SELECT

Category,

Weight,

Entrant,

NTILE(2) OVER (

ORDER BY Weight DESC

) AS [Ntile]

FROM dbo.ContestResults

**Partial Results:**

Category Weight Entrant Ntile

———- —— ————— —–

Pumpkin 716 Chad Johnson 1

Pumpkin 679 George Kopsell 1

…

Watermelon 132 Meg Giry 2

Watermelon 129 Joseph Buquet 2

This is valid, but it’s not terribly interesting or

valuable. Let’s add a PARTITION BY clause so we get top- and bottom-half

entries for each category.

#### NTILE() with PARTITION BY and ORDER BY

**Query:**

SELECT

Category,

Weight,

Entrant,

NTILE(2) OVER (

PARTITION BY Category

ORDER BY Weight DESC

) AS [Ntile]

FROM dbo.ContestResults

**Partial Results:**

Category Weight Entrant Ntile

———- —— ————— ——————–

Pumpkin 716 Chad Johnson 1

…

Pumpkin 229 Harvey Zale 2

Squash 462 Dan Gardner 1

…

Squash 405 Gary Spiel 2

Watermelon 146 Mark Bardin 1

…

Watermelon 129 Joseph Buquet 2

*Now* we have something useful.

#### Passing Different int Values to NTILE()

Passing 1 to NTILE() is worthless; it gives us 1 for each

entry. But what if we pass a larger number to NTILE() than the number of result

set rows? In our test table, we have four entries in the squash category. Let’s

try passing different values to NTILE(), starting with **2**.

**Query:**

SELECT

…

NTILE(2) OVER (

ORDER BY Weight DESC

) AS [Ntile]

FROM dbo.ContestResults

WHERE Category = ‘Squash’

**Results:**

Category Weight Entrant Ntile

——– —— ———— —–

Squash 462 Dan Gardner 1

Squash 462 Harvey Zale 1

Squash 435 Terry Helmer 2

Squash 405 Gary Spiel 2

Sure, this splits the entries into two halves. Now what

about **3**?

Category Weight Entrant Ntile

——– —— ———— —–

Squash 462 Dan Gardner 1

Squash 462 Harvey Zale 1

Squash 435 Terry Helmer 2

Squash 405 Gary Spiel 3

Notice that the first two entries get tile 1 and have the

same weight. This is interesting, but it’s not consistent. I will explain the

uneven distribution shortly.

If we switch to **4**, we get one tile per entry as

expected. *Any* integer above 4 will return the same results for this set

as 4.

Category Weight Entrant Ntile

——– —— ———— —–

Squash 462 Dan Gardner 1

Squash 462 Harvey Zale 2

Squash 435 Terry Helmer 3

Squash 405 Gary Spiel 4

The tiling will split records that have the same weight into

different tiles. Here, we are using **7** as the tile integer and pulling

the thirteen pumpkin entries.

Category Weight Entrant Ntile

——– —— ————– —–

Pumpkin 716 Chad Johnson 1

Pumpkin 679 George Kopsell 1

Pumpkin 679 Dan Gardner 2

Pumpkin 481 John Suydam 2

….

Pumpkin 229 Harvey Zale 7

Note that two entries have the exact same weight (679) but

are split between tile 1 and 2. I tried rebuilding the table and putting Dan

Gardner’s entry in first and, as you might expect, the tiles were swapped. So,

for entries with equal weight, the order of the base records determines the tiling.

Another pattern is that if you have *n* records in your

result set and *n*-1 as your ntile integer, the first two entries always

have a tile of 1.

**Dealing Cards**

The distribution is actually very simple; it’s just like

dealing cards. Think of the tiles like card players. If you dealt five cards to

four players, the first player would have an extra card. If you dealt one more

card, it would go to the second player. If you think of the distribution of

tiles like that, it will match the results.

**Using a Variable for the NTILE() int Value**

You can use a variable for the NTILE int value. This has

clear value.

**Query:**

DECLARE @i int

SET @i = 5SELECT

…

NTILE(@i)OVER (

PARTITION BY Category

ORDER BY Weight DESC

) AS [Ntile]

FROM dbo.ContestResults

### ROW_NUMBER()

ROW_NUMBER() numbers the rows of the result set. Here is the

simplest case.

#### ROW_NUMBER() with ORDER BY Only

**Query:**

SELECT

…

ROW_NUMBER() OVER (

ORDER BY Weight DESC

) AS [RowNumber]

FROM dbo.ContestResults

**Partial Results:**

Category Weight Entrant RowNumber

———- —— ————- ———

Pumpkin 716 Chad Johnson 1

Pumpkin 679 Dan Gardner 2

…

Watermelon 132 Meg Giry 22

Watermelon 129 Joseph Buquet 23

Valid, yes, but fairly worthless. Adding a PARTITION BY

clause starts to show the value.

#### ROW_NUMBER() with PARTITION BY and ORDER BY

**Query:**

SELECT

…

ROW_NUMBER() OVER (

PARTITION BY Category

ORDER BY Weight DESC

) AS [RowNumber]

FROM dbo.ContestResults

**Partial Results:**

Category Weight Entrant RowNumber

———- —— ————– ——————–

Pumpkin 716 Chad Johnson 1

Pumpkin 679 Dan Gardner 2

…

Pumpkin 247 Harvey Zale 12

Pumpkin 229 Harvey Zale 13

Squash 462 Dan Gardner 1

Squash 462 Harvey Zale 2

Squash 435 Terry Helmer 3

Squash 405 Gary Spiel 4

Watermelon 146 Mark Bardin 1

Watermelon 139 Christine Daaé 2

…

Watermelon 132 Meg Giry 5

Watermelon 129 Joseph Buquet 6

Now we have row numbers starting at 1 for each category.

This will allow us, for instance, to pull the top three entries from each

category. This is really the *best* choice to fulfill our original task of

picking who receives prizes.

### Using ROW_NUMBER() with PARTITION BY and ORDER BY with a Common Table

Expression

Just like RANK() and DENSE_RANK(), you can’t do this:

SELECT

…

ROW_NUMBER() OVER (

PARTITION BY Category

ORDER BY Weight DESC

) AS RowNumber

FROM dbo.ContestResults

WHERE RowNumber <= 3

You get “Invalid column name ‘RowNumber’.” So, we turn again to Common Table

Expressions (CTEs) or table variables. I think the CTEs are cleaner, so I will

only illustrate that and not table variables.

**Common Table Expression:**

WITH t

AS (

SELECT

…

ROW_NUMBER() OVER (

PARTITION BY Category

ORDER BY Weight DESC

) AS RowNumber

FROM dbo.ContestResults

)

SELECT

…

RowNumber

FROM t

WHERE RowNumber <= 3

ORDER BY Category, RowNumber

**Results:**

Category Weight Entrant RowNumber

———- —— ——————- ———

Pumpkin 716 Chad Johnson 1

Pumpkin 679 Dan Gardner 2

Pumpkin 679 George Kopsell 3

Squash 462 Dan Gardner 1

Squash 462 Harvey Zale 2

Squash 435 Terry Helmer 3

Watermelon 146 Mark Bardin 1

Watermelon 139 Christine Daaé 2

Watermelon 139 Carlotta Giudicelli 3

### A Real-World ROW_NUMBER() Example: Most Recent Entry by Person

I have had to use ROW_NUMBER() multiple times recently for a

specific requirement. We have a user action that happens multiple times, and we

have to pull only the *most recent* entry for each person. ROW_NUMBER() is

a huge winner here.

Say we’re storing data for a pediatrician and need to pull

information for the most recent visits by patients. This is a simple example,

but you can see the power of using ROW_NUMBER() here.

**Query:**

DECLARE @records TABLE (

Person int,

VisitDate datetime,

Purpose varchar(10)

)INSERT INTO @records (Person, VisitDate, Purpose) VALUES …

(100, ‘1992-12-31’, ‘Birth’)

(100, ‘1993-01-12’, ‘CheckUp’)

(100, ‘1993-03-22’, ‘Shots’)

(100, ‘1993-04-19’, ‘EarInf’)(101, ‘2003-01-29’, ‘Birth’)

(101, ‘2003-02-03’, ‘CheckUp’)(102, ‘2008-10-13’, ‘Birth’)

SELECT * FROM @records

;WITH t

AS (

SELECT

Person,

VisitDate,

Purpose,

ROW_NUMBER() OVER (

PARTITION BY Person

ORDER BY VisitDate DESC

) AS RowNumber

FROM @records

)

SELECT

Person,

VisitDate,

Purpose

FROM t

WHERE RowNumber = 1

ORDER BY Person

**Result:**

Person VisitDate Purpose

—— ———- ——-

100 1992-12-31 Birth

100 1993-01-12 CheckUp

100 1993-03-22 Shots

100 1993-04-19 EarInf

101 2003-01-29 Birth

101 2003-02-03 CheckUp

102 2008-10-13 Birth

Person VisitDate Purpose

—— ———- ——-

100 1993-04-19 EarInf

101 2003-02-03 CheckUp

102 2008-10-13 Birth

This gives us exactly what we want with very readable code.

### Conclusion

NTILE() may have a “sweet-spot” in some applications, but I

don’t see it being as useful as ROW_NUMBER(). I think ROW_NUMBER() is *very*

useful.

If you are a big fan of NTILE() and can explain where it is

useful, please drop a comment in the forum. You can also add comments there

about ROW_NUMBER() or any of the ranking functions.

So far, no one has answered the challenge I

posed in Part 19. Check it out and see if you can be the one that answers it.