Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS SQL
Oracle
DB2
MS Access
MySQL
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» Sitemap
Free Newsletters:
DatabaseDaily  
News Via RSS Feed


follow us on Twitter
Database Journal |DBA Support |SQLCourse |SQLCourse2
 

Featured Database Articles

MS SQL

Posted Jan 18, 2006

Row_Number() function in SQL Server 2005 - Part II

By Muthusamy Anantha Kumar aka The MAK

In part I of this article, we saw how to generate row numbers and delete a simple single duplicate row. This article walks you through how to delete multiple duplicate rows. This article uses the new feature Row_Number() function, common table expression and the ANSI co-related subquery.

Let us assume we have the following table, Mytable, in the database MyDB. Let us create the database MyDB and MyTable by using the following script.

USE [MASTER]
GO
IF  EXISTS 
  (SELECT NAME FROM SYS.DATABASES 
   WHERE NAME = N'MYDB')
DROP DATABASE [MYDB]
GO
CREATE DATABASE MYDB 
GO
USE [MYDB]
GO
IF  EXISTS 
  (SELECT * FROM SYS.OBJECTS 
  WHERE OBJECT_ID = 
    OBJECT_ID(N'[DBO].[MYTABLE]') 
  AND TYPE IN (N'U'))
DROP TABLE [DBO].[MYTABLE]
GO
CREATE TABLE MYTABLE 
  (ID INT, FNAME VARCHAR(50),
  LNAME VARCHAR(50))
GO
insert into mytable select 1,'Jen','Ambelang'
insert into mytable select 11,'Jiong','Hong'
insert into mytable select 25,'Sandra','Mator'
insert into mytable select 35,'Chun','Chang'
insert into mytable select 21,'Yuki','Fukushima'
insert into mytable select 1,'Jen','Ambelang'
insert into mytable select 1,'Jen','Ambelang'
insert into mytable select 25,'Sandra','Mator'
insert into mytable select 25,'Sandra','Mator'
go

Let us query all the rows from the table MyTable using the following Select query.

SELECT * FROM MYTABLE ORDER BY ID

This query displays the following results.

1

Jen

Ambelang

1

Jen

Ambelang

1

Jen

Ambelang

11

Jiong

Hong

21

Yuki

Fukushima

25

Sandra

Mator

25

Sandra

Mator

25

Sandra

Mator

35

Chun

Chang

From the results, it is very clear that we have duplicate rows: 1, Jen, Ambelang and 25, Sandra, Mator.

Step 1

The first step in deleting duplicate rows is to generate a unique row id for the entire table. This can be done using the Row_Number() function.

Let us generate the unique numbers by executing the following query:

SELECT ROW_NUMBER() OVER (ORDER BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE

This query produces the following results.

1

1

Ambelang

Jen

2

1

Ambelang

Jen

3

1

Ambelang

Jen

4

11

Hong

Jiong

5

21

Fukushima

Yuki

6

25

Mator

Sandra

7

25

Mator

Sandra

8

25

Mator

Sandra

9

35

Chang

Chun

Step 2

The second step in deleting duplicate rows is to generate unique row ids for every group.

Now, by using a co-related sub-query we can produce unique row ids for each group.

SELECT ROW,GROUPROW= CASE WHEN ID=ID
THEN 
(SELECT COUNT(*) FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE
) AS A WHERE A.ID=B.ID AND
A.ROW<B.ROW)+1 END,ID,FNAME,LNAME FROM (SELECT ROW_NUMBER() OVER (ORDER 
BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE
)AS B

This query would produce the following results with a unique id for each group.

1

1

1

Jen

Ambelang

2

2

1

Jen

Ambelang

3

3

1

Jen

Ambelang

4

1

11

Jiong

Hong

5

1

21

Yuki

Fukushima

6

1

25

Sandra

Mator

7

2

25

Sandra

Mator

8

3

25

Sandra

Mator

9

1

35

Chun

Chang

Step 3

The last step in deleting duplicate rows is to use the common table expression, as shown below.

WITH DUPLICATE(ROW,GROUPROW,ID,FNAME,LNAME)
AS
(
SELECT ROW,GROUPROW= CASE WHEN ID=ID
THEN 
(SELECT COUNT(*) FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE
) AS A WHERE A.ID=B.ID AND
A.ROW<B.ROW)+1 END,ID,FNAME,LNAME FROM (SELECT ROW_NUMBER() OVER (ORDER 
BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE
)AS B
)
DELETE FROM DUPLICATE WHERE GROUPROW<>1

Let us query all the rows from the source table, Mytable, by using the following Select Query.

SELECT * FROM MYTABLE ORDER BY ID

This query displays the following results:

1

Jen

Ambelang

11

Jiong

Hong

21

Yuki

Fukushima

25

Sandra

Mator

35

Chun

Chang

From the results, it is clear that duplicates from the table, Mytable, have been removed.

Conclusion

The main intent of this article was to demonstrate the use of SQL Server 2005’s new feature Row_number() function and Common Table Expression, with the help of co-related sub-query, to delete duplicate rows.

» See All Articles by Columnist MAK



MS SQL Archives

Comment and Contribute

 


(Maximum characters: 1200). You have characters left.

 

 




Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM