In part I
of this article, we saw how to generate row numbers and delete a simple single
duplicate row. This article walks you through how to delete multiple duplicate
rows. This article uses the new feature Row_Number() function, common table
expression and the ANSI co-related subquery.
Let us assume we have the following table, Mytable, in the
database MyDB. Let us create the database MyDB and MyTable by using the
following script.
USE [MASTER]
GO
IF EXISTS
(SELECT NAME FROM SYS.DATABASES
WHERE NAME = N’MYDB’)
DROP DATABASE [MYDB]
GO
CREATE DATABASE MYDB
GO
USE [MYDB]
GO
IF EXISTS
(SELECT * FROM SYS.OBJECTS
WHERE OBJECT_ID =
OBJECT_ID(N’[DBO].[MYTABLE]’)
AND TYPE IN (N’U’))
DROP TABLE [DBO].[MYTABLE]
GO
CREATE TABLE MYTABLE
(ID INT, FNAME VARCHAR(50),
LNAME VARCHAR(50))
GO
insert into mytable select 1,’Jen’,’Ambelang’
insert into mytable select 11,’Jiong’,’Hong’
insert into mytable select 25,’Sandra’,’Mator’
insert into mytable select 35,’Chun’,’Chang’
insert into mytable select 21,’Yuki’,’Fukushima’
insert into mytable select 1,’Jen’,’Ambelang’
insert into mytable select 1,’Jen’,’Ambelang’
insert into mytable select 25,’Sandra’,’Mator’
insert into mytable select 25,’Sandra’,’Mator’
go
Let us query all the rows from the table MyTable using the
following Select query.
SELECT * FROM MYTABLE ORDER BY ID
This query displays the following results.
1 |
Jen |
Ambelang |
1 |
Jen |
Ambelang |
1 |
Jen |
Ambelang |
11 |
Jiong |
Hong |
21 |
Yuki |
Fukushima |
25 |
Sandra |
Mator |
25 |
Sandra |
Mator |
25 |
Sandra |
Mator |
35 |
Chun |
Chang |
From the results, it is very clear that we have duplicate
rows: 1, Jen, Ambelang and 25, Sandra, Mator.
Step 1
The first step in deleting duplicate rows is to generate a
unique row id for the entire table. This can be done using the Row_Number()
function.
Let us generate the unique numbers by executing the
following query:
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE
This query produces the following results.
1 |
1 |
Ambelang |
Jen |
2 |
1 |
Ambelang |
Jen |
3 |
1 |
Ambelang |
Jen |
4 |
11 |
Hong |
Jiong |
5 |
21 |
Fukushima |
Yuki |
6 |
25 |
Mator |
Sandra |
7 |
25 |
Mator |
Sandra |
8 |
25 |
Mator |
Sandra |
9 |
35 |
Chang |
Chun |
Step 2
The second step in deleting duplicate rows is to generate
unique row ids for every group.
Now, by using a co-related sub-query we can produce unique
row ids for each group.
SELECT ROW,GROUPROW= CASE WHEN ID=ID
THEN
(SELECT COUNT(*) FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE
) AS A WHERE A.ID=B.ID AND
A.ROW<B.ROW)+1 END,ID,FNAME,LNAME FROM (SELECT ROW_NUMBER() OVER (ORDER
BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE
)AS B
This query would produce the following results with a unique
id for each group.
1 |
1 |
1 |
Jen |
Ambelang |
2 |
2 |
1 |
Jen |
Ambelang |
3 |
3 |
1 |
Jen |
Ambelang |
4 |
1 |
11 |
Jiong |
Hong |
5 |
1 |
21 |
Yuki |
Fukushima |
6 |
1 |
25 |
Sandra |
Mator |
7 |
2 |
25 |
Sandra |
Mator |
8 |
3 |
25 |
Sandra |
Mator |
9 |
1 |
35 |
Chun |
Chang |
Step 3
The last step in deleting duplicate rows is to use the common
table expression, as shown below.
WITH DUPLICATE(ROW,GROUPROW,ID,FNAME,LNAME)
AS
(
SELECT ROW,GROUPROW= CASE WHEN ID=ID
THEN
(SELECT COUNT(*) FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE
) AS A WHERE A.ID=B.ID AND
A.ROW<B.ROW)+1 END,ID,FNAME,LNAME FROM (SELECT ROW_NUMBER() OVER (ORDER
BY ID) AS ROW,
ID,LNAME,FNAME FROM MYTABLE
)AS B
)
DELETE FROM DUPLICATE WHERE GROUPROW<>1
Let us query all the rows from the source table, Mytable,
by using the following Select Query.
SELECT * FROM MYTABLE ORDER BY ID
This query displays the following results:
1 |
Jen |
Ambelang |
11 |
Jiong |
Hong |
21 |
Yuki |
Fukushima |
25 |
Sandra |
Mator |
35 |
Chun |
Chang |
From the results, it is clear that duplicates from the
table, Mytable, have been removed.
Conclusion
The main intent of this article was to demonstrate the use
of SQL Server 2005’s new feature Row_number() function and Common Table
Expression, with the help of co-related sub-query, to delete duplicate rows.