Dealing with MS SQL Tables that contain Duplicate Rows

Every so often, you might have
to deal with tables that contain duplicate rows. In one case, you might only
need to identify the duplicate rows. In other cases, you might need to remove
the duplicate rows. This article will show you some different techniques for
dealing with duplicate rows.

Your interpretation of a duplicate
row might mean something different then the meaning to someone else. A
duplicate row could mean that the data that makes up the key for each row is
duplicated on more than one row. Another definition might be every column in a
row matches with another row in the same table. I will show you how to find
the duplicate rows in both of these cases.

Identifying Duplicate Key

One of the first things you
will need consider when dealing with duplicates is how to identify the rows
that contain duplicate values. To identify duplicates, you need to determine
what is considered a duplicate record.

For the purpose of my first example,
I will be using the sales table in the pubs database. The sales table has the
following columns: stor_id, ord_num, ord_date, qty, payterms, title_id. For
this example, let’s say a duplicate record is any record that has the same stor_id
and ord_num. To identify duplicates sales records I would use the following

select stor_id, ord_num , count(*) as ‘Num of Dups’
from pubs.dbo.sales
group by stor_id, ord_num
having count(*) > 1

When this code is run I get
the following results:

stor_id ord_num Num of Dups
——- ——————– ———–
7067 P2121 3
7131 P3087a 4

In this example, I used the
HAVING clause in conjunction with the GROUP BY to identify the number of
duplicate records. In this case, I only showed the key values (stor_id, and ord_num)
associated with the duplicate records along with the number of duplicates.
Now in reality these are not duplicate records if you looked at all the columns
in the table, but since I said we were only looking at stor_id and ord_num,
these are considered duplicates.

Displaying Rows for
Records that have Duplicate Key Values

Now my previous example only
displayed the duplicate key values. I assume some of the time you may want to
display all the duplicate records associated with duplicate key values. Here
is some code that will display all of the rows for each duplicate key value.

select cast(a.stor_id as char(4)) as stor_id,
cast(a.ord_num as char(6)) as ord_num,
convert(char(8),a.ord_date,110) as ord_date,
from pubs.dbo.sales a
(select stor_id, ord_num
from pubs.dbo.sales
group by stor_id, ord_num
having count(*) > 1) b
on a.stor_id = b.stor_id
a.ord_num = b.ord_num
order by a.stor_id, a.ord_num

For this example, I took my
previous TSQL command that returned the duplicate keys (stor_id and ord_num)
and joined it with the pubs.dbo.sales table. The join criteria allow TSQL to
only return rows in the pubs.dbo.sales table that match the duplicate key
values. The following output is displayed in the Query Analyzer results pane
when I run this query:

stor_id ord_num ord_date qty payterms title_id
——- ——- ——– —— ———— ——–
7067 P2121 06-15-19 40 Net 30 TC3218
7067 P2121 06-15-19 20 Net 30 TC4203
7067 P2121 06-15-19 20 Net 30 TC7777
7131 P3087a 05-29-19 20 Net 60 PS1372
7131 P3087a 05-29-19 25 Net 60 PS2106
7131 P3087a 05-29-19 15 Net 60 PS3333
7131 P3087a 05-29-19 25 Net 60 PS7777

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Latest Articles