SHARE

Dealing with MS SQL Tables that contain Duplicate Rows

Written By

Jul 17, 2003

2 minute read

Every so often, you might have
to deal with tables that contain duplicate rows. In one case, you might only
need to identify the duplicate rows. In other cases, you might need to remove
the duplicate rows. This article will show you some different techniques for
dealing with duplicate rows.

Your interpretation of a duplicate
row might mean something different then the meaning to someone else. A
duplicate row could mean that the data that makes up the key for each row is
duplicated on more than one row. Another definition might be every column in a
row matches with another row in the same table. I will show you how to find
the duplicate rows in both of these cases.

Identifying Duplicate Key
Values

One of the first things you
will need consider when dealing with duplicates is how to identify the rows
that contain duplicate values. To identify duplicates, you need to determine
what is considered a duplicate record.

For the purpose of my first example,
I will be using the sales table in the pubs database. The sales table has the
following columns: stor_id, ord_num, ord_date, qty, payterms, title_id. For
this example, let’s say a duplicate record is any record that has the same stor_id
and ord_num. To identify duplicates sales records I would use the following
code:

select stor_id, ord_num , count(*) as ‘Num of Dups’
from pubs.dbo.sales
group by stor_id, ord_num
having count(*) > 1

When this code is run I get
the following results:

stor_id ord_num              Num of Dups
——- ——————– ———–
7067    P2121                3
7131    P3087a               4

In this example, I used the
HAVING clause in conjunction with the GROUP BY to identify the number of
duplicate records. In this case, I only showed the key values (stor_id, and ord_num)
associated with the duplicate records along with the number of duplicates.
Now in reality these are not duplicate records if you looked at all the columns
in the table, but since I said we were only looking at stor_id and ord_num,
these are considered duplicates.

Displaying Rows for
Records that have Duplicate Key Values

Now my previous example only
displayed the duplicate key values. I assume some of the time you may want to
display all the duplicate records associated with duplicate key values. Here
is some code that will display all of the rows for each duplicate key value.

select cast(a.stor_id as char(4)) as stor_id,
       cast(a.ord_num as char(6)) as ord_num,
       convert(char(8),a.ord_date,110) as ord_date,
       a.qty,
       a.payterms,
       a.title_id
from pubs.dbo.sales a
join
(select stor_id, ord_num
from pubs.dbo.sales
group by stor_id, ord_num
having count(*) > 1) b
on a.stor_id = b.stor_id
           and
   a.ord_num = b.ord_num
order by a.stor_id, a.ord_num

For this example, I took my
previous TSQL command that returned the duplicate keys (stor_id and ord_num)
and joined it with the pubs.dbo.sales table. The join criteria allow TSQL to
only return rows in the pubs.dbo.sales table that match the duplicate key
values. The following output is displayed in the Query Analyzer results pane
when I run this query:

stor_id ord_num ord_date qty    payterms     title_id
——- ——- ——– —— ———— ——–
7067    P2121   06-15-19 40     Net 30       TC3218
7067    P2121   06-15-19 20     Net 30       TC4203
7067    P2121   06-15-19 20     Net 30       TC7777
7131    P3087a  05-29-19 20     Net 60       PS1372
7131    P3087a  05-29-19 25     Net 60       PS2106
7131    P3087a  05-29-19 15     Net 60       PS3333
7131    P3087a  05-29-19 25     Net 60       PS7777

Gregory Larsen

Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Dealing with MS SQL Tables that contain Duplicate Rows

Identifying Duplicate Key
Values

Displaying Rows for
Records that have Duplicate Key Values

Gregory Larsen

Company

Categories

Dealing with MS SQL Tables that contain Duplicate Rows

Identifying Duplicate Key Values

Displaying Rows for Records that have Duplicate Key Values

Gregory Larsen

Company

Categories

Identifying Duplicate Key
Values

Displaying Rows for
Records that have Duplicate Key Values