dcsimg

Dealing with MS SQL Tables that contain Duplicate Rows

July 16, 2003

Every so often, you might have to deal with tables that contain duplicate rows. In one case, you might only need to identify the duplicate rows. In other cases, you might need to remove the duplicate rows. This article will show you some different techniques for dealing with duplicate rows.

Your interpretation of a duplicate row might mean something different then the meaning to someone else. A duplicate row could mean that the data that makes up the key for each row is duplicated on more than one row. Another definition might be every column in a row matches with another row in the same table. I will show you how to find the duplicate rows in both of these cases.

Identifying Duplicate Key Values

One of the first things you will need consider when dealing with duplicates is how to identify the rows that contain duplicate values. To identify duplicates, you need to determine what is considered a duplicate record.

For the purpose of my first example, I will be using the sales table in the pubs database. The sales table has the following columns: stor_id, ord_num, ord_date, qty, payterms, title_id. For this example, let's say a duplicate record is any record that has the same stor_id and ord_num. To identify duplicates sales records I would use the following code:

select stor_id, ord_num , count(*) as 'Num of Dups' 
from pubs.dbo.sales
group by stor_id, ord_num 
having count(*) > 1

When this code is run I get the following results:

stor_id ord_num              Num of Dups 
------- -------------------- ----------- 
7067    P2121                3
7131    P3087a               4

In this example, I used the HAVING clause in conjunction with the GROUP BY to identify the number of duplicate records. In this case, I only showed the key values (stor_id, and ord_num) associated with the duplicate records along with the number of duplicates. Now in reality these are not duplicate records if you looked at all the columns in the table, but since I said we were only looking at stor_id and ord_num, these are considered duplicates.

Displaying Rows for Records that have Duplicate Key Values

Now my previous example only displayed the duplicate key values. I assume some of the time you may want to display all the duplicate records associated with duplicate key values. Here is some code that will display all of the rows for each duplicate key value.

select cast(a.stor_id as char(4)) as stor_id, 
       cast(a.ord_num as char(6)) as ord_num, 
       convert(char(8),a.ord_date,110) as ord_date,
       a.qty, 
       a.payterms, 
       a.title_id 
from pubs.dbo.sales a
join 
(select stor_id, ord_num  
from pubs.dbo.sales
group by stor_id, ord_num 
having count(*) > 1) b  
on a.stor_id = b.stor_id
           and
   a.ord_num = b.ord_num
order by a.stor_id, a.ord_num

For this example, I took my previous TSQL command that returned the duplicate keys (stor_id and ord_num) and joined it with the pubs.dbo.sales table. The join criteria allow TSQL to only return rows in the pubs.dbo.sales table that match the duplicate key values. The following output is displayed in the Query Analyzer results pane when I run this query:

stor_id ord_num ord_date qty    payterms     title_id 
------- ------- -------- ------ ------------ -------- 
7067    P2121   06-15-19 40     Net 30       TC3218
7067    P2121   06-15-19 20     Net 30       TC4203
7067    P2121   06-15-19 20     Net 30       TC7777
7131    P3087a  05-29-19 20     Net 60       PS1372
7131    P3087a  05-29-19 25     Net 60       PS2106
7131    P3087a  05-29-19 15     Net 60       PS3333
7131    P3087a  05-29-19 25     Net 60       PS7777







The Network for Technology Professionals

Search:

About Internet.com

Legal Notices, Licensing, Permissions, Privacy Policy.
Advertise | Newsletters | E-mail Offers