Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum Rss Feed

» Database Journal Home
» Database Articles
» Database Tutorials
MS Access
SQL Scripts & Samples
» Database Forum
» Slideshows
Free Newsletters:

News Via RSS Feed

Rss Feed

Database Journal |DBA Support |SQLCourse |SQLCourse2

Featured Database Articles


Posted Jul 17, 2003

Dealing with MS SQL Tables that contain Duplicate Rows

By Gregory A. Larsen

Every so often, you might have to deal with tables that contain duplicate rows. In one case, you might only need to identify the duplicate rows. In other cases, you might need to remove the duplicate rows. This article will show you some different techniques for dealing with duplicate rows.

Your interpretation of a duplicate row might mean something different then the meaning to someone else. A duplicate row could mean that the data that makes up the key for each row is duplicated on more than one row. Another definition might be every column in a row matches with another row in the same table. I will show you how to find the duplicate rows in both of these cases.

Identifying Duplicate Key Values

One of the first things you will need consider when dealing with duplicates is how to identify the rows that contain duplicate values. To identify duplicates, you need to determine what is considered a duplicate record.

For the purpose of my first example, I will be using the sales table in the pubs database. The sales table has the following columns: stor_id, ord_num, ord_date, qty, payterms, title_id. For this example, let's say a duplicate record is any record that has the same stor_id and ord_num. To identify duplicates sales records I would use the following code:

select stor_id, ord_num , count(*) as 'Num of Dups' 
from pubs.dbo.sales
group by stor_id, ord_num 
having count(*) > 1

When this code is run I get the following results:

stor_id ord_num              Num of Dups 
------- -------------------- ----------- 
7067    P2121                3
7131    P3087a               4

In this example, I used the HAVING clause in conjunction with the GROUP BY to identify the number of duplicate records. In this case, I only showed the key values (stor_id, and ord_num) associated with the duplicate records along with the number of duplicates. Now in reality these are not duplicate records if you looked at all the columns in the table, but since I said we were only looking at stor_id and ord_num, these are considered duplicates.

Displaying Rows for Records that have Duplicate Key Values

Now my previous example only displayed the duplicate key values. I assume some of the time you may want to display all the duplicate records associated with duplicate key values. Here is some code that will display all of the rows for each duplicate key value.

select cast(a.stor_id as char(4)) as stor_id, 
       cast(a.ord_num as char(6)) as ord_num, 
       convert(char(8),a.ord_date,110) as ord_date,
from pubs.dbo.sales a
(select stor_id, ord_num  
from pubs.dbo.sales
group by stor_id, ord_num 
having count(*) > 1) b  
on a.stor_id = b.stor_id
   a.ord_num = b.ord_num
order by a.stor_id, a.ord_num

For this example, I took my previous TSQL command that returned the duplicate keys (stor_id and ord_num) and joined it with the pubs.dbo.sales table. The join criteria allow TSQL to only return rows in the pubs.dbo.sales table that match the duplicate key values. The following output is displayed in the Query Analyzer results pane when I run this query:

stor_id ord_num ord_date qty    payterms     title_id 
------- ------- -------- ------ ------------ -------- 
7067    P2121   06-15-19 40     Net 30       TC3218
7067    P2121   06-15-19 20     Net 30       TC4203
7067    P2121   06-15-19 20     Net 30       TC7777
7131    P3087a  05-29-19 20     Net 60       PS1372
7131    P3087a  05-29-19 25     Net 60       PS2106
7131    P3087a  05-29-19 15     Net 60       PS3333
7131    P3087a  05-29-19 25     Net 60       PS7777

MS SQL Archives

Latest Forum Threads
MS SQL Forum
Topic By Replies Updated
SQL 2005: SSIS: Error using SQL Server credentials poverty 3 August 17th, 07:43 AM
Need help changing table contents nkawtg 1 August 17th, 03:02 AM
SQL Server Memory confifuration bhosalenarayan 2 August 14th, 05:33 AM
SQL Server Primary Key and a Unique Key katty.jonh 2 July 25th, 10:36 AM