Deleting Duplicate Rows in a MySQL Database
May 6, 2003
Something is not right. Your beautifully designed application is behaving oddly. After some investigation, you discover a problem in the code logic, and realise the database table contains duplicate rows. And now you have to delete them. Before I show you how to do this in MySQL, take a step back. Why does your table contain duplicates? The answer in the vast majority of cases is that your table has been badly designed. There is no one else to blame but the database designer. So before you fix the symptom (and I know it's urgent and I'm getting there), you need to think about fixing the cause of the problem. All tables should have a unique primary key. Not having one is an exception that is usually not wise. I suggest you read some more about the topic of database normalization, and follow the steps to correct the structure of your database. A normalized database almost never has this kind of data integrity problems. All too many books and tutorials dive straight into providing novices with the tools needed to create database tables without teaching the fundamentals of good database design. However, I'll get off my soapbox now and get to the topic at hand. What to do about it when it happens to you. First, let's create our sample badly-designed table, and populate it with some data.
CREATE TABLE bad_table ( id INT NOT NULL, name VARCHAR(20) NOT NULL );
Notice the table has no primary key. Alarm bells should be ringing. You should either have defined the id field as a PRIMARY KEY (perhaps even as an AUTO_INCREMENT as well), or simply as UNIQUE. Here is a suggestion for how the table should have looked:
CREATE TABLE good_table ( id INT NOT NULL, name VARCHAR(20) NOT NULL, PRIMARY KEY (id) );
Back to the bad table - let's insert some values, with duplicates:
INSERT INTO bad_table(id,name) VALUES (1,'Things Fall Apart'), (1,'Things Fall Apart'), (2,'The Famished Road');
Deleting a single record when duplicates exist
I hope that you would not have done such a thing using raw SQL, and can at least blame the application, but nevertheless, you're stuck with duplicate records and need to remove them. In some cases there may only be one duplicate, as in the sample data we have created. In this case, the simplest way is just to delete that one record. Many people do not know about a handy way to do this. Of course the statement:
DELETE FROM bad_table WHERE id=1;
would remove both records with an id of 1, and leave you having to INSERT the record again. But, do you remember the LIMIT clause? It is not only useful in SELECT statements, but in any kind of statement too, including DELETEs. Therefore, the most efficient way to solve the problem would be to use the following:
DELETE FROM bad_table WHERE id=1 LIMIT 1;
You can see now that you are still left with two records:
mysql> SELECT * FROM bad_table; +----+-------------------+ | id | name | +----+-------------------+ | 1 | Things Fall Apart | | 2 | The Famished Road | +----+-------------------+
Simple and efficient.