Deleting duplicate records when multiple duplicates exist
Of course, not all of you are going to be so lucky as to only have one duplicate, and manually removing them one by one is hardly an answer. First add some more duplicates to the table, and then we will see what we can do:
INSERT INTO bad_table(id,name) VALUES
(1,'Things Fall Apart'),
(2,'The Famished Road'),
(3,'Thirteen cents'),
(3,'Thirteen cents');
Some of you may have read the article on Database Journal about removing duplicates, specifically for SQL Server databases (if you've stumbled across this article as a SQL Server user, you can find it here. Unfortunately none of the techniques mentioned in that article work with MySQL. You cannot use cursors, correlated subqueries or derived tables. But you can
use temporary tables, although the way they work in MySQL is different to
that which is described in that article.
So, here's how you do it - first create a temporary table with the same structure as your existing table (MySQL 4.1 has a great feature - the LIKE keyword that allows you to create a new table based upon the structure of an existing table, but since it's only an alpha release most of you won't be running that yet).
CREATE TEMPORARY TABLE
bad_temp(id INT,name VARCHAR(20))
TYPE=HEAP;
Note that a temporary table only exists for as long as the connection. Therefore, if you are using a web application such as PHPMyAdmin to test these SQL statements, you'll need to bundle the above statement with the next few, to ensure the temporary table is available all of the time (each web page in a browser-based tool such as PHPMyAdmin uses its own connection). Also, see the cautionary after the example for some things you need to be aware of when using this technique on a production server.
INSERT INTO bad_temp(id,name) SELECT DISTINCT id,name FROM bad_table;
[[Authors note: A reader has pointed out that I went out about this in
a rather long-winded way. You can just as easily have used a single
statement:
CREATE TEMPORARY TABLE bad_temp AS SELECT DISTINCT * FROM bad_table
Thanks to P-A Fredriksson for pointing this out!]]
Now the temporary table contains a set of data with no duplicates. The
DISTINCT keyword takes care of that. Now, delete all of the data from the
original (and read the cautionary below first when trying this on a
production server!), and populate it from the temporary table, as follows:
DELETE FROM bad_table;
INSERT INTO bad_table(id,name) SELECT id,name FROM bad_temp;