Database Journal
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Tips Database Forum

» Database Journal Home
» Database Articles
» Database Tutorials
MS Access
SQL Scripts & Samples
» Database Forum
» Slideshows
Free Newsletters:

News Via RSS Feed

Database Journal |DBA Support |SQLCourse |SQLCourse2

Featured Database Articles


Posted May 7, 2003

Deleting Duplicate Rows in a MySQL Database - Page 3

By Ian Gilfillan

Care when using Temporary Tables

Be careful when using temporary tables in this way. With our tiny data set, there is not much of a problem, but huge tables could cause problems with memory. Temporary tables can be of type HEAP (as in the example), MyISAM, Merge or InnoDB (or even the old ISAM format). HEAP tables are placed in memory, in which case you need to make sure you have enough memory available to store the entire table, which may not always be the case with large production tables. Other table types are placed on disk, in which case you may as well not make the table temporary. There iss always the risk of losing your connection at the vital point, just after you've deleted the original data, and therefore losing your temporary table as well (and assuming you weren't using transactions, the changes are permanent). Simply remove the word TEMPORARY above, and you create a duplicate 'permanent' table. Of course, you'll want to delete it afterwards. Here is the full set of statements to achieve this:

## Necessary if you're using the same connection as the previous example
DROP TABLE bad_temp;

bad_temp(id INT, name VARCHAR(20));

INSERT INTO bad_temp(id,name) SELECT DISTINCT id,name FROM bad_table;

DELETE FROM bad_table;

INSERT INTO bad_table(id,name) SELECT id,name FROM bad_temp;

DROP TABLE bad_temp;

Duplicates with Unique Keys

The example above assumed the entire record was a duplicate. But there's another kind of duplicate, where there is a unique key, perhaps with an auto_increment field, yet the data itself is duplicated. As far as MySQL is concerned, there are no duplicates, yet logically there are. Let's look at an example, and add some more test data:

CREATE TABLE bad_table2 (

INSERT INTO bad_table2(id,name) VALUES
(1,'Things Fall Apart'),
(2,'Things Fall Apart'),
(3,'The Famished Road'),
(4,'Things Fall Apart'),
(5,'The Famished Road'),
(6,'Thirteen cents'),
(7,'Thirteen cents');

Here you need to make a decision. Is the id field of any value to you here? If it is, (because you've got links to other tables, with, for example, both id 1 and id 2, then you've got quite a nightmare on your hands - you can't lose either of the values, because you still need the link. In this case, you are going to have to write some code to resolve the mess, checking for duplicates, running update statements and the like. I will assume your data is not quite that much of a disaster, and that the id does not matter. In this case, you are simply going to let MySQL's auto increment field take care of the problem with the id field. The only difference in this example is that you cannot select a DISTINCT id-name combination; rather you only use the DISTINCT on the name.

bad_temp2(id VARCHAR(10), name VARCHAR(20))

INSERT INTO bad_temp2(name) SELECT DISTINCT name FROM bad_table2;

DELETE FROM bad_table2;

INSERT INTO bad_table2(id,name) SELECT id,name FROM bad_temp2;

Now that you know how to extricate yourself from the mess, don't think it's OK to get into this situation! With good planning and careful database design, you should never have to do this sort of thing. But now you know, just in case 'someone else' mangles your data. Good luck!

» See All Articles by Columnist Ian Gilfillan

MySQL Archives

Comment and Contribute


(Maximum characters: 1200). You have characters left.



Latest Forum Threads
MySQL Forum
Topic By Replies Updated
MySQL in high availability and transction secure banking application klamor 1 August 28th, 10:24 AM
MySQL rollback UAL225 1 August 28th, 10:15 AM
Browsing a DB file that uses MySql finleytech 1 July 26th, 10:51 AM
php cookie won't pass variable to next page ITdevGirl 0 June 22nd, 12:13 PM