Free Newsletters:
DatabaseDaily  
Database Journal
Search Database Journal:
 
MS SQL Oracle DB2 Access MySQL PostgreSQL Sybase PHP SQL Etc SQL Scripts & Samples Links Database Forum DBA Videos
internet.com

» Database Journal Home
» DBA Videos
» Database Articles
» Database Tutorials
MS SQL
Oracle
MS Access
MySQL
DB2
» RESOURCES
Database Tools
SQL Scripts & Samples
Links
» Database Forum
» DBA Jobs
» Sitemap

News Via RSS Feed



follow us on Twitter

Marketplace Partners
Be a Marketplace Partner

internet.commerce
Be a Commerce Partner


















Security Software Primed for Strong Growth

SAP Touts 'Unwired' Strategy With Sybase

Salesforce Q2 Sees SaaS Paying Off

internet.com
IT
Developer
Internet News
Small Business
Personal Technology

Search internet.com
Advertise
Corporate Info
Newsletters
Tech Jobs
E-mail Offers


Database Journal | DBA Support | SQLCourse | SQLCourse2









Web Developer/Designer
Targus
US-CA-Anaheim

Justtechjobs.com Post A Job | Post A Resume

Featured Database Articles

MySQL

May 6, 2003

Deleting Duplicate Rows in a MySQL Database

By Ian Gilfillan

Care when using Temporary Tables

Be careful when using temporary tables in this way. With our tiny data set, there is not much of a problem, but huge tables could cause problems with memory. Temporary tables can be of type HEAP (as in the example), MyISAM, Merge or InnoDB (or even the old ISAM format). HEAP tables are placed in memory, in which case you need to make sure you have enough memory available to store the entire table, which may not always be the case with large production tables. Other table types are placed on disk, in which case you may as well not make the table temporary. There iss always the risk of losing your connection at the vital point, just after you've deleted the original data, and therefore losing your temporary table as well (and assuming you weren't using transactions, the changes are permanent). Simply remove the word TEMPORARY above, and you create a duplicate 'permanent' table. Of course, you'll want to delete it afterwards. Here is the full set of statements to achieve this:


## Necessary if you're using the same connection as the previous example
DROP TABLE bad_temp;

CREATE TABLE 
bad_temp(id INT, name VARCHAR(20));

INSERT INTO bad_temp(id,name) SELECT DISTINCT id,name FROM bad_table;

DELETE FROM bad_table;

INSERT INTO bad_table(id,name) SELECT id,name FROM bad_temp;

DROP TABLE bad_temp;

Duplicates with Unique Keys

The example above assumed the entire record was a duplicate. But there's another kind of duplicate, where there is a unique key, perhaps with an auto_increment field, yet the data itself is duplicated. As far as MySQL is concerned, there are no duplicates, yet logically there are. Let's look at an example, and add some more test data:

CREATE TABLE bad_table2 (
id INT NOT NULL UNIQUE AUTO_INCREMENT,
name VARCHAR(20) NOT NULL 
);

INSERT INTO bad_table2(id,name) VALUES
(1,'Things Fall Apart'),
(2,'Things Fall Apart'),
(3,'The Famished Road'),
(4,'Things Fall Apart'),
(5,'The Famished Road'),
(6,'Thirteen cents'),
(7,'Thirteen cents');

Here you need to make a decision. Is the id field of any value to you here? If it is, (because you've got links to other tables, with, for example, both id 1 and id 2, then you've got quite a nightmare on your hands - you can't lose either of the values, because you still need the link. In this case, you are going to have to write some code to resolve the mess, checking for duplicates, running update statements and the like. I will assume your data is not quite that much of a disaster, and that the id does not matter. In this case, you are simply going to let MySQL's auto increment field take care of the problem with the id field. The only difference in this example is that you cannot select a DISTINCT id-name combination; rather you only use the DISTINCT on the name.

CREATE TEMPORARY TABLE 
bad_temp2(id VARCHAR(10), name VARCHAR(20))
TYPE=HEAP;

INSERT INTO bad_temp2(name) SELECT DISTINCT name FROM bad_table2;

DELETE FROM bad_table2;

INSERT INTO bad_table2(id,name) SELECT id,name FROM bad_temp2;

Now that you know how to extricate yourself from the mess, don't think it's OK to get into this situation! With good planning and careful database design, you should never have to do this sort of thing. But now you know, just in case 'someone else' mangles your data. Good luck!

» See All Articles by Columnist Ian Gilfillan



Go to page: Prev  1  2  3  

Tools:
Add databasejournal.com to your favorites
Add databasejournal.com to your browser search box
IE 7 | Firefox 2.0 | Firefox 1.5.x
Receive news via our XML/RSS feed

MySQL Archives








Latest Forum Threads
MySQL Forum
Topic By Replies Updated
Show columns help ScottDolan 0 June 24th, 11:37 AM
MySQL update query help rahul123 0 June 4th, 10:34 AM
Select ..FROM with date Katrin 0 May 28th, 10:47 AM
Normalize on the fly? RickW 0 May 21st, 02:02 PM