Most systems need to be highly optimized for selects - take a news site
which performs millions of queries per day, but where the data arrives in
large batches of text files. So for parts of the day, inserts need to be
optimal, without noticeably affecting the millions trying to access the
data...
Assuming a nicely formatted '|' delimited text file that we want to
insert into the table above, take this piece of PHP code:
if (!($fp = fopen("datafile.txt","r"))) { // open the file for reading
print "\nUnable to open datafile.txt for writing"; // display error
exit(); // end the running of the program
}
while (!feof ($fp)) { // loop through the file line by line
$sline = fgets($fp, 4096); // put the data into the variable $sline
$sline = chop($sline); // remove the newline
list($eno,$fname,$sname,$telno,$salary) = split("|",$code);
// split the line on "|", populating the ind. variables
$db->query("insert into employee(employee_number,firstname,surname,
tel_no, salary
values($eno,'$fname','$sname','$tel_no', $salary)");
} // end while loop
This would work, but would be very slow. The index buffer would be flushed
after every insert. Until recently, MyISAM tables (The MySQL default)
did not allow data to be inserted at the same time as being read. The new
format does, but only if there are no deleted records present (highly
unlikely in a heavily used system). So the entire table is locked for the
duration of each insert. Fine on a low volume site, but when you're
getting hundreds or thousands of queries per second, you'll soon notice
the backlog!
There's a solution however - the best way to insert the data is to use
MySQL's "LOAD DATA INFILE". This is much faster (20 times
according to MySQL), and the only way some systems I've seen are still
hanging in there!
The syntax is simple, and the code becomes a lot simpler too:
$db->query("LOAD DATA INFILE 'datafile.txt' INTO TABLE employee (employee_number,firstname,surname,tel_no,salary) FIELDS TERMINATED BY '|'");
LOAD DATA INFILE has defaults of:
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
if you don't specify any of these clauses. And, just as with an ordinary
insert, you need to specify a field list if the order of the fields is
different, or, as in the example above, you're not inserting data for every
field. Always specifying a field list is good practice for all queries
anyway - if someone adds a field to the table at a later stage, you don't
want to go back and have to fix all your previous INSERT and SELECT *
statements.
If you can't get this to work properly, have a look at the format of your
text file - every problem I've seen with LOAD DATA has been because of a
corrupted text file. Every field in every row must be delimited correctly!
You may not always be inserting from a text file - perhaps your application
needs to do many unrelated inserts continually. There are ways to make sure
the mass of users selecting are not badly affected... The first is to use
INSERT LOW PRIORITY. This waits until there are no more reads waiting to
happen, waiting for the gap, and not pushing in as it were. Of course, if
your database is a rush hour special, there may never be a gap, and the
client performing the INSERT LOW PRIORITY may start to grow cobwebs! An
alternative here is INSERT DELAYED. The client is immediately freed, and
the insert put into a queue (with all the other INSERT DELAYED's still
waiting for the queue to end). This means that there can be no meaningful
information passed back to the client, (such as the auto_increment value),
as the INSERT has not been processed when the client is freed. Also, be
aware that a catastrophe such as an unexpected power failure here will
result in the queued INSERT's being lost. For neither of these methods
do you have any idea when the data will be inserted, if at all, so I
suggest you use with caution.
Conclusion
It's not only getting the data in that needs to be quick - sometimes you need to get it out quickly too. (Say you've accidentally loaded yesterday's classified ads, for example). Don't do a:
DELETE FROM classifieds;
Rather, use:
TRUNCATE TABLE classifieds;
The difference here is that DELETE drops records one by one, and that can be 1 million one by one's too slow! Note that this does not apply before version 4.0 of MySQL. At time of writing, most of you will still be using 3.x versions (if you do a DELETE FROM tablename on a non-empty table, and get 0 records back as a result, you're running an earlier version. To fix this problem, MySQL made DELETE remove records one by one so as to return the number of records deleted, but TRUNCATE still did the quick delete. Also, earlier versions than 3.23.33 used TRUNCATE tablename, not TRUNCATE TABLE tablename)
This has only been a brief introduction to optimizing queries and indexes. Once you've mastered these tips, you'll want to look at your more complex joins that just never seem to perform quickly. The MySQL site has lots of useful information, and there are a few good books out there as well. But don't forget EXPLAIN! Often the best way is to try and rewrite the query in as many different ways as possible, and see which one runs more efficiently. You can learn a lot by trying to figure out why one alternative ran faster than the other. Good luck!
resource list: www.mysql.com - complete documentation for MySQL
Add databasejournal.com to your favorites Add databasejournal.com to your browser search box IE 7 | Firefox 2.0 | Firefox 1.5.xReceive news via our XML/RSS feed