Rob Gravelle lists his top 10 MySQL best practices for database administrators,
architects, developers, and security personnel.
Many groups and individuals are involved in the field of data management,
from MySQL administrators, architects, developers, as well as infrastructure
support people. Each of these plays a part in the security, maintenance, and
performance of a MySQL installation. Therefore, when speaking of best
practices, one has to consider which of these functions that specific practice
pertains to. In this top 10 list, I will try to include a bit from each
discipline. I have considered my own experiences as well as consulted with
numerous other sources to compile this final list. Whether or not you agree
with each and every item and their ordering, the important thing is to at least
consider each of the points raised here today. So, without further ado, here is
my personal top 10 list.
1. Index Search Fields
You should always index columns that you plan on searching on. Creating an
index on a field in a table creates another data structure which holds the
field value, and pointer to the record it relates to. This index structure is
then sorted, allowing binary searches to be performed on it. An index can be
defined for a single column or multiple columns of a given table.
This rule also applies to fields where partial string searches will be
performed on the start of the field. For instance the phrase "last_name
LIKE 'rob%'" will use the index, whereas "WHERE last_name LIKE
'%ert%'" will not.
This does not imply that the more indexes you have, the better. While that
is true to a point, keep in mind that every index takes up disk space. In a
MyISAM database, the index file can quickly reach the size limits of the
underlying file system if many fields within the same table are indexed.
2. Avoid Using "SELECT *" in your Queries
As a general rule, the more data is read from the tables, the slower a query
becomes. Considering that some production tables may contains dozens of
columns, some of which comprises of huge data types, it would be foolhardy to
select all of them. A database server which is separate from the web server will
only aggravate this issue, due to the data having to be transferred across the
network.
To reiterate, it is a good habit to always specify which columns you need
when writing your SELECT statements.
3. Set a password for the "root" user and then rename the user.
Here's a security tip. Much like with UNIX, the first thing you should do
with a clean MySQL install is set a password for the root user:
$ mysqladmin -u root password NEWPASSWORD
Even better, once you've set the password, change the name of the "root"
user to something else. A hacker on a MySQL server will likely target the root,
both for its superuser status and because it is a known user. By changing the
name of the root user, you make it all the more difficult for hackers to succeed
using a brute-force attack. Use the following series of commands to rename the
"root" user:
$ mysql -u root -p
mysql> use mysql;
mysql> update user set password=PASSWORD("NEWPASSWORD") where
User='<userid>';
mysql> flush privileges;
mysql> quit
4. Tune your Queries with EXPLAIN
The EXPLAIN keyword is undoubtedly the most instructive analytical tool in
the MySQL arsenal. Using it can give you valuable insight on the steps that
MySQL is taking to execute your query. This can help you spot the bottlenecks
and other problems with your query or table structures.
In my Optimizing
MySQL Query Retrieval Speed through Table Joins article, I included how to
use EXPLAIN to ascertain the efficiency of the table joins in the following
query statement:
explain
select a.au_lname,
a.au_fname,
ta.royaltyper,
t.title,
t.royalty
from authors a,
titleauthor ta,
titles t
where a.au_id = ta.au_id
and ta.title_id = t.title_id
EXPLAIN produced the following results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ta ALL (NULL) (NULL) (NULL) (NULL) 800
1 SIMPLE t ref NewIndex NewIndex 23 crosstab_article.ta.title_id 100 Using where
1 SIMPLE a ALL (NULL) (NULL) (NULL) (NULL) 1000 Using where;
Using join buffer
The results of an EXPLAIN query will show you which indexes are being
utilized, how the table is being scanned and sorted, and other useful
information. At the very least, you can be sure that the lower the numbers
appear in the rows column, the faster the query should run.
5. Index and Use Same Column Types for Joins
If your query contains many joins, you need to make sure that the columns
that make up a join are indexed on both tables. This will allow MySQL to better
optimize the join operation.
Likewise, the columns that are joined must share the same type. For
instance, if you join a DECIMAL column to one of the type INT, MySQL will be
unable to use at least one of the indexes. String type columns must also use
the same character encoding.
6. LIMIT 1 When Getting a Unique Row
There are some queries that are meant to only return one row, such as those
which fetch a unique record, or that verify whether or not there are any
records that satisfy the WHERE clause. In such cases, adding LIMIT 1 to your
query can increase performance. This reduces execution time because the
database engine will stop scanning for records after it finds the first
matching record, instead of going through the whole table or index.
A second popular usage is in subqueries. In the following SELECT statement,
we want to retrieve the first s2 field value sorted by the s1 column. We can
then match it against the outer query values:
SELECT * FROM t1 WHERE s1 = (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1);
7. Hide MySQL from the Internet
Most experienced database administrators and security personnel know to
never host the database under the Web server's root. For Web-enabled
applications, MySQL should be hidden behind a firewall and communication should
only be enabled between application servers and your Web servers. Another
option is to use MySQL's skip-networking option. When it is enabled,
MySQL only listens for local socket connections and ignores all TCP ports.
8. Use the Smallest Data Types Possible
This would seem to be common sense to me, because of my programming
background. When I was attending college, there was still some of the
"memory is scarce" philosophy carried over from the days of 256 MB
hard drives. Nowadays, no one seems to care one iota about memory or hard drive
space. "Memory is cheap!" is the new adage. While it is true in
dollar terms, it still takes longer to read in large data types than smaller
ones, as the former require more disk sectors to be read into memory.
The moral of the story is to ignore the temptation to immediately jump the
largest data type when designing your tables. Consider using an int
rather than a bigint. You should also avoid large char(255)
text fields when a varchar or smaller char() will suffice.
Using the right data type will fit more records in memory or index key block,
meaning fewer reads, ergo faster performance.
9. Create Views to Simplify Commonly-used Table Joins
As discussed in my Writing
Reusable Queries in MySQL article, views help to both simplify complex
schemas and to implement security. One way that views contribute to security is
to hide auditing fields from developers. They can also be used to filter out
unindexed columns, leaving only fields that are fastest to search on. The only
caveat to using this technique is that you must be fairly sure that you won't
need to access one of the hidden table columns in the future; not an easy thing
to do!
10. Take Advantage of Query Caching
The query cache stores the text of a SELECT statement together with the
corresponding result set. If an identical statement is received later, the
server retrieves the results from the query cache rather than parsing and
executing the statement again. The query cache is shared among sessions, so a
result set generated by one client can be sent in response to the same query
issued by another client. Most MySQL servers have query caching enabled by
default. It's one of the most effective methods of improving performance.
It's a beautiful thing, but query caching isn't without limitations. Take
the following statement:
SELECT emp_id,
bonus_id
FROM bonuses
WHERE YEAR(award_date) = Year(CURDATE());
The problem here is that queries, which contain certain non-deterministic
functions - that is those which MySQL cannot calculate in advance - like NOW()
and RAND() are not cached. Fortunately, there is an easy fix to prevent this
from happening. That is to store the function results to a variable:
SET @year = Year(CURDATE());
SELECT emp_id,
bonus_id
FROM bonuses
WHERE YEAR(award_date) = @year;
And that's my personal top 10 best practices in MySQL. One thing that I
noticed while conducting my research for this article is that there is a wide
range of opinions as to which practices should be worthy of a spot in the top
10. In my estimation, there are many factors that may affect an item's weight
for you, including your particular role, experience, line of business, software
versions, and hardware configuration, to name but a few. I would welcome you to
add your own contributions as comments. Every tip helps!
»
See All Articles by Columnist
Rob Gravelle