Rob Gravelle lists his top 10 MySQL best practices for database administrators, architects, developers, and security personnel.
Many groups and individuals are involved in the field of data management, from MySQL administrators, architects, developers, as well as infrastructure support people. Each of these plays a part in the security, maintenance, and performance of a MySQL installation. Therefore, when speaking of best practices, one has to consider which of these functions that specific practice pertains to. In this top 10 list, I will try to include a bit from each discipline. I have considered my own experiences as well as consulted with numerous other sources to compile this final list. Whether or not you agree with each and every item and their ordering, the important thing is to at least consider each of the points raised here today. So, without further ado, here is my personal top 10 list.
1. Index Search Fields
You should always index columns that you plan on searching on. Creating an index on a field in a table creates another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing binary searches to be performed on it. An index can be defined for a single column or multiple columns of a given table.
This rule also applies to fields where partial string searches will be performed on the start of the field. For instance the phrase "last_name LIKE 'rob%'" will use the index, whereas "WHERE last_name LIKE '%ert%'" will not.
This does not imply that the more indexes you have, the better. While that is true to a point, keep in mind that every index takes up disk space. In a MyISAM database, the index file can quickly reach the size limits of the underlying file system if many fields within the same table are indexed.
2. Avoid Using "SELECT *" in your Queries
As a general rule, the more data is read from the tables, the slower a query becomes. Considering that some production tables may contains dozens of columns, some of which comprises of huge data types, it would be foolhardy to select all of them. A database server which is separate from the web server will only aggravate this issue, due to the data having to be transferred across the network.
To reiterate, it is a good habit to always specify which columns you need when writing your SELECT statements.
3. Set a password for the "root" user and then rename the user.
Here's a security tip. Much like with UNIX, the first thing you should do with a clean MySQL install is set a password for the root user:
$ mysqladmin -u root password NEWPASSWORD
Even better, once you've set the password, change the name of the "root" user to something else. A hacker on a MySQL server will likely target the root, both for its superuser status and because it is a known user. By changing the name of the root user, you make it all the more difficult for hackers to succeed using a brute-force attack. Use the following series of commands to rename the "root" user:
$ mysql -u root -p
mysql> use mysql;
mysql> update user set password=PASSWORD("NEWPASSWORD") where
mysql> flush privileges;
4. Tune your Queries with EXPLAIN
The EXPLAIN keyword is undoubtedly the most instructive analytical tool in the MySQL arsenal. Using it can give you valuable insight on the steps that MySQL is taking to execute your query. This can help you spot the bottlenecks and other problems with your query or table structures.
In my Optimizing MySQL Query Retrieval Speed through Table Joins article, I included how to use EXPLAIN to ascertain the efficiency of the table joins in the following query statement:
from authors a,
where a.au_id = ta.au_id
and ta.title_id = t.title_id
EXPLAIN produced the following results:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ta ALL (NULL) (NULL) (NULL) (NULL) 800
1 SIMPLE t ref NewIndex NewIndex 23 crosstab_article.ta.title_id 100 Using where
1 SIMPLE a ALL (NULL) (NULL) (NULL) (NULL) 1000 Using where;
Using join buffer
The results of an EXPLAIN query will show you which indexes are being utilized, how the table is being scanned and sorted, and other useful information. At the very least, you can be sure that the lower the numbers appear in the rows column, the faster the query should run.
5. Index and Use Same Column Types for Joins
If your query contains many joins, you need to make sure that the columns that make up a join are indexed on both tables. This will allow MySQL to better optimize the join operation.
Likewise, the columns that are joined must share the same type. For instance, if you join a DECIMAL column to one of the type INT, MySQL will be unable to use at least one of the indexes. String type columns must also use the same character encoding.
6. LIMIT 1 When Getting a Unique Row
There are some queries that are meant to only return one row, such as those which fetch a unique record, or that verify whether or not there are any records that satisfy the WHERE clause. In such cases, adding LIMIT 1 to your query can increase performance. This reduces execution time because the database engine will stop scanning for records after it finds the first matching record, instead of going through the whole table or index.
A second popular usage is in subqueries. In the following SELECT statement, we want to retrieve the first s2 field value sorted by the s1 column. We can then match it against the outer query values:
SELECT * FROM t1 WHERE s1 = (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1);
7. Hide MySQL from the Internet
Most experienced database administrators and security personnel know to never host the database under the Web server's root. For Web-enabled applications, MySQL should be hidden behind a firewall and communication should only be enabled between application servers and your Web servers. Another option is to use MySQL's skip-networking option. When it is enabled, MySQL only listens for local socket connections and ignores all TCP ports.
8. Use the Smallest Data Types Possible
This would seem to be common sense to me, because of my programming background. When I was attending college, there was still some of the "memory is scarce" philosophy carried over from the days of 256 MB hard drives. Nowadays, no one seems to care one iota about memory or hard drive space. "Memory is cheap!" is the new adage. While it is true in dollar terms, it still takes longer to read in large data types than smaller ones, as the former require more disk sectors to be read into memory.
The moral of the story is to ignore the temptation to immediately jump the largest data type when designing your tables. Consider using an int rather than a bigint. You should also avoid large char(255) text fields when a varchar or smaller char() will suffice. Using the right data type will fit more records in memory or index key block, meaning fewer reads, ergo faster performance.
9. Create Views to Simplify Commonly-used Table Joins
As discussed in my Writing Reusable Queries in MySQL article, views help to both simplify complex schemas and to implement security. One way that views contribute to security is to hide auditing fields from developers. They can also be used to filter out unindexed columns, leaving only fields that are fastest to search on. The only caveat to using this technique is that you must be fairly sure that you won't need to access one of the hidden table columns in the future; not an easy thing to do!
10. Take Advantage of Query Caching
The query cache stores the text of a SELECT statement together with the corresponding result set. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client. Most MySQL servers have query caching enabled by default. It's one of the most effective methods of improving performance.
It's a beautiful thing, but query caching isn't without limitations. Take the following statement:
WHERE YEAR(award_date) = Year(CURDATE());
The problem here is that queries, which contain certain non-deterministic functions - that is those which MySQL cannot calculate in advance - like NOW() and RAND() are not cached. Fortunately, there is an easy fix to prevent this from happening. That is to store the function results to a variable:
SET @year = Year(CURDATE());
WHERE YEAR(award_date) = @year;
And that's my personal top 10 best practices in MySQL. One thing that I noticed while conducting my research for this article is that there is a wide range of opinions as to which practices should be worthy of a spot in the top 10. In my estimation, there are many factors that may affect an item's weight for you, including your particular role, experience, line of business, software versions, and hardware configuration, to name but a few. I would welcome you to add your own contributions as comments. Every tip helps!
» See All Articles by Columnist Rob Gravelle