MySQL Data Fragmentation – What, When and How

MySQL tables, including MyISAM and InnoDB, two of the most common types, experience fragmentation as data is inserted and deleted randomly. Fragmentation can leave large holes in your table, blocks which must be read when scanning the table. Optimizing your table can therefore make full table scans and range scans more efficient.

 

Fragmentation – an example

MySQL has quite a few different storage engines to store data in tables. Whenever MySQL deletes rows from your table, the space left behind is then empty. Over time with a lot of DELETEs, this space can grow larger than the used space in your table. When MySQL goes to scan that data, it scans to the high water mark of the table, that is the highest point at which data has been added. If new inserts occur, MySQL will try to use that space, but nevertheless gaps will persist.

This extra fragmented space can make reads against the table less efficient than they might otherwise be. Let’s look at an example.

We’ll create a database (sometimes called a schema) and a test table:

 



(root@localhost) [test]> create database frag_test;
Query OK, 1 row affected (0.03 sec)

(root@localhost) [test]> use frag_test;
Database changed

(root@localhost) [frag_test]> create table frag_test (c1 varchar(64));
Query OK, 0 rows affected (0.05 sec)

Next let’s add some rows to the table:

 


(root@localhost) [frag_test]> insert into frag_test values ('this is row 1');
Query OK, 1 row affected (0.01 sec)

(root@localhost) [frag_test]> insert into frag_test values ('this is row 2');
Query OK, 1 row affected (0.00 sec)

(root@localhost) [frag_test]> insert into frag_test values ('this is row 3');
Query OK, 1 row affected (0.00 sec)

Now we’ll check for fragmentation:

 


(root@localhost) [frag_test]> show table status from frag_testG;
*************************** 1. row ***************************
Name: frag_test
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 3
Avg_row_length: 20
Data_length: 60
Max_data_length: 281474976710655
Index_length: 1024
Data_free: 0
Auto_increment: NULL
Create_time: 2011-02-23 14:55:27
Update_time: 2011-02-23 15:06:55
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)

Now let’s delete a row and check again:

 


(root@localhost) [frag_test]> delete from frag_test where c1 = 'this is row 2';
Query OK, 1 row affected (0.00 sec)

(root@localhost) [frag_test]> show table status from frag_testG;
*************************** 1. row ***************************
Name: frag_test
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 2
Avg_row_length: 20
Data_length: 60
Max_data_length: 281474976710655
Index_length: 1024
Data_free: 20
Auto_increment: NULL
Create_time: 2011-02-23 14:55:27
Update_time: 2011-02-23 15:07:49
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)

Notice the “data_free” column shows the space left by the second row that we deleted. Imagine you had 20,000 rows. They would take 400k bytes of space. Now if you deleted 19,999 rows, there are 20bytes of useful space in the table, but MySQL will still read 400k and data_free will show 39980.

 

Eliminating fragmentation

Luckily MySQL comes with a simple way to fix this once you’ve identified it. It’s called optimize table. Take a look:

 


(root@localhost) [frag_test]> optimize table frag_test;
+---------------------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+---------------------+----------+----------+----------+
| frag_test.frag_test | optimize | status | OK |
+---------------------+----------+----------+----------+
1 row in set (0.00 sec)

(root@localhost) [frag_test]> show table status from frag_testG;
*************************** 1. row ***************************
Name: frag_test
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 2
Avg_row_length: 20
Data_length: 40
Max_data_length: 281474976710655
Index_length: 1024
Data_free: 0
Auto_increment: NULL
Create_time: 2011-02-23 14:55:27
Update_time: 2011-02-23 15:11:05
Check_time: 2011-02-23 15:11:05
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)

Performance considerations

OPTIMIZE TABLE will lock the entire table while doing its work. For small tables this will work fine because the whole table can be read and written quickly. For very large tables, this can take a very long time and interrupt or reduce available to your application. What to do?

Luckily MySQL has a great feature called Master-Master replication. In this configuration, your backend database is actually two separate databases, one active and one passive. Both of the databases are identical in every way. To perform the operations online – including the OPTIMIZE TABLE operation, simply run them on your PASSIVE database. This will not interrupt your application one iota. Once the operation has completed, switch roles so that your application is pointing to your secondary database and make it active. Then make or original database the passive one.

Now that the roles are switched and the application is happily pointing at db2, perform the same OPTIMIZE TABLE operation on db1. It’s now the passive database, so it won’t interrupt the primary either.

 

Other commands

Show all the fragmented tables in your database:

 


(root@localhost) [(none)]>
select table_schema, table_name, data_free, engine
from information_schema.tables where table_schema
not in ('information_schema', 'mysql') and data_free > 0;
+--------------+-----------------------------+-----------+--------+
| table_schema | table_name | data_free | engine |
+--------------+-----------------------------+-----------+--------+
| aitc | wp_comments | 346536 | MyISAM |
| aitc | wp_options | 64308 | MyISAM |
| aitc | wp_postmeta | 124 | MyISAM |
| cactidb | poller_item | 160 | MyISAM |
| cactidb | poller_output | 384 | MyISAM |
| drupal | sessions | 30976 | MyISAM |
| drupal | users | 92 | MyISAM |
| drupal | variable | 20 | MyISAM |
| gg | wp_comments | 232 | MyISAM |
| gg | wp_options | 696 | MyISAM |
| gg | wp_postmeta | 560 | MyISAM |
| ihi | wp_comments | 536 | MyISAM |
| ihi | wp_options | 444 | MyISAM |
| ihi | wp_postmeta | 288 | MyISAM |
| ihi | wp_redirection_items | 1292 | MyISAM |
| ihi | wp_redirection_logs | 140352 | MyISAM |
| nds | wp_comments | 4704 | MyISAM |
| nds | wp_options | 150580 | MyISAM |
| nds | wp_postmeta | 76 | MyISAM |
| oos | wp_comments | 317124 | MyISAM |
| oos | wp_options | 88196 | MyISAM |
| oos | wp_postmeta | 76 | MyISAM |
| phplist | phplist_listuser | 252 | MyISAM |
| phplist | phplist_sendprocess | 52 | MyISAM |
| phplist | phplist_user_user | 32248 | MyISAM |
| phplist | phplist_user_user_attribute | 120 | MyISAM |
| phplist | phplist_user_user_history | 288 | MyISAM |
| phplist | phplist_usermessage | 1428 | MyISAM |
| pn_nds | nuke_session_info | 12916 | MyISAM |
| psa | exp_event | 10024 | MyISAM |
| test | active_sessions | 30144 | MyISAM |
+--------------+-----------------------------+-----------+--------+
31 rows in set (0.26 sec)

You can also cause a table to be defragmented if you change the storage engine. That’s because MySQL has to read the entire table, and write it back to disk using the new storage engine, and in the process the rows and data gets compressed down more efficiently.

Here’s what that looks like:

 


(root@localhost) [frag_test]> alter table frag_test engine = innodb;
Query OK, 2 rows affected (0.17 sec)
Records: 2 Duplicates: 0 Warnings: 0

(root@localhost) [frag_test]> show table status from frag_test
-> G;
*************************** 1. row ***************************
Name: frag_test
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 2
Avg_row_length: 8192
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2011-02-23 15:41:12
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment: InnoDB free: 7168 kB
1 row in set (0.00 sec)

Conclusion

Fragmentation is something to keep an eye on but not necessarily to obsess over. If you see a few tables with a very large data_free value, it can be worthwhile to OPTIMIZE those, as read performance on the table will improve afterward.

See all articles by Sean Hull

Sean Hull
Sean Hull
Sean Hull is a business and technology consultant, author and speaker. He is the founder and senior consultant at Heavyweight Internet Group in New York and has worked as a database consultant and expert for ten years. He is the author of "Oracle and Open Source" on O'Reilly & Associates and his articles have appeared in Oracle's own OTN website, DBA Zine, ChangeThis.com, and many others. Using techniques, technologies, and perspectives discussed in the book, articles, and seminars, he has taken his career into exciting areas. He has served as Technology Consultant, Project Manager, CEO, and mentor with a background in programming, systems integration & administration, project development and management. He can be reached at shull@iheavy.com or visit http://www.iheavy.com for more info about consulting services.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles