In most MySQL systems, the schema is constantly updated and changed to match the needs and focus of the application, making it somewhat difficult to keep up with all of the changes that come down the pipe. Chris Schneider shows you how to gauge the index usage in a system using grep, awk and sed.
Knowing how to index a schema based on the queries using the system is a
great skill to have. In the majority of the systems I have worked on, the
schema is constantly updated and changed to match the needs and focus of the
application. That said, it can be somewhat difficult to keep up with all of the
changes that come down the pipe, especially if there are many changes every
week. There are a few tools out there to help identify the indexes that are
being used. One of the best resides in Maatkit and is called mk-index-usage.
Unfortunately, I was working with an older version of Maatkit and didn't
realize that this tool had already been created and added to the already
stellar toolkit. In this article, I will go through the process I used to
gauge the index usage in a system I stepped into in my current position. I
thought this might be a good way to show that grep, awk and sed can be
very helpful with just about anything.
The Basic Process
1. Set long_query_time to zero (0) on the MySQL instance you want to
analyze
2. Let the slow query log grow for as long as you think is needed for a good
analysis
3. Parse the log and run EXPLAIN on the queries it contains
4. Record the usage of the indexes and some other statistics
5. Present to your team
A Bit More Detail
(root@localhost) [(none)]> set global long_query_time = 0;
Query OK, 0 rows affected (0.00 sec)
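Note that this only matters if the slow query log is actually enabled. On MySQL 5.1 and later you can turn it on and locate the log file at runtime; a quick illustration (not part of the original setup):

(root@localhost) [(none)]> set global slow_query_log = 1;
(root@localhost) [(none)]> show global variables like 'slow_query_log_file';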
I waited about 10 – 15 minutes to ensure that I had a good sample to
work with. Keep in mind that all I really needed were sample queries to gauge
whether my indexes were in line with expectations. Basically, I want to take the
MySQL slow query log and boil it down to just the queries, leaving all the extra
stuff out. The slow query log is easy to parse given its predictable format; here
is an example of the MySQL slow query log.
Example:
# Time: 100921  3:23:00
# User@Host: SOMEMYSQLUSER[SOMEMYSQLUSER] @  [127.0.0.1]
# Query_time: 0.000280  Lock_time: 0.000128  Rows_sent: 1  Rows_examined: 0
SET timestamp=1285039380;
SELECT COUNT(*) count FROM INFORMATION_SCHEMA.INNODB_TRX WHERE trx_state = 'RUNNING';
# User@Host: SOMEMYSQLUSER[SOMEMYSQLUSER] @  [127.0.0.1]
# Query_time: 0.000101  Lock_time: 0.000058  Rows_sent: 1  Rows_examined: 0
SET timestamp=1285039380;
SELECT COUNT(*) count FROM INFORMATION_SCHEMA.INNODB_TRX WHERE trx_state = 'BLOCKED';
# User@Host: SOMEMYSQLUSER[SOMEMYSQLUSER] @  [127.0.0.1]
# Query_time: 0.000084  Lock_time: 0.000042  Rows_sent: 1  Rows_examined: 5
SET timestamp=1285039380;
SELECT SUM(compress_time) compress_seconds, SUM(uncompress_time) uncompress_seconds FROM INFORMATION_SCHEMA.INNODB_CMP;
# User@Host: SOMEMYSQLUSER[SOMEMYSQLUSER] @  [127.0.0.1]
The first step in parsing the file is to get rid of the lines that don't
matter; in this case, those are the lines that start with #.
shell> cat /path/to/slow.log |grep -v "^#" > /path/your/working/in/slow1.log
You now need to take out more statements that don't matter. As an
example, I have included a few of the strings I really don't want to run through
EXPLAIN, or couldn't run through EXPLAIN. A short list would be commands like
commit, rollback, insert, update, delete, replace, etc. Please note that
this will vary from system to system, so you should take this as a general
guideline.
shell> cat slow1.log | egrep -v "^commit|^SET|^SELECT @@session|^use|^select FOUND_ROWS|^rollback|^update|^delete|^replace|mysql-connector-|^SHOW" > slow2.log
Next, you'll need to remove the tabs from the beginning of each line and
the newlines from the ends of the lines, then replace the leftover
semicolons with newlines. This will give you one line per SELECT statement.
shell> cat slow2.log | sed -e 's/^[ \t]*//g' | tr '\n' ' ' | sed -e 's/;/\n/g' > slow3.log
The last command is to add an EXPLAIN to the beginning of each line.
shell> sed -i -e 's/^SELECT/EXPLAIN SELECT/g' slow3.log
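At this point, each line in slow3.log is a single, self-contained statement ready to execute; for example, the first query from the sample log above becomes:

EXPLAIN SELECT COUNT(*) count FROM INFORMATION_SCHEMA.INNODB_TRX WHERE trx_state = 'RUNNING'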
Now that I have a file with the information that I need, I would like to
gather some statistics. All of the queries need to be run through EXPLAIN
inside of MySQL so we can identify the index each query is using. Below is a
simple script that will execute each line of the slow3.log we just created, run
EXPLAIN, and log the index used for each table to a file.
#!/bin/sh
myfile="slow3.log"
dir="indexes"
database="mytestdb"

mkdir -p ${dir}

while read line
do
    # Run the EXPLAIN (\G prints one field per line) and pull out the table and key
    table=`mysql -uroot ${database} -e "${line}\G" | grep "table:" | awk '{print $2}'`
    key=`mysql -uroot ${database} -e "${line}\G" | grep "key:" | awk '{print $2}'`

    # Create the per-table file the first time the table is seen
    iffile=`ls -al ${dir}/${table}.txt | wc -l`
    if [ ${iffile} -eq 0 ]; then
        touch ${dir}/${table}.txt
    fi

    # Record the index this query used
    echo ${key} >> ${dir}/${table}.txt
done < ${myfile}
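As a side note, the script above calls EXPLAIN twice per query. A hypothetical refinement (not from the original article) runs EXPLAIN once per query and lets awk pair each "table:" line with the "key:" line that follows it, writing straight to the per-table files:

#!/bin/sh
# Hypothetical variant: same inputs and outputs as the script above,
# but only one EXPLAIN round trip per query.
myfile="slow3.log"
dir="indexes"
database="mytestdb"

mkdir -p ${dir}

while read line
do
    mysql -uroot ${database} -e "${line}\G" | \
        awk -v dir="${dir}" '/^ *table:/ {t=$2} /^ *key:/ {print $2 >> (dir "/" t ".txt")}'
done < ${myfile}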
After we have all of the files in the ./indexes directory, we can run a few
commands that tell us a lot. Below, I look at overall table usage first, then
at index usage broken down by table.
Table Usage
Every time a table is accessed by a SELECT in slow3.log, a line is added
to the corresponding tbl_X.txt file. Based on file size alone, it is
easy to tell which tables are being accessed the most.
NOTE: The higher the number the more the table
is used.
shell> ls -al |awk '{print $5" "$9}' |sort -n -r
248147 tbl_1.txt
219515 tbl_2.txt
160510 tbl_3.txt
51804 tbl_4.txt
33454 tbl_5.txt
27938 tbl_6.txt
26753 tbl_7.txt
13090 tbl_8.txt
11420 tbl_9.txt
11324 tbl_10.txt
7588 tbl_11.txt
4873 tbl_12.txt
4823 tbl_13.txt
3566 tbl_14.txt
1522 tbl_15.txt
1517 tbl_16.txt
743 tbl_17.txt
611 tbl_18.txt
531 tbl_19.txt
208 tbl_20.txt
169 tbl_21.txt
130 tbl_22.txt
117 tbl_23.txt
104 tbl_24.txt
98 tbl_25.txt
97 tbl_26.txt
91 tbl_27.txt
74 tbl_28.txt
65 tbl_29.txt
65 tbl_30.txt
65 tbl_31.txt
39 tbl_32.txt
26 tbl_33.txt
13 tbl_34.txt
Index Usage
Next, I would like to see the indexes that were used and the frequency with
which they were used. Below is a simple script that runs counts on each
file and displays the output in a readable fashion.
NOTE: The number next to each index is how many
times that index was used during the run.
shell> for x in `ls -al *.txt |awk '{print $9}' |awk -F. '{print $1}'`; do echo "TABLE: ${x}"; for y in `cat ${x}.txt | sort | uniq`; do cnt=`grep -x ${y} ${x}.txt |wc -l`; echo "${y}::${cnt}"; done; echo; done

TABLE: tbl_8
_idx__author::895
_idx__my_slug::24
new::201
new2::155
PRIMARY::20

TABLE: tbl_11
_idx__author::5
new2::1503
PRIMARY::1

TABLE: tbl_7
_idx__author::1951
_idx__my_approved::8
_idx__my_slug::34
new3::8
new4::14
new5::42
new7::74
PRIMARY::10

TABLE: tbl_14
_idx__my_slug::245
PRIMARY::17

TABLE: tbl_2
_idx__author::877
_idx__author,_idx__my_attached_to::61
_idx__my_attached_to::8030
_idx__my_attached_to_approved::1731
_idx__my_attached_to_by_date::1312
new2::1001
new3::1636
PRIMARY::28

TABLE: tbl_17
_idx__author::43
PRIMARY::23

TABLE: tbl_33
_idx__author::2

TABLE: tbl_27
_idx__author::7

TABLE: tbl_10
_idx__author::7
_idx__mytbl_19_id::11
new2::2207

TABLE: tbl_34
_idx__author::1

TABLE: tbl_19
_idx__author::29
_idx__my_approved::6
_idx__my_slug::3
new::1

TABLE: tbl_13
_idx__author::371

TABLE: tbl_23
_idx__author::9

TABLE: tbl_3
_idx__author::5
_idx__mytbl_9_id::12
_idx__my_mozzle::73
_idx__mytbl_1name__lower::5489
_idx__mytbl_1name__lower_tbl_9_privacy::1380
new::199
new2::68
PRIMARY::34

TABLE: tbl_9
_idx__author::708
_idx__my_approved::10
new2::55
new3::293
PRIMARY::37

TABLE: tbl_31
_idx__author::5

TABLE: tbl_15
_idx__author::34
new::248
PRIMARY::11

TABLE: tbl_25
_idx__author::5
NULL::5
PRIMARY::1

TABLE: tbl_16
_idx__author::113
PRIMARY::6

TABLE: tbl_21
_idx__author::13

TABLE: tbl_18
_idx__author::5
_idx__my_slug::39

TABLE: tbl_26
_idx__author::5
PRIMARY::4

TABLE: tbl_6
_idx__author::996
_idx__my_approved::12
_idx__my_slug::168
new::2458
new3::73
new4::483
new8::1802
PRIMARY::29

TABLE: tbl_32
_idx__author::3

TABLE: tbl_22
_idx__author::10

TABLE: tbl_30
_idx__author::5

TABLE: tbl_20
_idx__author::16

TABLE: tbl_12
_idx__author::373
PRIMARY::3

TABLE: tbl_4
_idx__author::2667
_idx__mytbl_14_id::113
_idx__my_slug::714
new2::31
new3::13
new4::651
new5::213
PRIMARY::28

TABLE: tbl_24
_idx__author::8

TABLE: tbl_1
_idx__author::6620
_idx__id::171
_idx__my_birth_month_and_day::30
_idx__my_full_name::16
_idx__my_profile_address__lower::4489
_idx__my_xg__forum__activity__count::3
_idx__updated_date::13
new6::1264
new7::70
new_fullname_status_idx::335
newtbl_1_count_idx::7
PRIMARY::66

TABLE: tbl_29
_idx__author::5

TABLE: tbl_28
_idx__author::2
PRIMARY::6

TABLE: tbl_5
_idx__author::2342
_idx__my_approved::6
_idx__my_slug::40
new::450
new2::60
new3::376
PRIMARY::13
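Going one step further, a hypothetical follow-up (not part of the original run) is to compare the recorded index names for a table against SHOW INDEX to spot indexes that were never chosen. The table and database names below are examples:

shell> mysql -uroot mytestdb -N -e "SHOW INDEX FROM tbl_1" | awk '{print $3}' | sort -u > defined.txt
shell> sort -u indexes/tbl_1.txt > used.txt
shell> comm -23 defined.txt used.txt

The comm -23 output lists index names defined on the table but never seen in the EXPLAIN results.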
More About Tables
I also want to know which tables I am not using. It is very simple to compare
the table names in the files we created above against the information_schema
database and find the tables that were not used; only 35 of the 55 tables were
used during the run. Below is the output listing the tables that were not used
while the log was collected.
| tbl_a |
| tbl_b |
| tbl_c |
| tbl_d |
| tbl_e |
| tbl_f |
| tbl_g |
| tbl_h |
| tbl_i |
| tbl_j |
| tbl_k |
| tbl_l |
| tbl_m |
| tbl_n |
| tbl_o |
| tbl_p |
| tbl_q |
| tbl_r |
| tbl_s |
| tbl_t |
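The comparison itself is simple; here is a minimal sketch (the exact commands were not shown in the original), assuming the per-table files live in ./indexes and the schema is mytestdb:

shell> mysql -uroot -N -e "SELECT table_name FROM information_schema.tables WHERE table_schema = 'mytestdb'" | sort > all_tables.txt
shell> ls indexes/*.txt | sed -e 's|indexes/||' -e 's|\.txt$||' | sort > used_tables.txt
shell> comm -23 all_tables.txt used_tables.txt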
Some Closing Thoughts
Grep, awk and sed are your friends! Mileage may vary; however, this is still
a good, quick process for gaining a lot of perspective on your index usage.
P.S. Make sure you set your long_query_time back to whatever you had
before. Leaving it at zero would be bad!