In most MySQL systems, the schema is constantly updated and changed to match the needs and focus of the application, making it somewhat difficult to keep up with all of the changes that come down the pipe. Chris Schneider shows you how to gauge the index usage in a system using grep, awk and sed.
Knowing how to index a schema based on the queries using the system is a
great skill to have. In the majority of the systems I have worked on, the
schema is constantly updated and changed to match the needs and focus of the
application. That said, it can be somewhat difficult to keep up with all of the
changes that come down the pipe, especially if there are many changes every
week. There are a few tools out there to help identify the indexes that are
being used. One of the best resides in Maatkit and is called mk-index-usage.
Unfortunately, I was working with an older version of Maatkit and didn't
realize that this tool had already been created and added to the already
stellar toolkit. In this article, I will go through the process I used to
gauge the index usage in a system I stepped into in my current position. I
thought this might be a good way to show that grep, awk and sed can be
very helpful with just about anything.
The Basic Process
1. Set long_query_time to zero (0) on the MySQL instance you want to
analyze
2. Let the slow query log grow for as long as you think is needed for a good
analysis
3. Parse the log and run EXPLAIN on the queries it contains
4. Record the usage of the indexes and some other statistics
5. Present to your team
A Bit More Detail
(root@localhost) [(none)]> set global long_query_time = 0;
Query OK, 0 rows affected (0.00 sec)
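Note that this only matters if the slow query log is actually enabled. On MySQL 5.1 and later you can turn it on and locate the log file at runtime; a quick illustration (not part of the original setup):

(root@localhost) [(none)]> set global slow_query_log = 1;
(root@localhost) [(none)]> show global variables like 'slow_query_log_file';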
I waited about 10 – 15 minutes to ensure that I had a good sample to
work with. Keep in mind that all I really needed were sample queries to gauge
whether my indexes were in line with expectations. Basically, I want to take the
MySQL slow query log and boil it down to just the queries, leaving all the extra
stuff out. The slow query log is easy to parse given its predictable format; here
is an example of the MySQL slow query log.
Example:
# Time: 100921  3:23:00
# User@Host: SOMEMYSQLUSER[SOMEMYSQLUSER] @  [127.0.0.1]
# Query_time: 0.000280  Lock_time: 0.000128  Rows_sent: 1  Rows_examined: 0
SET timestamp=1285039380;
SELECT COUNT(*) count FROM INFORMATION_SCHEMA.INNODB_TRX WHERE trx_state = 'RUNNING';
# User@Host: SOMEMYSQLUSER[SOMEMYSQLUSER] @  [127.0.0.1]
# Query_time: 0.000101  Lock_time: 0.000058  Rows_sent: 1  Rows_examined: 0
SET timestamp=1285039380;
SELECT COUNT(*) count FROM INFORMATION_SCHEMA.INNODB_TRX WHERE trx_state = 'BLOCKED';
# User@Host: SOMEMYSQLUSER[SOMEMYSQLUSER] @  [127.0.0.1]
# Query_time: 0.000084  Lock_time: 0.000042  Rows_sent: 1  Rows_examined: 5
SET timestamp=1285039380;
SELECT SUM(compress_time) compress_seconds, SUM(uncompress_time) uncompress_seconds FROM INFORMATION_SCHEMA.INNODB_CMP;
# User@Host: SOMEMYSQLUSER[SOMEMYSQLUSER] @  [127.0.0.1]
The first step in parsing the file is to get rid of the lines that don't
matter; in this case, those are the lines that start with #.
shell> cat /path/to/slow.log |grep -v "^#" > /path/your/working/in/slow1.log
You now need to take out more statements that don't matter. As an
example, I have included a few of the strings I really don't want to run through
EXPLAIN, or couldn't run through EXPLAIN. A short list would be commands like
commit, rollback, insert, update, delete, replace, etc. Please note that
this will vary from system to system, so you should take this as a general
guideline.
shell> cat slow1.log | egrep -v "^commit|^SET|^SELECT @@session|^use|^select FOUND_ROWS|^rollback|^update|^delete|^replace|mysql-connector-|^SHOW" > slow2.log
Next, you'll need to remove the tabs from the beginning of each line and
the newlines from the ends of the lines, then replace the leftover
semicolons with newlines. This will give you one line per SELECT statement.
shell> cat slow2.log | sed -e 's/^[ \t]*//g' | tr '\n' ' ' | sed -e 's/;/\n/g' > slow3.log
The last command is to add an EXPLAIN to the beginning of each line.
shell> sed -i -e 's/^SELECT/EXPLAIN SELECT/g' slow3.log
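At this point, each line in slow3.log is a single, self-contained statement ready to execute; for example, the first query from the sample log above becomes:

EXPLAIN SELECT COUNT(*) count FROM INFORMATION_SCHEMA.INNODB_TRX WHERE trx_state = 'RUNNING'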
Now that I have a file with the information that I need, I would like to
gather some statistics. All of the queries need to be run through EXPLAIN
inside of MySQL so we can identify the index each query is using. Below is a
simple script that will execute each line of the slow3.log we just created, run
EXPLAIN, and log the index used for each table to a file.
#!/bin/sh
myfile="slow3.log"
dir="indexes"
database="mytestdb"

mkdir -p ${dir}

while read line
do
    # Run the EXPLAIN (\G prints one field per line) and pull out the table and key
    table=`mysql -uroot ${database} -e "${line}\G" | grep "table:" | awk '{print $2}'`
    key=`mysql -uroot ${database} -e "${line}\G" | grep "key:" | awk '{print $2}'`

    # Create the per-table file the first time the table is seen
    iffile=`ls -al ${dir}/${table}.txt | wc -l`
    if [ ${iffile} -eq 0 ]; then
        touch ${dir}/${table}.txt
    fi

    # Record the index this query used
    echo ${key} >> ${dir}/${table}.txt
done < ${myfile}
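As a side note, the script above calls EXPLAIN twice per query. A hypothetical refinement (not from the original article) runs EXPLAIN once per query and lets awk pair each "table:" line with the "key:" line that follows it, writing straight to the per-table files:

#!/bin/sh
# Hypothetical variant: same inputs and outputs as the script above,
# but only one EXPLAIN round trip per query.
myfile="slow3.log"
dir="indexes"
database="mytestdb"

mkdir -p ${dir}

while read line
do
    mysql -uroot ${database} -e "${line}\G" | \
        awk -v dir="${dir}" '/^ *table:/ {t=$2} /^ *key:/ {print $2 >> (dir "/" t ".txt")}'
done < ${myfile}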
After we have all of the files in the ./indexes directory, we can run a few
commands that tell us a lot. Below, I look at overall table usage first, then
at index usage broken down by table.
Table Usage
Every time a table is accessed by a SELECT in slow3.log, a line is added
to the corresponding tbl_X.txt file. Based on file size alone, it is
easy to tell which tables are being accessed the most.
NOTE: The higher the number the more the table
is used.
shell> ls -al |awk '{print $5" "$9}' |sort -n -r
248147 tbl_1.txt
219515 tbl_2.txt
160510 tbl_3.txt
51804 tbl_4.txt
33454 tbl_5.txt
27938 tbl_6.txt
26753 tbl_7.txt
13090 tbl_8.txt
11420 tbl_9.txt
11324 tbl_10.txt
7588 tbl_11.txt
4873 tbl_12.txt
4823 tbl_13.txt
3566 tbl_14.txt
1522 tbl_15.txt
1517 tbl_16.txt
743 tbl_17.txt
611 tbl_18.txt
531 tbl_19.txt
208 tbl_20.txt
169 tbl_21.txt
130 tbl_22.txt
117 tbl_23.txt
104 tbl_24.txt
98 tbl_25.txt
97 tbl_26.txt
91 tbl_27.txt
74 tbl_28.txt
65 tbl_29.txt
65 tbl_30.txt
65 tbl_31.txt
39 tbl_32.txt
26 tbl_33.txt
13 tbl_34.txt
Index Usage
Next, I would like to see the indexes that were used and the frequency with
which they were used. Below is a simple script that runs counts on each
file and displays the output in a readable fashion.
NOTE: The number next to each index is how many
times that index was used during the run.
shell> for x in `ls -al *.txt |awk '{print $9}' |awk -F. '{print $1}'`; do echo "TABLE: ${x}"; for y in `cat ${x}.txt | sort | uniq`; do cnt=`grep -x ${y} ${x}.txt |wc -l`; echo "${y}::${cnt}"; done; echo; done

TABLE: tbl_8
_idx__author::895
_idx__my_slug::24
new::201
new2::155
PRIMARY::20

TABLE: tbl_11
_idx__author::5
new2::1503
PRIMARY::1

TABLE: tbl_7
_idx__author::1951
_idx__my_approved::8
_idx__my_slug::34
new3::8
new4::14
new5::42
new7::74
PRIMARY::10

TABLE: tbl_14
_idx__my_slug::245
PRIMARY::17

TABLE: tbl_2
_idx__author::877
_idx__author,_idx__my_attached_to::61
_idx__my_attached_to::8030
_idx__my_attached_to_approved::1731
_idx__my_attached_to_by_date::1312
new2::1001
new3::1636
PRIMARY::28

TABLE: tbl_17
_idx__author::43
PRIMARY::23

TABLE: tbl_33
_idx__author::2

TABLE: tbl_27
_idx__author::7

TABLE: tbl_10
_idx__author::7
_idx__mytbl_19_id::11
new2::2207

TABLE: tbl_34
_idx__author::1

TABLE: tbl_19
_idx__author::29
_idx__my_approved::6
_idx__my_slug::3
new::1

TABLE: tbl_13
_idx__author::371

TABLE: tbl_23
_idx__author::9

TABLE: tbl_3
_idx__author::5
_idx__mytbl_9_id::12
_idx__my_mozzle::73
_idx__mytbl_1name__lower::5489
_idx__mytbl_1name__lower_tbl_9_privacy::1380
new::199
new2::68
PRIMARY::34

TABLE: tbl_9
_idx__author::708
_idx__my_approved::10
new2::55
new3::293
PRIMARY::37

TABLE: tbl_31
_idx__author::5

TABLE: tbl_15
_idx__author::34
new::248
PRIMARY::11

TABLE: tbl_25
_idx__author::5
NULL::5
PRIMARY::1

TABLE: tbl_16
_idx__author::113
PRIMARY::6

TABLE: tbl_21
_idx__author::13

TABLE: tbl_18
_idx__author::5
_idx__my_slug::39

TABLE: tbl_26
_idx__author::5
PRIMARY::4

TABLE: tbl_6
_idx__author::996
_idx__my_approved::12
_idx__my_slug::168
new::2458
new3::73
new4::483
new8::1802
PRIMARY::29

TABLE: tbl_32
_idx__author::3

TABLE: tbl_22
_idx__author::10

TABLE: tbl_30
_idx__author::5

TABLE: tbl_20
_idx__author::16

TABLE: tbl_12
_idx__author::373
PRIMARY::3

TABLE: tbl_4
_idx__author::2667
_idx__mytbl_14_id::113
_idx__my_slug::714
new2::31
new3::13
new4::651
new5::213
PRIMARY::28

TABLE: tbl_24
_idx__author::8

TABLE: tbl_1
_idx__author::6620
_idx__id::171
_idx__my_birth_month_and_day::30
_idx__my_full_name::16
_idx__my_profile_address__lower::4489
_idx__my_xg__forum__activity__count::3
_idx__updated_date::13
new6::1264
new7::70
new_fullname_status_idx::335
newtbl_1_count_idx::7
PRIMARY::66

TABLE: tbl_29
_idx__author::5

TABLE: tbl_28
_idx__author::2
PRIMARY::6

TABLE: tbl_5
_idx__author::2342
_idx__my_approved::6
_idx__my_slug::40
new::450
new2::60
new3::376
PRIMARY::13
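Going one step further, a hypothetical follow-up (not part of the original run) is to compare the recorded index names for a table against SHOW INDEX to spot indexes that were never chosen. The table and database names below are examples:

shell> mysql -uroot mytestdb -N -e "SHOW INDEX FROM tbl_1" | awk '{print $3}' | sort -u > defined.txt
shell> sort -u indexes/tbl_1.txt > used.txt
shell> comm -23 defined.txt used.txt

The comm -23 output lists index names defined on the table but never seen in the EXPLAIN results.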
More About Tables
I also want to know which tables I am not using. It is very simple to compare
the table names in the files we created above against the information_schema
database and find the tables that were not used; only 35 of the 55 tables were
used during the run. Below is the output listing the tables that were not used
while the log was collected.
| tbl_a |
| tbl_b |
| tbl_c |
| tbl_d |
| tbl_e |
| tbl_f |
| tbl_g |
| tbl_h |
| tbl_i |
| tbl_j |
| tbl_k |
| tbl_l |
| tbl_m |
| tbl_n |
| tbl_o |
| tbl_p |
| tbl_q |
| tbl_r |
| tbl_s |
| tbl_t |
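The comparison itself is simple; here is a minimal sketch (the exact commands were not shown in the original), assuming the per-table files live in ./indexes and the schema is mytestdb:

shell> mysql -uroot -N -e "SELECT table_name FROM information_schema.tables WHERE table_schema = 'mytestdb'" | sort > all_tables.txt
shell> ls indexes/*.txt | sed -e 's|indexes/||' -e 's|\.txt$||' | sort > used_tables.txt
shell> comm -23 all_tables.txt used_tables.txt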
Some Closing Thoughts
Grep, awk and sed are your friends! Mileage may vary; however, this is still
a good, quick process for gaining a lot of perspective on your index usage.
P.S. Make sure you set your long_query_time back to whatever you had
before. Leaving it at zero would be bad!