Playing around with maatkit

I was doing some MySQL query performance tuning on DGM Live today using maatkit. Maatkit is a toolkit for measuring, what else, MySQL performance. The two tools that I played around with specifically were the slow query log parser and the query profiler.

I don’t use perl a whole lot, so the first thing I had to do was install perl’s DBI module.

I jumped into the cpan prompt:

shell> cpan

And then installed the module:

cpan> install DBI

And then force installed DBD::mysql. I had to use force install so that when the tests to connect to my local MySQL server failed, it didn’t rollback the install. I know that the MySQL server is working and can be reached from my local box, so I don’t care about the tests failing:

cpan> force install DBD::mysql

I then exit out of the cpan shell and the maatkit scripts are now part of my executable path. First, I ran the mk-query-profiler script over a couple of sluggish queries that were stored in a .sql file. NOTE: It’s important that each query in the SQL file not only have a semi-colon at the end of the statement, but if you’re analyzing multiple queries then there as to be a full line of white space between the queries, not just a line break.

shell> mk-query-profiler --user backend --askpass --database dgmms ~/queries.sql

The output shows an overview of the information provided by the show status call provided by MySQL.

+----------------------------------------------------------+
|                      2 (3.1201 sec)                      |
+----------------------------------------------------------+

__ Overall stats _______________________ Value _____________
   Total elapsed time                        6.231
   Questions                                 2
     COMMIT                                  0
     DELETE                                  0
     DELETE MULTI                            0
     INSERT                                  0
     INSERT SELECT                           0
     REPLACE                                 0
     REPLACE SELECT                          0
     SELECT                                  2
     UPDATE                                  0
     UPDATE MULTI                            0
   Data into server                        944
   Data out of server                   169031
   Optimizer cost                        10919.360

__ Table and index accesses ____________ Value _____________
   Table locks acquired                      6
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                             137
     Range sorts                             0
     Merge passes                            0
     Table scans                             2
     Potential filesorts                     2

The key value that’s helpful to see here, either as a bulk value for all queries or on a per-query basis, is the optimizer cost. I want to get this number as low as possible. I can add a -s flag to my command to break the results out by query.

+----------------------------------------------------------+
|                   QUERY 1 (3.0744 sec)                   |
+----------------------------------------------------------+
SELECT COUNT...

__ Overall stats _______________________ Value _____________
   Elapsed time                              3.074
   Data into server                        478
   Data out of server                    11224
   Optimizer cost                         5459.680

__ Table and index accesses ____________ Value _____________
   Table locks acquired                      3
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                              10
     Range sorts                             0
     Merge passes                            0
     Table scans                             1
     Potential filesorts                     1

+----------------------------------------------------------+
|                   QUERY 2 (3.0618 sec)                   |
+----------------------------------------------------------+
SELECT COUNT...

__ Overall stats _______________________ Value _____________
   Elapsed time                              3.062
   Data into server                        466
   Data out of server                   157807
   Optimizer cost                         5459.680

__ Table and index accesses ____________ Value _____________
   Table locks acquired                      3
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                             127
     Range sorts                             0
     Merge passes                            0
     Table scans                             1
     Potential filesorts                     1

+----------------------------------------------------------+
|                   QUERY 2 (3.0618 sec)                   |
+----------------------------------------------------------+

__ Overall stats _______________________ Value _____________
   Total elapsed time                        6.136
   Questions                                 2
     COMMIT                                  0
     DELETE                                  0
     DELETE MULTI                            0
     INSERT                                  0
     INSERT SELECT                           0
     REPLACE                                 0
     REPLACE SELECT                          0
     SELECT                                  2
     UPDATE                                  0
     UPDATE MULTI                            0
   Data into server                        944
   Data out of server                   169031
   Optimizer cost                        10919.360

__ Table and index accesses ____________ Value _____________
   Table locks acquired                      6
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                             137
     Range sorts                             0
     Merge passes                            0
     Table scans                             2
     Potential filesorts                     2

The difference between these two queries is that one of them has a LIMIT clause to reduce the result set size, which is the reason for the difference in teh size of the ‘Data out of server’, but it doesn’t make any difference to the optimization cost.

This was a difficult query to tune because the data was joined across three tables that all had a high cardinality, and the row traversals were minimal. I determined that a GROUP BY clause on the largest of the tables was causing a lot of CPU consumption which was slowing the query down. In the end, I decided to break the larger complex query up into two simpler queries. The total execution time was cut down to almost a tenth of the original query execution time, and resulted in less work for the processor overall.

The query profiler didn’t lead me to that conclusion right away, but it provided a very useful tool to quickly try different approaches and test them for performance.

The query profiler can do more than just take a file or standard input of queries. It can also analyze the performance of queries run against the server, by aggregating the show status information for all the queries. You simply use the –external flag instead of passing in a file and press ENTER when you’re done profiling the server.

This is very handy for getting a quick sense of the database overhead on a single page or action in your web application. I used ab, the apache benchmarking utility, to load test the server and ran the query profiler against the load test to get a better picture of how many queries are run and their performance when the server is under heavy load.

The query profiler is only one piece of the maatkit toolkit, and I look forward to playing with the rest of the available scripts to further tune the database layer of my applications.

One response to “Playing around with maatkit

  1. Pingback: Rising Spiral » Blog Archive » I Had a Good Thing Going

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s