I was doing some MySQL query performance tuning on DGM Live today using maatkit. Maatkit is a toolkit for measuring, what else, MySQL performance. The two tools that I played around with specifically were the slow query log parser and the query profiler.
I don’t use perl a whole lot, so the first thing I had to do was install perl’s DBI module.
I jumped into the cpan prompt:
shell> cpan
And then installed the module:
cpan> install DBI
And then force installed DBD::mysql. I had to use force install so that when the tests to connect to my local MySQL server failed, it didn’t rollback the install. I know that the MySQL server is working and can be reached from my local box, so I don’t care about the tests failing:
cpan> force install DBD::mysql
I then exit out of the cpan shell and the maatkit scripts are now part of my executable path. First, I ran the mk-query-profiler script over a couple of sluggish queries that were stored in a .sql file. NOTE: It’s important that each query in the SQL file not only have a semi-colon at the end of the statement, but if you’re analyzing multiple queries then there as to be a full line of white space between the queries, not just a line break.
shell> mk-query-profiler --user backend --askpass --database dgmms ~/queries.sql
The output shows an overview of the information provided by the show status call provided by MySQL.
+----------------------------------------------------------+
| 2 (3.1201 sec) |
+----------------------------------------------------------+
__ Overall stats _______________________ Value _____________
Total elapsed time 6.231
Questions 2
COMMIT 0
DELETE 0
DELETE MULTI 0
INSERT 0
INSERT SELECT 0
REPLACE 0
REPLACE SELECT 0
SELECT 2
UPDATE 0
UPDATE MULTI 0
Data into server 944
Data out of server 169031
Optimizer cost 10919.360
__ Table and index accesses ____________ Value _____________
Table locks acquired 6
Table scans 0
Join 0
Index range scans 0
Join without check 0
Join with check 0
Rows sorted 137
Range sorts 0
Merge passes 0
Table scans 2
Potential filesorts 2
The key value that’s helpful to see here, either as a bulk value for all queries or on a per-query basis, is the optimizer cost. I want to get this number as low as possible. I can add a -s flag to my command to break the results out by query.
+----------------------------------------------------------+
| QUERY 1 (3.0744 sec) |
+----------------------------------------------------------+
SELECT COUNT...
__ Overall stats _______________________ Value _____________
Elapsed time 3.074
Data into server 478
Data out of server 11224
Optimizer cost 5459.680
__ Table and index accesses ____________ Value _____________
Table locks acquired 3
Table scans 0
Join 0
Index range scans 0
Join without check 0
Join with check 0
Rows sorted 10
Range sorts 0
Merge passes 0
Table scans 1
Potential filesorts 1
+----------------------------------------------------------+
| QUERY 2 (3.0618 sec) |
+----------------------------------------------------------+
SELECT COUNT...
__ Overall stats _______________________ Value _____________
Elapsed time 3.062
Data into server 466
Data out of server 157807
Optimizer cost 5459.680
__ Table and index accesses ____________ Value _____________
Table locks acquired 3
Table scans 0
Join 0
Index range scans 0
Join without check 0
Join with check 0
Rows sorted 127
Range sorts 0
Merge passes 0
Table scans 1
Potential filesorts 1
+----------------------------------------------------------+
| QUERY 2 (3.0618 sec) |
+----------------------------------------------------------+
__ Overall stats _______________________ Value _____________
Total elapsed time 6.136
Questions 2
COMMIT 0
DELETE 0
DELETE MULTI 0
INSERT 0
INSERT SELECT 0
REPLACE 0
REPLACE SELECT 0
SELECT 2
UPDATE 0
UPDATE MULTI 0
Data into server 944
Data out of server 169031
Optimizer cost 10919.360
__ Table and index accesses ____________ Value _____________
Table locks acquired 6
Table scans 0
Join 0
Index range scans 0
Join without check 0
Join with check 0
Rows sorted 137
Range sorts 0
Merge passes 0
Table scans 2
Potential filesorts 2
The difference between these two queries is that one of them has a LIMIT clause to reduce the result set size, which is the reason for the difference in teh size of the ‘Data out of server’, but it doesn’t make any difference to the optimization cost.
This was a difficult query to tune because the data was joined across three tables that all had a high cardinality, and the row traversals were minimal. I determined that a GROUP BY clause on the largest of the tables was causing a lot of CPU consumption which was slowing the query down. In the end, I decided to break the larger complex query up into two simpler queries. The total execution time was cut down to almost a tenth of the original query execution time, and resulted in less work for the processor overall.
The query profiler didn’t lead me to that conclusion right away, but it provided a very useful tool to quickly try different approaches and test them for performance.
The query profiler can do more than just take a file or standard input of queries. It can also analyze the performance of queries run against the server, by aggregating the show status information for all the queries. You simply use the –external flag instead of passing in a file and press ENTER when you’re done profiling the server.
This is very handy for getting a quick sense of the database overhead on a single page or action in your web application. I used ab, the apache benchmarking utility, to load test the server and ran the query profiler against the load test to get a better picture of how many queries are run and their performance when the server is under heavy load.
The query profiler is only one piece of the maatkit toolkit, and I look forward to playing with the rest of the available scripts to further tune the database layer of my applications.
1 response so far ↓
Rising Spiral » Blog Archive » I Had a Good Thing Going // January 13, 2009 at 6:45 am |
[...] I got some work done on DGM Live. This should be the first day of many where I can report that. Working for Y! got in the way of devoting any serious time to DGM Live over the past year and a half. I’m glad that’s come to an end. Part of today’s work involved MySQL performance tuning. For the technically inclined, I’ve got a Spare Cycles post up about playing around with maatkit. [...]