How expensive is a MySQL query?
Database access speed is always the bottle neck of many applications. Many application have large amount of data to search, retrieve and display nowadays. How do we improve the performance of our applications, how do we reduce the cost of database access? Apart from the design of database, the quality of the query is also one important factor to take care. But before that, we need to know how much network traffic a query will consume.
Yunyang,Zhang from Nubee in Singapore did some research about MySQL query cost in his post "How Much Network Traffic Does A MySQL Query Consume?". Let's have a look at what he did.
Have you ever wondered how much network traffic was caused by a query with a specific data set? In this post, I would like to share some research I did for stress testing network. During the whole test, I used mysqlslap to simulate a kv query. it’s like the following SQL:
It has 32 fileds, and every one of them is 36 bytes long, which makes the whole record 1152 bytes.
MySQL can hardly utilize all the cpu and network traffic, so I started four MySQL instances on one machine and four clients remotely. After warm-up period, I captured realtime network traffic with ‘sar -n DEV’:
QPS for each instance are 12823, 12936, 13153 and 13123. Basicly 2370 byte per query.
We all know its original size is just 1152 bytes, so that’s a surprisingly expensive query! Let’s dig a little deeper to see what the hell happens inside. Read chapter ”Client/Server Communication” of “Understanding MySQL Internals” carefully, the knowledge there is still updated (although it’s 6 years old already). Usually ppl are too lazy to read it by themselves, so I summarized it a little.
There are four kinds of packets sent from server to client:
- OK Packet
- Error Packet
- EOF Packet
- Result Set Packets (NOTICE: plural)
Result Set Packets contain field definition sequence of packets and data packets.
After server receives a query, it responses client with result Set packets as following sequence:
- A packet with the body consisting of the standard field-length specifier sequence
- A group of field description packets, one for each field, in the field order of the result set
- EOF packet
- Row data packets, one packet per row
- EOF packet
Before finding out what’s inside every packet, we need to learn an important definition called “Data Field” first. Basicly, every field in MySQL packet is abstracted as ”Data Field”, and the length of “Data Field” is like the following table:
|Length of Actual Data (N Byte)||Length specifier sequence||Length of Data Field|
|N < 251 Byte||1 Byte||(1 + N) Byte|
|N < 65536 Byte||2 Byte||(2 + N) Byte|
|N < 16777216 Byte||3 Byte||(3 + N) Byte|
|N >= 16777216 Byte||8 Byte||(8 + N) Byte|
According to the table we define a function called “length_of_data_filed”:
Here comes the length of packets:
- Standard Field-length Specifier Sequence Packet, 1 byte
- Field Description Packet, 17 + length_of_data_filed(db_name) + length_of_data_filed(table_name) + length_of_data_filed(alias_table_name) + length_of_data_filed(column_name) + length_of_data_filed(alias_column_name) + length_of_data_filed(default_value)
- Row Data Packet, length_of_data_filed(field1) + length_of_data_filed(field2) + … + length_of_data_filed(fieldn)
- EOF Packet, 5 bytes
To verify if the knowledge in ”Understanding MySQL Internals” is updated, I used Wireshark to analyze packets captured by tcpdump on Linux server. The whole analysis is shown as below:
Ethernet II protocol costs 14 byte, IP V4 costs 20 bytes, and TCP costs 20 bytes too
One TCP packet can contain several MySQL packets, but it depends on MySQL packet size. In my kv query case, one TCP packet is big enough to hold many MySQL packets. Usually it costs 4 bytes to delimit MySQL packets inside TCP packet. One byte is used to describe the sequence of MySQL packet, and the rest 3 bytes are used for delimiter.
Standard Field-length Specifier Sequence Packet
Field Description Packet
Row Data Packet
Honestly speaking, it’s very hard to analyze network traffic precisely, because TCP behavior varies greatly along with the change of MySQL packet’s size. However it’s easy if we assume that only one record will be returned for one query, which means kv query. Assuming the length of record is k bytes, no alias name is used and no default vaule is set either, we get the formula of network traffic caused by kv query :
That’s all, ^_^.