Why (offset, limit) is slow in database select?

Starting from a problem

Five years ago when I was working at Tencent, I found that MySQL request speed was very slow in the pagination scenario. With only 100,000 data, a select query on a single machine took about 2-3 seconds. I asked my mentor why, and he asked in return, "In an indexing scenario, what is the time complexity to get the nth largest number in MySQL?"

The pursuit of the answer

Confirming the scenario

Assuming there is an index on the "status" column, a query like "select * from table where status = xx limit 10 offset 10000" will be very slow. Even with a small amount of data, there will be a few seconds of delay.

The answer from a novice

At that time, I felt very secure because I had a mentor who could help me with anything. Since my knowledge was the worst in the team, I guessed that the time complexity would be log(N) and thought that finding a node would take log(N). Naturally, my mentor asked me to study it on my own.

Continuing the answer

Upon careful analysis, it is found that using an index to find the result is awkward. This is because you do not know the distribution of the first 100 numbers in the left and right subtrees, so it is impossible to use the binary search characteristics of a binary tree. Through learning, it is understood that MySQL's index is a B+ tree.

After looking at this diagram, everything became clear. It is possible to directly find the 100th largest number with a complexity of O(n) by using the linked list composed of leaf nodes. However, even with O(n), it should not be so slow that it is unbearable. Are there other reasons for this?

Systematic learning

Here are two recommended books. The first is MySQL Internals: InnoDB Storage Engine which provides a deeper understanding of the implementation mechanism of InnoDB, such as MVCC, index implementation, and file storage.

The second book is High Performance MySQL. This book covers the usage aspect but is quite in-depth and mentions many design ideas.

Combining these two books and repeatedly learning from them, MySQL can barely be mastered.

Here are two key concepts:

Clustered index: includes the primary key index and the corresponding actual data. The leaf node of the index is the data node.
Secondary index: can be understood as a secondary node, and its leaf node is still an index node, which includes the primary key ID.

Even if the first 10,000 items are discarded, MySQL will still search the data in the clustered index through the primary key ID on the secondary index. This is 10,000 random I/O operations, so it is naturally slow like a husky. Here, one may question why this behavior occurs. This is related to the layering of MySQL, where "limit offset" can only be applied to the result set returned by the engine layer. In other words, the engine layer is also innocent; it does not know that these 10,000 items are to be discarded. The following is a diagram of the MySQL layering, which shows that the engine layer and the server layer are actually separate.

Until this point, I probably understood the reason for the slowness. This stage took about one year.

Analogical reasoning

At this point, I had been working for 3 years and had started to look at some source code. After reading etcd, I looked at the source code of TiDB. Regardless of which database, a query statement is actually composed of logical operators.

Introduction to Logical Operators. Before writing specific optimization rules, let's briefly introduce some logical operators in the query plan.

DataSource: This is the data source, which is the table in "select * from t".
Selection: Selection, such as the "where" filter condition in "select xxx from t where xx = 5".
Projection: which is the operation of selecting a column "c" in "select c from t".
Join: which is to join the tables "t1" and "t2" in "select xx from t1, t2 where t1.c = t2.c".

Selection, Projection, and Join(referred to as SPJ) are the most basic operators, and there are various ways to perform the join, such as inner join, left outer join, and right outer join.

After the query select b from t1, t2 where t1.c = t2.c and t1.a > 5 is transformed into a logical query plan, the DataSources corresponding to "t1" and "t2" are responsible for fetching the data. Then, a Join operator is applied to connect the results of the two tables based on "t1.c = t2.c", followed by a Selection operation to filter the results based on "t1.a > 5", and finally a Projection operation to select the "b" column. The following diagram shows the unoptimized representation:

So it's not that MySQL doesn't want to pass limit, offset to the engine layer, but because the logical operators are partitioned, it is impossible to know how many data that meet the condition are included in a specific operator.

How to solve it

High Performance MySQL mentions two solutions:

Solution 1

Based on actual business needs, consider whether to replace with the next page and previous page function, especially on iOS and Android platforms, the traditional paging method is not common. Here, limit and offset can be replaced with the > auxiliary index (i.e., search condition) ID. This ID needs to be returned to the front-end when called.

Solution 2

Confront it directly. Here is a concept to introduce: Index Coverage - When the auxiliary index query only has the ID and the auxiliary index itself, there is no need to go back to the clustered index.

The idea is as follows: select xxx,xxx from in (select id from table where second_index = xxx limit 10 offset 10000) means first finding the unique ID values of the data corresponding to the condition query, and because the primary key is also on the auxiliary index, there is no need to retrieve it from the clustered index on the disk. Then, by using these 10 primary key IDs that have already been limited, query the clustered index. This way, there will only be ten random IOs. Using this solution in cases where paging is really needed can greatly improve performance and usually meet performance requirements.

Reference: 分页场景（limit,offset）为什么会慢