So you have a server storing objects in a relational database, and an API, nowadays probably HTTP but it does not matter. Clients can fetch objects using the API. Obviously you do not want them loading too many objects at the same time, that would cause all kinds of performance issues. So you need a way to let clients request a manageable subset of all objects and iterate to finally obtain all objects. This is pagination.
I used to think it was a trivial problem. But given the number of complex and inefficient solutions I have seen over the years, I was clearly mistaken. Let us find out how to do it easily and efficiently.
Limit and offset
One of the first things you learn with SQL is how to use LIMIT
and OFFSET
to
restrict the number of rows to retrieve.
Let us define a user table and load 10 users after the first 20:
CREATE TABLE users
(id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL);
SELECT * FROM users
LIMIT 10 OFFSET 20;
Have your clients send a limit and offset with each query, increment the offset
to iterate. Easy, but there is a problem: the highest the offset is, the slower
the request will be. LIMIT
and OFFSET
are applied after filtering, so if you
have a million users, selecting the last 10 ones will still require processing
the 999'990 before them. Not great.
Worse, this method can and will provide incorrect results when retrieving pages one after the other: adding rows to the table between each selection will make subsequent pages either miss rows or include some which we already returned in previous pages.
SQL Cursors
It can be tempting to use SQL cursors. After all, they let you consume rows iteratively without loading the whole result set in memory. But they are really not designed for the task: you would need to keep the associated database connection open between requests. Next idea.
Key-based pagination
Sometimes called “keyset” pagination, this method uses a key from the last row of the page you just loaded to obtain the next one efficiently.
Let us load the first page:
SELECT * FROM users
ORDER BY id
LIMIT 10;
Then for the next page, have the client provide the identifier of the last user in the page —let us say 42— and use it to load the second page:
SELECT * FROM users
ORDER BY id
WHERE id > 42
LIMIT 10;
Continue the same way for subsequent pages.
As long as you are filtering on an indexed expression, the database will
efficiently retrieve the rows for the page you requested. And you can use the
exact same method to order results in the other direction, just invert the
WHERE
expression.
You will also avoid the inconsistencies of the LIMIT
/OFFSET
method: if the
table is modified between two page retrieval operations, you may still miss rows
(nothing you can do about it unless you are willing to retrieve the entire table
in one operation), but you will not see rows from previous pages reappear in
subsequent pages. Your users will thank you (or more accurately they will open
less bug reports; even better).
Sorting
So you started using key-based pagination and everything looks fine, but now you need to sort results based on something else than the primary key. Your first idea could be to create an index on the column you want to use to sort, but it will not help: if multiple users have the same name, how will you select your pages?
We could use an index on two columns, name
and id
to obtain the strict total
order we need then filter on these two columns:
CREATE INDEX users_name_id_idx
ON users (name, id);
SELECT * FROM users
WHERE (name, id) > ('bob', 42)
ORDER BY name, id
LIMIT 10;
But this is not standard SQL and would only work with databases supporting row-value comparisons.
Fortunately SQL lets you create indices on expressions and not just columns. So
we create our own single-value keys by concatenating values. Of course we need a
separator to avoid collisions: without it ('Bob', 42)
and ('Bob4', 2)
would
both yield the same 'Bob42'
key). The ASCII character set includes several
separator characters which are perfect for this use case since they are not
supposed to end up in your data (if they do, use another character); let us use
the unit separator 0x1f
:
CREATE INDEX users_name_pagination_idx
ON users ((name || E'\x1f' || id));
Load the first page:
SELECT * FROM users
ORDER BY name || E'\x1f' || id
LIMIT 10;
And assuming the last user of the first page had the name Bob
and the
identifier 42
, Load the second page.
SELECT * FROM users
WHERE name || E'\x1f' || id > E'Bob\x1f42'
ORDER BY name || E'\x1f' || id
LIMIT 10;
Annoying to type but efficient and easy to generate. Of course you will want to make it easy for your clients: when returning a page, compute the key for the previous and next page and return them along the objects. The client can then trivially fetch the previous or next page.
Need to support multiple sort criteria? Just create an index for each of them.
That wasn’t so hard right?