Optimize query with OFFSET on large table
OFFSET is always going to be slow on large tables. Postgres has to order all rows and count the visible ones up to your offset before it can return anything. To skip all previous rows directly, you could add an indexed row_number to the table (or create a MATERIALIZED VIEW including said row_number) and work with WHERE row_number > x instead of OFFSET x.
However, this approach is only sensible for read-only (or mostly read-only) data. Implementing the same for table data that can change concurrently is more challenging: you need to start by defining the desired behavior exactly.
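To make the row_number idea concrete, here is a minimal sketch in Python using the stdlib sqlite3 module, with an in-memory SQLite database standing in for PostgreSQL. The table contents and the column name rn (playing the role of the indexed row_number) are hypothetical. It numbers the rows once in ORDER BY vote, id order and checks that WHERE rn > x returns the same page as OFFSET x:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table data and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INTEGER NOT NULL, rn INTEGER)")
conn.executemany("INSERT INTO big_table (id, vote) VALUES (?, ?)",
                 [(i, (i * 7) % 10) for i in range(1, 51)])

# Materialize the row number for the "ORDER BY vote, id" sort once.
# This stays valid only as long as the data does not change afterwards.
ordered_ids = [r[0] for r in conn.execute("SELECT id FROM big_table ORDER BY vote, id")]
conn.executemany("UPDATE big_table SET rn = ? WHERE id = ?",
                 [(rn, id_) for rn, id_ in enumerate(ordered_ids, start=1)])
conn.execute("CREATE INDEX big_table_rn ON big_table (rn)")


def page_offset(x, n):
    """Classic OFFSET paging: the engine must walk past x rows first."""
    return conn.execute(
        "SELECT vote, id FROM big_table ORDER BY vote, id LIMIT ? OFFSET ?", (n, x)
    ).fetchall()


def page_row_number(x, n):
    """Same page via the indexed row number: seeks straight to rn > x."""
    return conn.execute(
        "SELECT vote, id FROM big_table WHERE rn > ? ORDER BY rn LIMIT ?", (x, n)
    ).fetchall()
```

In PostgreSQL you would more likely compute the number with row_number() OVER (ORDER BY vote, id) inside the MATERIALIZED VIEW and refresh it after changes. Either way, every insert or delete shifts all later numbers, which is why this only suits read-mostly data.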
I suggest a different approach for pagination:

```sql
SELECT *
FROM   big_table
WHERE  (vote, id) > (vote_x, id_x)  -- ROW values
ORDER  BY vote, id  -- needs to be deterministic
LIMIT  n;
```

vote_x and id_x are from the last row of the previous page (for both DESC and ASC). Or from the first row if navigating backwards.
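A minimal sketch of paging forward with this query, again with SQLite (which supports row-value comparisons since version 3.15) standing in for PostgreSQL via Python's sqlite3 module. The table data, function names and page size are hypothetical. Each page hands the (vote, id) of its last row to the next query as the cursor:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table data and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INTEGER NOT NULL)")
conn.executemany("INSERT INTO big_table (id, vote) VALUES (?, ?)",
                 [(i, (i * 7) % 10) for i in range(1, 51)])
conn.execute("CREATE INDEX vote_order_asc ON big_table (vote, id)")


def next_page(cursor, n):
    """Next n rows after cursor = (vote_x, id_x); first page if cursor is None."""
    if cursor is None:
        return conn.execute(
            "SELECT vote, id FROM big_table ORDER BY vote, id LIMIT ?", (n,)
        ).fetchall()
    return conn.execute(
        "SELECT vote, id FROM big_table "
        "WHERE (vote, id) > (?, ?) "   # row-value comparison
        "ORDER BY vote, id LIMIT ?",   # deterministic: id breaks ties in vote
        (*cursor, n),
    ).fetchall()


def all_pages(n):
    """Walk through the whole table page by page (keyset pagination)."""
    pages, cursor = [], None
    while True:
        page = next_page(cursor, n)
        if not page:
            return pages
        pages.append(page)
        cursor = page[-1]  # (vote_x, id_x) from the last row of this page
```

No page ever needs to skip rows: each query starts right at the cursor, so page 1000 costs about as much as page 1.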
Comparing row values is supported by the index you already have, a feature that complies with the ISO SQL standard, but not every RDBMS supports it:

```sql
CREATE INDEX vote_order_asc ON big_table (vote, id);
```
Or for descending order:

```sql
SELECT *
FROM   big_table
WHERE  (vote, id) < (vote_x, id_x)  -- ROW values
ORDER  BY vote DESC, id DESC
LIMIT  n;
```

This can use the same index.
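The same kind of sketch for descending order (SQLite via Python's sqlite3 as a stand-in, hypothetical names and data). It also illustrates one way to apply the "first row when navigating backwards" remark: take vote_x, id_x from the first row of an ascending page, run the descending query, and reverse the result. That backward helper is an illustrative assumption, not a prescribed method:

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table data and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INTEGER NOT NULL)")
conn.executemany("INSERT INTO big_table (id, vote) VALUES (?, ?)",
                 [(i, (i * 7) % 10) for i in range(1, 51)])
conn.execute("CREATE INDEX vote_order_asc ON big_table (vote, id)")  # serves both directions

DESC_FIRST = "SELECT vote, id FROM big_table ORDER BY vote DESC, id DESC LIMIT ?"
DESC_NEXT = ("SELECT vote, id FROM big_table "
             "WHERE (vote, id) < (?, ?) "
             "ORDER BY vote DESC, id DESC LIMIT ?")


def desc_page(cursor, n):
    """Next n rows in descending order after cursor = (vote_x, id_x), or the first page."""
    if cursor is None:
        return conn.execute(DESC_FIRST, (n,)).fetchall()
    return conn.execute(DESC_NEXT, (*cursor, n)).fetchall()


def all_desc(n):
    """Concatenate all descending pages of size n."""
    result, cursor = [], None
    while True:
        page = desc_page(cursor, n)
        if not page:
            return result
        result.extend(page)
        cursor = page[-1]


def prev_asc_page(first_row, n):
    """Page preceding an ascending page, given that page's first row (vote_x, id_x)."""
    rows = conn.execute(DESC_NEXT, (*first_row, n)).fetchall()
    return rows[::-1]  # flip back to ascending order
```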
I suggest you declare your columns NOT NULL or acquaint yourself with the NULLS FIRST|LAST construct:
- PostgreSQL sort by datetime asc, null first?
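Why NULLs matter here, as a small sketch (SQLite via Python's sqlite3, hypothetical data): a row-value comparison whose outcome hinges on a NULL element evaluates to NULL, so if the cursor row has a NULL vote, no row qualifies and pagination silently stops. SQLite sorts NULLs first in ascending order, while PostgreSQL sorts them last by default; there, the NULL rows would instead never be reached once the cursor passes the last non-NULL vote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# vote is deliberately nullable here to show the problem.
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INTEGER)")
conn.executemany("INSERT INTO big_table (id, vote) VALUES (?, ?)",
                 [(1, None), (2, None), (3, 1), (4, 2), (5, 3)])

KEYSET = ("SELECT vote, id FROM big_table "
          "WHERE (vote, id) > (?, ?) ORDER BY vote, id LIMIT ?")

# SQLite puts NULLs first in ascending order, so page 1 holds the NULL votes.
first_page = conn.execute(
    "SELECT vote, id FROM big_table ORDER BY vote, id LIMIT 2").fetchall()
cursor = first_page[-1]  # (None, 2): the cursor itself contains a NULL

# (vote, id) > (NULL, 2) is NULL for every row, so nothing comes back.
second_page = conn.execute(KEYSET, (*cursor, 2)).fetchall()
```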
Note two things in particular:

First, ROW values in the WHERE clause cannot be replaced with separated member fields. WHERE (vote, id) > (vote_x, id_x) cannot be replaced with:

```sql
WHERE vote >= vote_x AND id > id_x
```

That would rule out all rows with id <= id_x, while we only want to do that for the same vote and not for the next. The correct translation would be:

```sql
WHERE (vote = vote_x AND id > id_x) OR vote > vote_x
```

... which doesn't play along with indexes as nicely, and gets increasingly complicated for more columns.

It would be simple for a single column, obviously. That's the special case I mentioned at the outset.
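The difference can be checked in a few lines of plain Python, since Python compares tuples lexicographically just like SQL row values (as long as no NULLs are involved). The sample (vote, id) pairs and the cursor are hypothetical:

```python
# Hypothetical sample rows as (vote, id) pairs, in ORDER BY vote, id order.
rows = [(1, 1), (1, 5), (1, 9), (2, 2), (2, 7), (3, 4)]
vote_x, id_x = 1, 5  # cursor: last row of the previous page

# Row-value comparison: (vote, id) > (vote_x, id_x)
row_value = [r for r in rows if r > (vote_x, id_x)]

# Naive split into member fields: vote >= vote_x AND id > id_x
naive = [(v, i) for v, i in rows if v >= vote_x and i > id_x]

# Correct expansion: (vote = vote_x AND id > id_x) OR vote > vote_x
expanded = [(v, i) for v, i in rows if (v == vote_x and i > id_x) or v > vote_x]
```

The naive filter loses (2, 2) and (3, 4): their id is not greater than id_x, although their vote is already higher than vote_x.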
Second, the technique does not work for mixed directions in ORDER BY like:

```sql
ORDER BY vote ASC, id DESC
```

At least I can't think of a generic way to implement this as efficiently. If at least one of the two columns is a numeric type, you could use a functional index with an inverted value on (vote, (id * -1)) and use the same expression in ORDER BY:

```sql
ORDER BY vote ASC, (id * -1) ASC
```
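A minimal sketch of the inverted-value trick, with SQLite via Python's sqlite3 standing in for PostgreSQL (both support indexes on expressions; data and names are hypothetical). The cursor keeps the real id and the query negates it, so the row value (vote, id * -1) only ever runs in one direction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain NOT NULL id column, referenced in the index expression below.
conn.execute("CREATE TABLE big_table (id INTEGER NOT NULL, vote INTEGER NOT NULL)")
conn.executemany("INSERT INTO big_table (id, vote) VALUES (?, ?)",
                 [(i, (i * 7) % 10) for i in range(1, 51)])
# Expression index: vote ascending, then inverted id ascending (= id descending).
conn.execute("CREATE INDEX vote_asc_id_desc ON big_table (vote, (id * -1))")

FIRST = ("SELECT vote, id FROM big_table "
         "ORDER BY vote ASC, (id * -1) ASC LIMIT ?")
NEXT = ("SELECT vote, id FROM big_table "
        "WHERE (vote, id * -1) > (?, ?) "
        "ORDER BY vote ASC, (id * -1) ASC LIMIT ?")


def next_page(cursor, n):
    """cursor = (vote_x, id_x) from the last row of the previous page, or None."""
    if cursor is None:
        return conn.execute(FIRST, (n,)).fetchall()
    vote_x, id_x = cursor
    return conn.execute(NEXT, (vote_x, id_x * -1, n)).fetchall()


def all_rows(n):
    """Walk through every page of size n and concatenate the results."""
    result, cursor = [], None
    while True:
        page = next_page(cursor, n)
        if not page:
            return result
        result.extend(page)
        cursor = page[-1]
```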
Related:
- SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
- Improve performance for order by with columns from many tables
Note in particular the presentation by Markus Winand I linked to:
- "Pagination done the PostgreSQL way"