What pagination schemes can handle rapidly-changing content lists?

Oracle handles this nicely. As long as a cursor is open, you can fetch as many times as necessary and your results will always reflect the point in time at which the cursor was opened. It uses data from the undo logs to virtually roll back changes that were committed after the cursor was opened.

It will work as long as the required rollback data is still available. Eventually the logs get recycled and the rollback data is no longer available, so there is some limit, depending on the log space, system activity, etc.
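For illustration, here's roughly what that looks like from an application. This is only a sketch using the python-oracledb driver; the connection details and the posts table are my own assumptions, not anything from the question:

import oracledb

# Placeholder credentials/DSN and table - adjust for your environment.
conn = oracledb.connect(user="app", password="secret", dsn="dbhost/orclpdb1")
cur = conn.cursor()

# The result set is consistent as of the moment this query starts executing;
# Oracle uses undo data to hide changes committed afterwards.
cur.execute("SELECT id, title FROM posts ORDER BY id")

PAGE_SIZE = 50
while True:
    rows = cur.fetchmany(PAGE_SIZE)   # every fetch sees the same snapshot
    if not rows:
        break
    for row in rows:
        print(row)                    # hand the page to the client here

# If the needed undo data has been recycled in the meantime, a fetch raises
# ORA-01555 ("snapshot too old") - that's the limit described above.
cur.close()
conn.close()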

Unfortunately (IMO), I don't know of any other DB that works like this. The other databases I've worked with use locks to ensure read consistency, which is problematic if you want read consistency over more than a very short duration.


We're going with the server-side state approach for now, caching the entire result on the first query so we always return a consistent list. This will work as long as our query already returns all rows; eventually we'll need to switch to a nearest-neighbor approach, and then that won't work.
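For what it's worth, our version of that looks roughly like the following. It's only a sketch: run_ranked_query and load_by_ids are hypothetical stand-ins for the real query and row lookup, and the dict would be Redis or similar in practice.

import uuid

RESULT_CACHE = {}   # token -> ordered list of post IDs, frozen at the first query
PAGE_SIZE = 20

def start_pagination(run_ranked_query):
    """Run the ranking query once and freeze its ID order under an opaque token."""
    ids = list(run_ranked_query())
    token = uuid.uuid4().hex
    RESULT_CACHE[token] = ids
    return token

def get_page(token, page_number, load_by_ids):
    """Serve page N from the frozen ordering; row contents are still fetched fresh."""
    ids = RESULT_CACHE[token]
    start = page_number * PAGE_SIZE
    return load_by_ids(ids[start:start + PAGE_SIZE])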

But I think there's a fourth possibility, which scales very well, as long as:

  1. You don't need a guarantee of no duplicates, only a high likelihood
  2. You're okay with missing some content during scrolls, as long as you avoid duplicates

The solution is a variant of the "last seen ID" approach: have the client keep not one but 5, 10, or 20 bookmarks - few enough that you can store them efficiently. The query ends up looking like:

SELECT * FROM posts
WHERE id > :bookmark_1
AND id > :bookmark_2
...
ORDER BY id

As the number of bookmarks grows, the odds rapidly diminish that you are (a) starting at some point past all n bookmarks but (b) seeing duplicate content anyway because they were all reranked.
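To make that concrete, here is a sketch of how the two pieces might look in Python. The table and column names come from the query above; everything else is an assumption.

MAX_BOOKMARKS = 10   # 5, 10, 20 - whatever the client can store cheaply

def record_bookmark(bookmarks, last_id_on_page):
    """Client side: remember the newest page boundary, dropping the oldest."""
    bookmarks.append(last_id_on_page)
    return bookmarks[-MAX_BOOKMARKS:]

def next_page_query(bookmarks):
    """Server side: build the query above for however many bookmarks were sent."""
    conditions = " AND ".join(f"id > :bookmark_{i}" for i in range(len(bookmarks)))
    where = f"WHERE {conditions} " if conditions else ""
    sql = f"SELECT * FROM posts {where}ORDER BY id"
    params = {f"bookmark_{i}": b for i, b in enumerate(bookmarks)}
    return sql, params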

If there are holes, or better answers in the future, I'll happily unaccept this answer.


Solution 1: "the hacky solution"

One solution could be for your client to keep track of the content it has already seen, as a list of IDs for example. Each time you need another page, you add this ID list to the parameters of your server call. The server can then order the content, remove the already-seen items, and apply the offset to get the right page.
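Roughly, the server-side query could be built like this (a sketch only - the score column and the OFFSET ... FETCH paging clause are assumptions about your schema and SQL dialect):

def page_query(seen_ids, offset, page_size=20):
    """Order by the ranking policy, drop already-seen IDs, then apply the offset."""
    if seen_ids:
        placeholders = ", ".join(f":seen_{i}" for i in range(len(seen_ids)))
        exclusion = f"WHERE id NOT IN ({placeholders}) "
    else:
        exclusion = ""
    sql = (
        "SELECT * FROM posts "
        + exclusion
        + "ORDER BY score DESC "
        + "OFFSET :offset ROWS FETCH NEXT :page_size ROWS ONLY"
    )
    params = {f"seen_{i}": s for i, s in enumerate(seen_ids)}
    params.update(offset=offset, page_size=page_size)
    return sql, params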

I would not recommend it though, and I insist on "hacky". I'm only writing it down here because it's quick and could fit some needs. Here are the drawbacks I can think of:

1) It needs some work on the client side to get it right (what exactly does "already seen" mean in my sentence above? what happens if I go back to a previous page?)

2) The resulting order doesn't reflect your true ordering policy. An item could be displayed on page 2 even though the policy should have put it on page 1, which could lead to user confusion. Take the example of Stack Overflow with its former ordering policy, i.e. most-upvoted answers first: an answer with 6 upvotes could sit on page 2 while an answer with 4 upvotes is on page 1. This happens when the 2 or more extra upvotes arrived while the user was still on page 1 --> it can be surprising for the user.

Solution 2: "the client solution"

It's basically the client-side equivalent of the solution you call "server-side state", so it's only useful if keeping track of the full order on the server side isn't convenient. It works as long as the item list is not infinite.

  • Call your server once to get the full (finite) ordered list of IDs plus the number of items per page
  • Save it on the client side
  • Retrieve each page of items directly by their IDs.
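A sketch of what that could look like on the client (Python with requests; the /posts/ids and /posts endpoints are made-up placeholders):

import requests

PAGE_SIZE = 20

def start_browsing(base_url):
    """One call up front returns every ID in display order (finite lists only)."""
    return requests.get(f"{base_url}/posts/ids").json()   # e.g. [412, 7, 95, ...]

def fetch_page(base_url, order, page_number):
    """Slice the saved ordering and fetch just those items by ID."""
    start = page_number * PAGE_SIZE
    page_ids = order[start:start + PAGE_SIZE]
    resp = requests.get(f"{base_url}/posts",
                        params={"ids": ",".join(map(str, page_ids))})
    return resp.json()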