Pagination is more than just a technical detail—it's a key design decision that impacts both user experience and system performance. As data and traffic scale, a robust pagination scheme ensures that your API can handle increasing demands while providing a seamless experience to users. In this post, we'll explore the fundamentals of pagination design, compare common pagination strategies, and discuss best practices for handling edge cases like record creation and deletion.
At its core, pagination presents users with an ordered, digestible subset of the data, typically in the form of "pages." For pagination to work, the dataset must be sorted into a fixed order. Without one, the database is free to return rows in any order. The same page can then show different items on successive requests, and an item can appear on two pages while another never appears at all. A fixed order keeps the contents of each page consistent from one request to the next unless the underlying data actually changes.
A good pagination key should have high cardinality (meaning few duplicate values) and few or no nulls, since databases disagree on whether nulls sort first or last. When the sort key isn't unique, add a tiebreaker such as the record's unique ID. The combined key is then unique, which guarantees a total, repeatable order.
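Here is a minimal Python sketch of the tiebreaker over in-memory records. It assumes hypothetical `published` and `id` fields. Sorting on the tuple `(published, id)` yields one repeatable order even when two records share a publication date.

```python
from typing import Any


def stable_order(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort by the pagination key, breaking ties with the unique id."""
    # The (published, id) tuple is unique per record, so the result is a
    # total order that does not depend on how the input was arranged.
    return sorted(records, key=lambda r: (r["published"], r["id"]))


# Hypothetical records: books 1 and 3 share the same publication date.
books = [
    {"id": 3, "published": "2024-01-01"},
    {"id": 1, "published": "2024-01-01"},
    {"id": 2, "published": "2023-12-31"},
]

print([b["id"] for b in stable_order(books)])  # [2, 1, 3]
```

Sorting on `published` alone would leave books 1 and 3 in whatever order they happened to arrive. The `id` component removes that ambiguity.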
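Designing an effective pagination scheme requires balancing several factors, including dataset size, traffic patterns, query performance, and the kind of navigation your users need. The two most common strategies, offset-based and cursor-based pagination, strike that balance differently.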
Offset-based pagination is the simplest and most widely supported method. It works by specifying a `limit` (the number of items per page) and an `offset` (the index of the first item on the page).
A resource like this:
/books?limit=100&offset=20
Might map to a SQL query like this:
SELECT * FROM book ORDER BY publishedat ASC LIMIT 100 OFFSET 20
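Offset-based pagination is well suited to applications with relatively small datasets and predictable traffic patterns. Its cost grows with depth: to honor `OFFSET n`, the database typically still has to read and discard the first n rows, so later pages get progressively slower. Offsets are also positional. An insert or delete ahead of the current page therefore shifts items across page boundaries, producing duplicates or skipped items. In return, it supports both infinite scrolling and traditional page navigation, including jumping straight to any page number.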
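This is a minimal sketch of offset pagination using Python's standard-library `sqlite3` module. It assumes a hypothetical `book` table with `id` and `publishedat` columns. The sketch binds `limit` and `offset` as query parameters and adds `id` as a tiebreaker. It also converts a 1-based page number into an offset for page-style navigation.

```python
import sqlite3


def offset_for_page(page: int, limit: int) -> int:
    """Translate a 1-based page number into an offset."""
    return (page - 1) * limit


def fetch_offset_page(conn: sqlite3.Connection, limit: int, offset: int) -> list[tuple]:
    """Fetch one page with LIMIT/OFFSET from a hypothetical book table."""
    # limit and offset are bound as parameters rather than pasted into the SQL;
    # id is the tiebreaker that keeps the order stable when timestamps collide.
    return conn.execute(
        "SELECT id, publishedat FROM book "
        "ORDER BY publishedat ASC, id ASC "
        "LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


# Ten books published on consecutive days, in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, publishedat TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO book (id, publishedat) VALUES (?, ?)",
    [(i, f"2024-08-{i:02d}T00:00:00Z") for i in range(1, 11)],
)

# Page 2 with three items per page: the database skips three rows first,
# then returns the books with ids 4, 5 and 6.
print(fetch_offset_page(conn, limit=3, offset=offset_for_page(2, limit=3)))
```

The offset is simple arithmetic on the page number, so any page can be requested directly. That is what makes offset pagination a natural fit for numbered page links.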
Cursor-based pagination uses an opaque cursor rather than an explicit offset to navigate through the data. The cursor typically contains the key value(s) from which to start the next page, and it can be as simple or complex as needed.
A resource like this:
/books?limit=100&cursor=eyJwdWJsaXNoZWRhdCI6IjIwMjQtMDgtMjZUMTQ6MDE6MjNaIn0=
Might map to a SQL query like this:
SELECT * FROM book WHERE publishedat > '2024-08-26T14:01:23Z' ORDER BY publishedat ASC LIMIT 100
In this example, the cursor is simply the base64-encoded version of the following JSON:
{"publishedat":"2024-08-26T14:01:23Z"}
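The following sketch shows how such a cursor might be produced and read back. It assumes the server serializes the key with compact JSON (no spaces) and the URL-safe base64 alphabet; the function names are illustrative. With those choices, the example key encodes to exactly the cursor shown in the request URL.

```python
import base64
import json


def encode_cursor(key: dict) -> str:
    """Serialize the key values to compact JSON, then base64-encode them."""
    raw = json.dumps(key, separators=(",", ":")).encode("utf-8")
    # The URL-safe alphabet avoids "+" and "/", which would need escaping in a query string.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> dict:
    """Reverse encode_cursor: base64-decode, then parse the JSON."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))


cursor = encode_cursor({"publishedat": "2024-08-26T14:01:23Z"})
print(cursor)  # eyJwdWJsaXNoZWRhdCI6IjIwMjQtMDgtMjZUMTQ6MDE6MjNaIn0=
print(decode_cursor(cursor))  # {'publishedat': '2024-08-26T14:01:23Z'}
```

Clients should treat the string as opaque and simply echo it back. This leaves the server free to change what goes into the cursor later without breaking anyone.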
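Cursor-based pagination shines in applications with large datasets or distributed systems, where efficiency and scalability are paramount. The cursor turns into a `WHERE` condition on the sort key, so an index on that key lets the database seek directly to the start of the page. Fetching page 1,000 therefore costs roughly the same as fetching page 1. Rows inserted or deleted before the cursor also don't cause later pages to repeat or skip items. It supports UI patterns like infinite scrolling and next/previous navigation but is less suited to direct page access: there is no way to build the cursor for page 50 without walking through the 49 pages before it.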
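The example query has one caveat: it compares only `publishedat`. If several books share the timestamp at a page boundary, `>` silently skips the ones not yet returned. The sketch below applies the tiebreaker advice from earlier to cursor pagination (often called keyset pagination). It assumes a hypothetical `id` column and a composite index on `(publishedat, id)`. To keep the sketch self-contained, the cursor is a plain `(publishedat, id)` tuple; a real API would serialize it into the opaque cursor string.

```python
import sqlite3
from typing import Optional

Cursor = tuple[str, int]  # (publishedat, id) of the last row already returned


def fetch_after(conn: sqlite3.Connection, limit: int, after: Optional[Cursor] = None):
    """Return (rows, next_cursor) for the page that follows `after`."""
    if after is None:
        # First page: there is no cursor yet.
        rows = conn.execute(
            "SELECT id, publishedat FROM book ORDER BY publishedat ASC, id ASC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        published, last_id = after
        # Rows strictly after the cursor in (publishedat, id) order. Many databases
        # also accept the row-value form (publishedat, id) > (?, ?).
        rows = conn.execute(
            "SELECT id, publishedat FROM book "
            "WHERE publishedat > ? OR (publishedat = ? AND id > ?) "
            "ORDER BY publishedat ASC, id ASC LIMIT ?",
            (published, published, last_id, limit),
        ).fetchall()
    # A short page means the data is exhausted; otherwise the last row's key is the cursor.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, publishedat TEXT NOT NULL)")
conn.execute("CREATE INDEX book_published_id ON book (publishedat, id)")
conn.executemany(
    "INSERT INTO book (id, publishedat) VALUES (?, ?)",
    [
        (1, "2024-08-26T14:01:23Z"),
        (2, "2024-08-26T14:01:23Z"),  # books 1-3 share one timestamp
        (3, "2024-08-26T14:01:23Z"),
        (4, "2024-08-27T09:00:00Z"),
        (5, "2024-08-28T09:00:00Z"),
    ],
)

page_cursor = None
while True:
    rows, page_cursor = fetch_after(conn, limit=2, after=page_cursor)
    print([row[0] for row in rows])  # prints [1, 2], then [3, 4], then [5]
    if page_cursor is None:
        break
```

With a timestamp-only comparison, the second request would start strictly after the shared timestamp and skip book 3 entirely.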
According to the HTTP specification (RFC 9110), the DELETE method asks the server to remove the association between the target resource and its URL. It does not require the underlying data to be physically removed. This allows you to mark records as “deleted” (a soft delete) without removing them from the database. They keep their position in the pagination order, and the remaining items don't shift between pages. The trade-off is that pages can contain fewer items than the requested limit. Avoid returning an empty page followed by more data, though: many clients treat an empty page as the end of the results and will stop fetching.
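Here is a minimal sketch of the soft-delete approach in Python with `sqlite3`, assuming a hypothetical integer `deleted` column on the `book` table. Offsets count every row, including deleted ones, so surviving items keep their page positions. Deleted rows are filtered out of the response. A window made up entirely of deleted rows is skipped rather than returned as an empty page. The function returns the next offset explicitly because, after a skip, it no longer equals `offset + limit`.

```python
import sqlite3
from typing import Optional


def soft_delete(conn: sqlite3.Connection, book_id: int) -> None:
    """Handle a hypothetical DELETE /books/{id} by flagging the row, not removing it."""
    conn.execute("UPDATE book SET deleted = 1 WHERE id = ?", (book_id,))


def fetch_live_page(conn: sqlite3.Connection, limit: int, offset: int) -> tuple[list[int], Optional[int]]:
    """Return (ids of live books, next offset); next offset is None once the data runs out."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    while True:
        # Offsets count every row, deleted or not, so live rows never change pages.
        window = conn.execute(
            "SELECT id, deleted FROM book ORDER BY publishedat ASC, id ASC LIMIT ? OFFSET ?",
            (limit, offset),
        ).fetchall()
        offset += len(window)
        more = len(window) == limit
        live = [book_id for book_id, deleted in window if not deleted]
        if live or not more:
            # A page may hold fewer than `limit` items, but it is empty only at the end.
            return live, (offset if more else None)
        # Every row in this window was deleted: move on instead of returning an empty page.


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, publishedat TEXT NOT NULL, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO book (id, publishedat) VALUES (?, ?)",
    [(i, f"2024-08-{i:02d}T00:00:00Z") for i in range(1, 7)],
)
for book_id in (3, 4):
    soft_delete(conn, book_id)

print(fetch_live_page(conn, limit=2, offset=0))  # ([1, 2], 2)
print(fetch_live_page(conn, limit=2, offset=2))  # ([5, 6], 6): the all-deleted window is skipped
```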
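When records are ordered by creation timestamp, newly created records can disrupt offset-based pagination by shifting the position of existing items. In a newest-first listing, for example, each insert pushes every existing item one position later, so the last item of one page reappears at the top of the next. To mitigate this, consider adding a "point in time" parameter to pagination requests. It pins each request to the dataset as it existed at a specific moment, such as when the first page was served, and ignores records created after that point.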
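This sketch implements a point-in-time parameter for a newest-first listing. It assumes a hypothetical `createdat` column stored as ISO-8601 UTC text. It also assumes an `as_of` value that the server hands out with the first page and the client echoes on later requests. The demo inserts a book between two page requests to show that the second page is unaffected.

```python
import sqlite3


def fetch_as_of(conn: sqlite3.Connection, as_of: str, limit: int, offset: int) -> list[int]:
    """Newest-first page of book ids, ignoring books created after `as_of`."""
    # Timestamps are ISO-8601 UTC strings in one fixed format, so text comparison
    # matches chronological order.
    rows = conn.execute(
        "SELECT id FROM book WHERE createdat <= ? "
        "ORDER BY createdat DESC, id DESC LIMIT ? OFFSET ?",
        (as_of, limit, offset),
    ).fetchall()
    return [book_id for (book_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, createdat TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO book (id, createdat) VALUES (?, ?)",
    [(i, f"2024-08-0{i}T00:00:00Z") for i in range(1, 5)],
)

# Hypothetical as_of returned with the first page and sent back on later requests.
as_of = "2024-08-04T00:00:00Z"
print(fetch_as_of(conn, as_of, limit=2, offset=0))  # [4, 3]

# A new book arrives between the two requests...
conn.execute("INSERT INTO book (id, createdat) VALUES (5, '2024-08-05T00:00:00Z')")

# ...but page 2 is unchanged because book 5 was created after as_of.
print(fetch_as_of(conn, as_of, limit=2, offset=2))  # [2, 1]
```

Without the filter, or with a later `as_of`, page 2 would be `[3, 2]`, repeating book 3 from the first page.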
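Randomized or haphazard sorts can be valuable for data sampling. A common approach is to sort by a cryptographic hash of a unique ID, which can be computed on the fly for simplicity or stored as a computed field for efficiency. The hash is deterministic, so the order looks random but is identical on every request. Sampling is therefore stable: users can't “random mine” for additional data by re-requesting until a different sample appears. A cryptographic hash's output is also effectively uncorrelated with its input, so the sample carries little bias toward any particular range of IDs.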
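Here is a minimal Python sketch of hash-based ordering. It computes the key on the fly with `hashlib.sha256` over the decimal string of each ID; the function names are hypothetical. In a database, the same digest would typically live in an indexed computed column, so the server can `ORDER BY` it without hashing every row on each request.

```python
import hashlib


def sample_key(record_id: int) -> str:
    """Deterministic pseudo-random sort key: the SHA-256 hex digest of the id."""
    return hashlib.sha256(str(record_id).encode("utf-8")).hexdigest()


def sampled_page(ids: list[int], limit: int, offset: int) -> list[int]:
    """Page through ids in hash order; the same ids always produce the same pages."""
    # The id is kept as a tiebreaker even though SHA-256 collisions are not expected.
    ordered = sorted(ids, key=lambda i: (sample_key(i), i))
    return ordered[offset : offset + limit]


ids = list(range(1, 101))
# Re-requesting the first page returns the same "random" sample every time.
print(sampled_page(ids, limit=5, offset=0))
```

The key depends only on the ID, so the order is identical across requests and across server processes.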
Choosing the right pagination scheme depends on your application’s specific needs, including data size, traffic patterns, and user experience. Offset-based pagination is simple and effective for many cases. Cursor-based pagination offers better performance and more stable results as data grows, especially for larger or distributed datasets. By carefully weighing the pros and cons of each approach and addressing edge cases like record deletion and creation, you can design a robust pagination system that scales gracefully and provides a smooth user experience.