How to perform pagination in Gremlin

From a functional perspective, a nice looking bit of Gremlin for paging would be:

gremlin> g.V().hasLabel('person').fold().as('persons','count').
               select('persons','count').
                 by(range(local, 0, 2)).
                 by(count(local))
==>[persons:[v[1],v[2]],count:4]
gremlin> g.V().hasLabel('person').fold().as('persons','count').
               select('persons','count').
                 by(range(local, 2, 4)).
                 by(count(local))
==>[persons:[v[4],v[6]],count:4]

In this way you get the total count of vertices with the result. Unfortunately, the fold() forces you to count all the vertices which will require iterating them all (i.e. bringing them all into memory).

There really is no way to avoid iterating all 100,000 vertices in this case as long as you intend to execute your traversal in multiple separate attempts. For example:

gremlin> g.V().hasLabel('person').range(0,2)
==>v[1]
==>v[2]
gremlin> g.V().hasLabel('person').range(2,4)
==>v[4]
==>v[6]

The first statement is the same as if you'd terminated the traversal with limit(2). On the second traversal, that only wants the second two vertices, it not as though you magically skip iterating the first two as it is a new traversal. I'm not aware of any TinkerPop graph database implementation that will do that efficiently - they all have that behavior.

The only way to do ten vertices at a time without having them all in memory is to use the same Traversal instance as in:

gremlin> t = g.V().hasLabel('person');[]
gremlin> t.next(2)
==>v[1]
==>v[2]
gremlin> t.next(2)
==>v[4]
==>v[6]

With that model you only iterate the vertices once and don't bring them all into memory at a single point in time.

Some other thoughts on this topic can be found in this blog post.


Why not add order().by() and perform range() function on your gremlin query.