Advantages of Stream and Spring Data

Returning a Stream lets the repository's consumer choose how to collect the data. It also allows operations on the stream, such as mapping to DTOs, augmenting data, and filtering, to be chained. If the only thing you're ever going to do is collect it to a list and send that as a response, then there is no benefit.
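
To illustrate the "consumer chooses how to collect" point, here is a minimal sketch. The Thing class, its fields, and the in-memory findAllThings() are assumptions made up for the example; a real repository would read from the database.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CollectChoicesDemo {

    // Hypothetical entity standing in for the Thing discussed above.
    static final class Thing {
        final long id;
        final String name;

        Thing(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical repository method backed by in-memory data, not a database.
    static Stream<Thing> findAllThings() {
        return Stream.of(new Thing(1, "bolt"), new Thing(2, "nut"), new Thing(3, "bolt"));
    }

    // One caller wants the names as a list, in encounter order.
    static List<String> namesAsList() {
        return findAllThings().map(t -> t.name).collect(Collectors.toList());
    }

    // Another caller only cares about distinct names.
    static Set<String> distinctNames() {
        return findAllThings().map(t -> t.name).collect(Collectors.toSet());
    }

    // A third caller wants a lookup table keyed by id.
    static Map<Long, Thing> byId() {
        return findAllThings().collect(Collectors.toMap(t -> t.id, Function.identity()));
    }

    public static void main(String[] args) {
        System.out.println(namesAsList());
        System.out.println(distinctNames());
        System.out.println(byId().keySet());
    }
}
```

Each caller gets the shape it needs directly from the stream, without first receiving a List and converting it.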

But take, for example, a Thing repository that returns n Things from List<Thing> findAllThings(), because most of the time the result is just sent as a list via the API. Then someone builds a service in the application that needs to keep only the Things that also exist in another set of m Things. We would have to build a new list by filtering against that set, like this:

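// Assumes: import static java.util.stream.Collectors.toList;
List<Thing> acceptedThings = repo.findAllThings()
                                 .stream()
                                 .filter(t -> set.contains(t)) // keep only Things present in the set
                                 .collect(toList());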

So we've had to iterate over the original list and build a second list. If further operations are applied to this list in the same way, each one adds another pass and another intermediate list, so you can see how it may be sub-optimal.

If the response from the repository had been Stream<Thing> then we could have chained the filter operation and passed on the Stream for any further processing.

Stream<Thing> acceptedThings = repo.findAllThings()
                                   .filter(t -> set.contains(t)); // lazy: nothing is evaluated yet

Only right at the end, when something consumes the stream, are the relevant operations executed for each item. This is more efficient because each element is visited at most once and no intermediate collections need to be created.
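
The lazy evaluation described above can be observed with a small sketch. The String elements, the in-memory findAllThings(), and the visited list are assumptions for illustration only; the visited list exists purely to make the evaluation order visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LazyPipelineDemo {

    // Records every element the filter looks at, so evaluation is observable.
    static final List<String> visited = new ArrayList<>();

    // Hypothetical stand-in for a repository method that returns a Stream.
    static Stream<String> findAllThings() {
        return Stream.of("a", "b", "c", "d");
    }

    // Builds the pipeline but does not run it: filter is an intermediate operation.
    static Stream<String> acceptedThings(Set<String> accepted) {
        return findAllThings().filter(t -> {
            visited.add(t);
            return accepted.contains(t);
        });
    }

    public static void main(String[] args) {
        Set<String> accepted = new HashSet<>(Arrays.asList("b", "d"));

        Stream<String> pipeline = acceptedThings(accepted);
        System.out.println("visited before terminal operation: " + visited);

        // The terminal operation (collect) pulls each element through filter and map once.
        List<String> result = pipeline.map(String::toUpperCase).collect(Collectors.toList());
        System.out.println("result: " + result);
        System.out.println("visited after terminal operation: " + visited);
    }
}
```

Nothing is filtered when the pipeline is built; the filter and the later map both run in a single pass once collect is called.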

Given that Spring now supports returning a Stream as the @ResponseBody in controllers, it's even better.


This is already supported in Spring Data JPA, so there's no real advantage in overriding those methods to return Stream. If you really want a Stream and the potential advantages that come with it, use what Spring Data JPA already provides.
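
One practical detail when consuming such a stream: a Stream returned by a repository may hold on to underlying resources, so it should be closed after use, typically with try-with-resources. The sketch below shows only that pattern in plain Java. ThingRepository, streamAllThings(), and the resourceOpen flag are hypothetical stand-ins, not Spring Data types.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ClosingStreamDemo {

    // Hypothetical repository: the flag stands in for a database cursor or connection.
    static final class ThingRepository {
        final AtomicBoolean resourceOpen = new AtomicBoolean(false);

        Stream<String> streamAllThings() {
            resourceOpen.set(true);
            // onClose registers a handler that runs when the stream is closed.
            return Stream.of("bolt", "nut", "washer")
                    .onClose(() -> resourceOpen.set(false));
        }
    }

    // try-with-resources closes the stream, which releases the pretend resource.
    static List<String> loadNames(ThingRepository repo) {
        try (Stream<String> things = repo.streamAllThings()) {
            return things.map(String::toUpperCase).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        ThingRepository repo = new ThingRepository();
        System.out.println(loadNames(repo));
        System.out.println("resource still open: " + repo.resourceOpen.get());
    }
}
```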

A separate aspect is that, as of JPA 2.2, a Stream can be the result type of a query at the JPA level as well: the JPA interfaces Query and TypedQuery get a new method called getResultStream().

So Spring Data will use techniques specific to a particular provider, such as Hibernate or EclipseLink, to stream the result.

By default, getResultStream() is just a getResultList().stream() implementation, but Hibernate already overrides that with ScrollableResults. This is far more efficient if you need to process a very big result set.
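
The difference between the default and a scrolling override can be sketched in plain Java. Everything here is a simplified assumption: SimpleQuery is not the real JPA Query interface, RowSource is a pretend result set, and ScrollingQuery only imitates the idea of reading rows on demand rather than Hibernate's actual implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class ResultStreamDemo {

    // Hypothetical, simplified query interface (not the real JPA Query).
    interface SimpleQuery<T> {
        List<T> getResultList();

        // Mirrors the default described above: load everything, then stream the list.
        default Stream<T> getResultStream() {
            return getResultList().stream();
        }
    }

    // Pretend result set of `total` rows; `fetched` counts rows actually read.
    static final class RowSource {
        final int total;
        int fetched = 0;

        RowSource(int total) {
            this.total = total;
        }

        Iterator<Integer> cursor() {
            return new Iterator<Integer>() {
                private int next = 0;

                @Override
                public boolean hasNext() {
                    return next < total;
                }

                @Override
                public Integer next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    fetched++;
                    return next++;
                }
            };
        }
    }

    // Relies on the default getResultStream(): all rows are read up front.
    static class ListBackedQuery implements SimpleQuery<Integer> {
        final RowSource source;

        ListBackedQuery(RowSource source) {
            this.source = source;
        }

        @Override
        public List<Integer> getResultList() {
            List<Integer> all = new ArrayList<>();
            source.cursor().forEachRemaining(all::add);
            return all;
        }
    }

    // Overrides getResultStream() to read rows from the cursor only as they are consumed.
    static final class ScrollingQuery extends ListBackedQuery {
        ScrollingQuery(RowSource source) {
            super(source);
        }

        @Override
        public Stream<Integer> getResultStream() {
            Spliterator<Integer> rows =
                    Spliterators.spliteratorUnknownSize(source.cursor(), Spliterator.ORDERED);
            return StreamSupport.stream(rows, false);
        }
    }

    public static void main(String[] args) {
        RowSource a = new RowSource(1000);
        new ListBackedQuery(a).getResultStream().findFirst();
        System.out.println("list-backed rows fetched: " + a.fetched);

        RowSource b = new RowSource(1000);
        new ScrollingQuery(b).getResultStream().findFirst();
        System.out.println("scrolling rows fetched: " + b.fetched);
    }
}
```

With the list-backed default, asking for just the first element still reads the whole result set; the scrolling variant stops reading as soon as the consumer is satisfied.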


You should see these options only as a way to improve your programming model, moving from an imperative-style JDK List to a more functional-style Stream. You should still push as much logic as possible down into the SQL query to benefit from indexing, better execution plans, etc. If your Stream.filter() is simple, then it can be expressed as a SQL / JPQL WHERE clause.
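
To make the cost of client-side filtering concrete, here is a toy sketch. ThingTable is a made-up in-memory stand-in for a database table, and rowsTransferred merely counts how many rows would cross from the database to the application. A real database could additionally use an index for the WHERE clause, which this toy does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PushDownDemo {

    // Toy stand-in for a database table; rowsTransferred counts rows sent to the client.
    static final class ThingTable {
        private final List<String> rows;
        int rowsTransferred = 0;

        ThingTable(List<String> rows) {
            this.rows = rows;
        }

        // Like a query without a WHERE clause: every row is shipped to the client.
        List<String> findAll() {
            rowsTransferred += rows.size();
            return new ArrayList<>(rows);
        }

        // Like a query with a WHERE clause: only matching rows are shipped.
        List<String> findWhere(Predicate<String> where) {
            List<String> matches = rows.stream().filter(where).collect(Collectors.toList());
            rowsTransferred += matches.size();
            return matches;
        }
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("milk", "yoghurt", "bread", "cheese", "yoghurt");

        // Filtering in the client: all rows are transferred, most are thrown away.
        ThingTable clientSide = new ThingTable(data);
        List<String> a = clientSide.findAll().stream()
                .filter("yoghurt"::equals)
                .collect(Collectors.toList());

        // Filtering in the "database": only the matching rows are transferred.
        ThingTable pushedDown = new ThingTable(data);
        List<String> b = pushedDown.findWhere("yoghurt"::equals);

        System.out.println(a.equals(b));
        System.out.println("client-side rows transferred: " + clientSide.rowsTransferred);
        System.out.println("pushed-down rows transferred: " + pushedDown.rowsTransferred);
    }
}
```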

Please use SQL (or JPQL if it suffices) whenever querying the database. Don't filter in the client if you can avoid it. That would be like buying all the produce in a supermarket and throwing everything away just to get a single yoghurt.