Why would you prefer the Java 8 Stream API over direct Hibernate/SQL queries when working with the DB?

If the data originally comes from a DB it is better to do the filtering in the DB rather than fetching everything and filtering locally.

First, database management systems are good at filtering: it is part of their main job, so they are optimized for it. Filtering can also be sped up with indexes.

Second, fetching and transmitting many records, and unmarshalling them into objects, only to throw most of them away during local filtering is a waste of bandwidth and computing resources.
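
To make the second point concrete, here is a minimal sketch that only counts how many rows would cross the wire in each case. FakeBookTable, Book and the method names are hypothetical stand-ins invented for illustration; no real database or driver is involved, and the in-memory filter merely plays the role of a WHERE clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterPlacementSketch {

    static final class Book {
        final String title;
        final int year;
        Book(String title, int year) { this.title = title; this.year = year; }
    }

    // Hypothetical stand-in for a database table. It counts the rows it hands out.
    static final class FakeBookTable {
        private final List<Book> rows;
        int rowsTransferred = 0;

        FakeBookTable(List<Book> rows) { this.rows = rows; }

        // Plays the role of "SELECT * FROM book": every row is shipped to the caller.
        List<Book> fetchAll() {
            rowsTransferred += rows.size();
            return new ArrayList<>(rows);
        }

        // Plays the role of "SELECT * FROM book WHERE year > ?": only matches are shipped.
        List<Book> fetchPublishedAfter(int year) {
            List<Book> matches = rows.stream()
                    .filter(b -> b.year > year)
                    .collect(Collectors.toList());
            rowsTransferred += matches.size();
            return matches;
        }
    }

    static FakeBookTable sampleTable() {
        return new FakeBookTable(Arrays.asList(
                new Book("A", 1995), new Book("B", 2003),
                new Book("C", 2010), new Book("D", 2018)));
    }

    public static void main(String[] args) {
        // Variant 1: fetch everything, then filter in the application.
        FakeBookTable local = sampleTable();
        List<Book> filteredLocally = local.fetchAll().stream()
                .filter(b -> b.year > 2005)
                .collect(Collectors.toList());

        // Variant 2: let the "database" do the filtering.
        FakeBookTable remote = sampleTable();
        List<Book> filteredInDb = remote.fetchPublishedAfter(2005);

        System.out.println(filteredLocally.size() + " results, "
                + local.rowsTransferred + " rows transferred");
        System.out.println(filteredInDb.size() + " results, "
                + remote.rowsTransferred + " rows transferred");
    }
}
```

Both variants return the same two books, but the first one moves all four rows to get them. With a real table the gap grows with the table size.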


At first glance: streams can be made to run in parallel just by changing the code to use parallelStream(). (Disclaimer: whether simply changing the stream type still produces correct results depends on the specific context; but yes, it can be that easy.)
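
As a minimal sketch of what "that easy" can look like, and of the disclaimer: the class and method names below are made up for illustration. The two methods are safe to parallelize because they share no mutable state; the commented-out variant shows the kind of code where switching to parallelStream() would not be safe.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelStreamSketch {

    // Safe in both modes: the pipeline has no shared mutable state,
    // so the parallel run produces the same sum as the sequential one.
    static long sumOfSquares(List<Integer> numbers, boolean parallel) {
        return (parallel ? numbers.parallelStream() : numbers.stream())
                .mapToLong(n -> (long) n * n)
                .sum();
    }

    // Also safe: collect() merges per-thread partial results and,
    // for a list source, keeps the encounter order.
    static List<Integer> evens(List<Integer> numbers) {
        return numbers.parallelStream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    // NOT safe when parallel (left as a comment on purpose):
    //   List<Integer> out = new ArrayList<>();
    //   numbers.parallelStream().forEach(out::add); // several threads mutate a plain ArrayList
    // This can lose elements, scramble the order, or throw.

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 1000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(sumOfSquares(numbers, false) == sumOfSquares(numbers, true));
        System.out.println(evens(numbers).size());
    }
}
```

Whether the parallel version is actually faster is a separate question; for small inputs like this one the coordination overhead can easily outweigh the gain.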

Then: streams "invite" you to use lambda expressions. Those in turn lead to the use of invokedynamic bytecode instructions, which sometimes gives a performance advantage over the "old-school" way of writing such code. (And to clear up a possible misunderstanding: invokedynamic is a property of lambdas, not of streams!)

These would be reasons to prefer "stream" solutions nowadays (from a general point of view).

Beyond that: it really depends ... let's have a look at your example input. This looks like it deals with ordinary Java POJOs that already reside in memory, within some sort of collection. Processing such objects directly in memory would definitely be faster than going to some out-of-process database to do the work there!

But, of course: if the above calls, like book.getAuthor(), did a "deep dive" and actually talked to an underlying database, then chances are that "doing the whole thing in a single query" gives you better performance.


The first thing to realize is that you can't tell from this code alone which statement is issued against the database. It might very well be that all the filtering, limiting and mapping is collected, and that upon the invocation of collect all that information is used to construct a matching SQL statement (or whatever query language is used), which is then sent to the database.
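
A minimal sketch of that idea, assuming a made-up BookQuery class (it is not the API of Hibernate or of any real library): the intermediate calls only record what was asked for, and the SQL text is assembled in the terminal step. A real implementation would use bind parameters instead of raw condition strings and would execute the statement instead of returning it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stream-like query builder, for illustration only.
class BookQuery {
    private final List<String> conditions = new ArrayList<>();
    private String column = "*";
    private Integer limit; // null means "no LIMIT clause"

    // Intermediate operations: nothing is sent anywhere, the request is just recorded.
    BookQuery filter(String condition) {
        conditions.add(condition);
        return this;
    }

    BookQuery map(String column) {
        this.column = column;
        return this;
    }

    BookQuery limit(int maxRows) {
        this.limit = maxRows;
        return this;
    }

    // Terminal operation: only now is the statement built from the recorded pieces.
    // A real collect() would build this and then run it against the database.
    String toSql() {
        StringBuilder sql = new StringBuilder("SELECT ").append(column).append(" FROM book");
        if (!conditions.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        if (limit != null) {
            sql.append(" LIMIT ").append(limit);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String sql = new BookQuery()
                .filter("year > 2005")
                .filter("price < 20")
                .map("title")
                .limit(10)
                .toSql();
        System.out.println(sql);
        // prints: SELECT title FROM book WHERE year > 2005 AND price < 20 LIMIT 10
    }
}
```

The calling code looks like an in-memory stream pipeline, yet all of the filtering and limiting ends up in a single statement for the database to evaluate.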

With this in mind, there are many reasons why stream-like APIs are used.

  1. It is hip. Streams and lambdas are still rather new to most Java developers, so they feel cool when they use them.

  2. If something like what is described in the first paragraph is used, it actually creates a nice DSL for constructing your query statements. Scala's Slick and .NET's LINQ were early examples I know about, although I assume somebody built something like it in Lisp long before I was born.

  3. The streams might be reactive streams and encapsulate a non-blocking API. These APIs are really nice because they don't force you to block resources like threads while you are waiting for results, but using them requires either tons of callbacks or a much nicer stream-based API to process the results.

  4. They are nicer to read than imperative code. Maybe the processing done in the stream can't [easily/by the author] be done with SQL. So the alternatives aren't SQL vs Java (or whatever language you are using), but imperative Java vs "functional" Java. The latter often reads nicer.
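
To illustrate the fourth point, here is a small sketch of the same in-memory task written both ways. The Book class and the method names are assumptions made up for this example; only getAuthor() echoes the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReadabilitySketch {

    static final class Book {
        private final String author;
        private final String title;
        Book(String author, String title) { this.author = author; this.title = title; }
        String getAuthor() { return author; }
        String getTitle() { return title; }
    }

    // Imperative version: loop, condition, early exit and accumulation are interleaved.
    static List<String> titlesImperative(List<Book> books, String author, int max) {
        List<String> titles = new ArrayList<>();
        for (Book book : books) {
            if (titles.size() == max) {
                break;
            }
            if (author.equals(book.getAuthor())) {
                titles.add(book.getTitle());
            }
        }
        return titles;
    }

    // Stream version: each step of the pipeline names one thing that happens.
    static List<String> titlesWithStream(List<Book> books, String author, int max) {
        return books.stream()
                .filter(book -> author.equals(book.getAuthor()))
                .limit(max)
                .map(Book::getTitle)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book("Smith", "First"), new Book("Jones", "Other"),
                new Book("Smith", "Second"), new Book("Smith", "Third"));
        System.out.println(titlesImperative(books, "Smith", 2)); // prints [First, Second]
        System.out.println(titlesWithStream(books, "Smith", 2)); // prints [First, Second]
    }
}
```

Both methods work on objects that are already in memory, so this is a choice between two Java styles, not between Java and SQL.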

So there are good reasons to use such an API.

With all that said: it is, in almost all cases, a bad idea to do any sorting, filtering and the like in your application when you can offload it to the database. The only exception I can currently think of is when you can skip the whole round trip to the database because you already have the result locally (e.g. in a cache).