Parallel Infinite Java Streams run out of Memory

OOME is caused not by the stream being infinite, but by the fact that it isn't.

I.e., if you comment out the .limit(...), it will never run out of memory -- but of course, it will never end either.

Once it's split, the stream can only keep track of the number of elements if they're accumulated within each thread (looks like the actual accumulator is Spliterators$ArraySpliterator#array).

Looks like you can reproduce it without flatMap, just run the following with -Xmx128m:

    System.out.println(Stream
            .iterate(1, i -> i + 1)
            .parallel()
      //    .flatMap(n -> Stream.iterate(n, i -> i+n))
            .mapToInt(Integer::intValue)
            .limit(100_000_000)
            .sum()
    );

However, after commenting out the limit(), it should run fine until you decide to spare your laptop.

Besides the actual implementation details, here's what I think is happening:

With limit, the sum reducer wants the first X elements to sum up, so no thread can emit partial sums. Each "slice" (thread) will need to accumulate elements and pass them through. Without limit, there's no such constraint so each "slice" will just compute the partial sum out of the elements it gets (forever), assuming it will emit the result eventually.
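The flip side is easy to check: keep the limit but drop parallel(). A sequential stream pulls elements one at a time and counts them as it goes, so the limit is enforced without any buffering and the pipeline completes in constant memory. A sketch (with a smaller limit so it finishes quickly; the class name and the 10_000_000 figure are just illustrative choices):

```java
import java.util.stream.Stream;

public class SequentialLimitDemo {
    public static void main(String[] args) {
        // Same shape as the OOM-ing pipeline, but sequential: elements are
        // pulled one at a time, so limit() needs no accumulation.
        long sum = Stream
                .iterate(1, i -> i + 1)
                .mapToLong(Integer::longValue)
                .limit(10_000_000)
                .sum();
        // sum of 1..n is n*(n+1)/2
        System.out.println(sum);
    }
}
```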


You say “but I don't quite know in which order things are evaluated and where buffering occurs”, which is precisely what parallel streams are about. The order of evaluation is unspecified.

A critical aspect of your example is the .limit(100_000_000). This implies that the implementation can’t just sum up arbitrary values, but must sum up the first 100,000,000 numbers. Note that in the reference implementation, .unordered().limit(100_000_000) doesn’t change the outcome, which indicates that there’s no special implementation for the unordered case, but that’s an implementation detail.

Now, when worker threads process the elements, they can’t just sum them up, as they have to know which elements they are allowed to consume, which depends on how many elements precede their specific workload. Since this stream doesn’t know the sizes, this can only be known when the prefix elements have been processed, which never happens for infinite streams. So the worker threads keep buffering until that information becomes available.
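This is also why the same limit is harmless on a sized source: with `LongStream.rangeClosed`, every split reports its exact position and size, so each worker knows up front which elements it may consume and nothing has to be buffered. A sketch (the concrete numbers are just for illustration):

```java
import java.util.stream.LongStream;

public class SizedLimitDemo {
    public static void main(String[] args) {
        // rangeClosed is SIZED and SUBSIZED: each split knows how many
        // elements precede it, so limit() works without buffering here.
        long sum = LongStream
                .rangeClosed(1, 100_000_000)
                .parallel()
                .limit(100_000_000)
                .sum();
        System.out.println(sum);
    }
}
```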

In principle, when a worker thread knows that it processes the leftmost¹ work-chunk, it could sum up the elements immediately, count them, and signal the end when reaching the limit. So the Stream could terminate, but this depends on a lot of factors.

In your case, a plausible scenario is that the other worker threads are faster in allocating buffers than the leftmost job is counting. In this scenario, subtle changes to the timing could make the stream occasionally return with a value.

When we slow down all worker threads except the one processing the leftmost chunk, we can make the stream terminate (at least in most runs):

    // requires java.util.concurrent.locks.LockSupport
    System.out.println(IntStream
            .iterate(1, i -> i + 1)
            .parallel()
            .peek(i -> { if (i != 1) LockSupport.parkNanos(1_000_000_000); })
            .flatMap(n -> IntStream.iterate(n, i -> i + n))
            .limit(100_000_000)
            .sum()
    );

¹ I’m following a suggestion by Stuart Marks to use left-to-right order when talking about the encounter order rather than the processing order.


My best guess is that adding parallel() changes the internal behavior of flatMap(), which already had problems with lazy evaluation before.

The OutOfMemoryError that you are getting was reported in [JDK-8202307] Getting a java.lang.OutOfMemoryError: Java heap space when calling Stream.iterator().next() on a stream which uses an infinite/very big Stream in flatMap. If you look at the ticket, it's more or less the same stack trace that you are getting. The ticket was closed as Won't Fix with the following reason:

The iterator() and spliterator() methods are "escape hatches" to be used when it's not possible to use other operations. They have some limitations because they turn what is a push model of the stream implementation into a pull model. Such a transition requires buffering in certain cases, such as when an element is (flat) mapped to two or more elements. It would significantly complicate the stream implementation, likely at the expense of common cases, to support a notion of back-pressure to communicate how many elements to pull through nested layers of element production.
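The buffering the ticket describes is easy to observe with a finite inner stream: pulling a single element through iterator() still forces flatMap to push the entire inner stream into a buffer. A sketch (the counter just records how many inner elements were actually produced; on the JDK builds I've tried, a single next() call produces all 1_000 of them, and with an infinite inner stream that same buffering is what exhausts the heap):

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class FlatMapPullDemo {
    public static void main(String[] args) {
        AtomicInteger produced = new AtomicInteger();
        Iterator<Integer> it = Stream.of(1)
                .flatMap(i -> Stream.iterate(i, x -> x + 1)
                        .limit(1_000)
                        .peek(x -> produced.incrementAndGet()))
                .iterator();
        it.next(); // pull ONE element through the push-to-pull adapter
        // flatMap pushed the whole inner stream into a buffer first
        System.out.println("inner elements produced: " + produced.get());
    }
}
```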