In Stream reduce method, must the identity always be 0 for sum and 1 for multiplication?

The identity value is a value such that x op identity = x for every x. This concept is not unique to Java Streams; see, for example, the Wikipedia article on identity elements.

It lists some examples of identity elements. Some of them can be expressed directly in Java code, e.g.

  • reduce("", String::concat)
  • reduce(true, (a,b) -> a&&b)
  • reduce(false, (a,b) -> a||b)
  • reduce(Collections.emptySet(), (a,b)->{ Set<X> s=new HashSet<>(a); s.addAll(b); return s; })
  • reduce(Double.POSITIVE_INFINITY, Math::min)
  • reduce(Double.NEGATIVE_INFINITY, Math::max)

It should be clear that the expression x + y == x can only hold for arbitrary x when y == 0, thus 0 is the identity element for addition. Similarly, 1 is the identity element for multiplication.
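
A practical consequence of the definition is that reducing an empty stream simply returns the identity. The sketch below illustrates this with three of the expressions listed above; the class and method names are made up for the illustration.

```java
import java.util.List;

class IdentityExamples {
    // "" is the identity for string concatenation
    static String concat(List<String> parts) {
        return parts.stream().reduce("", String::concat);
    }

    // true is the identity for logical AND
    static boolean allTrue(List<Boolean> flags) {
        return flags.stream().reduce(true, (a, b) -> a && b);
    }

    // positive infinity is the identity for min
    static double smallest(List<Double> values) {
        return values.stream().reduce(Double.POSITIVE_INFINITY, Math::min);
    }

    public static void main(String[] args) {
        System.out.println(concat(List.of("a", "b", "c")));    // abc
        System.out.println(allTrue(List.of()));                // true: empty stream yields the identity
        System.out.println(smallest(List.of(3.0, 1.5, 2.0)));  // 1.5
    }
}
```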

More complex examples are

  • Reducing a stream of predicates

    reduce(x->true, Predicate::and)
    reduce(x->false, Predicate::or)
    
  • Reducing a stream of functions

    reduce(Function.identity(), Function::andThen)
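
To make the two reductions above concrete, here is a small sketch that composes a list of predicates and a list of functions. The helper names allOf and chain, and the sample lambdas, are assumptions chosen for the illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class ComposeByReduce {
    // x -> true is the identity for Predicate::and
    static Predicate<Integer> allOf(List<Predicate<Integer>> tests) {
        return tests.stream().reduce(x -> true, Predicate::and);
    }

    // Function.identity() is the identity for Function::andThen
    static Function<Integer, Integer> chain(List<Function<Integer, Integer>> steps) {
        return steps.stream().reduce(Function.identity(), Function::andThen);
    }

    public static void main(String[] args) {
        Predicate<Integer> positive = x -> x > 0;
        Predicate<Integer> even = x -> x % 2 == 0;
        System.out.println(allOf(List.of(positive, even)).test(4)); // true
        System.out.println(allOf(List.of(positive, even)).test(3)); // false

        Function<Integer, Integer> addOne = x -> x + 1;
        Function<Integer, Integer> twice = x -> x * 2;
        System.out.println(chain(List.of(addOne, twice)).apply(3)); // (3 + 1) * 2 = 8
    }
}
```

With an empty list, allOf yields a predicate that accepts everything and chain yields a function that returns its input unchanged.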
    

@holger's answer explains well what the identity is for different functions, but it doesn't explain why we need an identity or why you get different results between parallel and sequential streams.

Your problem can be reduced to summing a list of elements, knowing how to sum two elements.

So let's take a list L = {12,32,10,18} and a summing function (a,b) -> a + b

As you learned at school, you would do:

(12,32) -> 12 + 32 -> 44
(44,10) -> 44 + 10 -> 54
(54,18) -> 54 + 18 -> 72

Now imagine our list becomes L = {12}. How do we sum this list? This is where the identity (x op identity = x) comes in.

(0,12) -> 12

So now you can understand why your sum is off by +1 if you put 1 instead of 0: you initialize the reduction with a wrong value.

(1,12) -> 1 + 12 -> 13
(13,32) -> 13 + 32 -> 45
(45,10) -> 45 + 10 -> 55
(55,18) -> 55 + 18 -> 73
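
The two sequential traces above can be reproduced with a short sketch; the class and method names are made up for the illustration.

```java
import java.util.List;

class SequentialSum {
    // Sequential reduce: the identity is used exactly once, as the starting value
    static int sumWithIdentity(List<Integer> values, int identity) {
        return values.stream().reduce(identity, (a, b) -> a + b);
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(12, 32, 10, 18);
        System.out.println(sumWithIdentity(values, 0));      // 72
        System.out.println(sumWithIdentity(values, 1));      // 73, off by one
        System.out.println(sumWithIdentity(List.of(12), 0)); // 12
    }
}
```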

So now, how can we improve speed? Parallelize things.

What if we split our list, give the sublists to 4 different threads (assuming a 4-core CPU), and then combine the results? This will give us L1 = {12}, L2 = {32}, L3 = {10}, L4 = {18}

So with identity = 1

  • thread1: (1,12) -> 1+12 -> 13
  • thread2: (1,32) -> 1+32 -> 33
  • thread3: (1,10) -> 1+10 -> 11
  • thread4: (1,18) -> 1+18 -> 19

and then combine: 13 + 33 + 11 + 19, which is equal to 76. This explains why the error is propagated 4 times: the wrong identity is added once per sublist.
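
The split-and-combine steps above can be imitated deterministically without real threads. This is only a simulation under the assumption that each chunk starts from the identity and the partial results are then added together; the actual chunking done by a parallel stream depends on the runtime. The names ChunkedSum and reduceInChunks are made up for the illustration.

```java
import java.util.List;

class ChunkedSum {
    // Each chunk is reduced starting from the identity, then the partial sums are combined
    static int reduceInChunks(List<List<Integer>> chunks, int identity) {
        return chunks.stream()
                .map(chunk -> chunk.stream().reduce(identity, Integer::sum))
                .reduce(Integer::sum)
                .orElse(identity);
    }

    public static void main(String[] args) {
        List<List<Integer>> four = List.of(List.of(12), List.of(32), List.of(10), List.of(18));
        List<List<Integer>> two = List.of(List.of(12, 32), List.of(10, 18));
        System.out.println(reduceInChunks(four, 1)); // 76: the wrong identity is added 4 times
        System.out.println(reduceInChunks(two, 1));  // 74: only 2 chunks, so the error is 2
        System.out.println(reduceInChunks(four, 0)); // 72: a real identity never changes the result
    }
}
```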

With a list this small, the parallel version can be less efficient than the sequential one.

But this result depends on your machine and on the input list. Java won't create 1000 threads for 1000 elements, so the error grows more slowly than the input does.

Try running this code, which sums one thousand 1s. The result is quite close to 1000, only slightly above it:

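import java.util.stream.IntStream;

public class StreamReduce {

    public static void main(String[] args) {
        // 1 is deliberately the wrong identity for addition: every parallel chunk adds it once
        int sum = IntStream.range(0, 1000).map(i -> 1).parallel().reduce(1, (r, e) -> r + e);
        System.out.println("sum: " + sum);
    }
}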

Now you should understand why you get different results between parallel and sequential streams if you break the identity contract.

See the Oracle documentation for the proper way to write your sum.
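
As a minimal sketch of a correct sum, the same thousand 1s can be added either with the built-in IntStream.sum() or with reduce and the identity 0. The class and method names are made up for the illustration.

```java
import java.util.stream.IntStream;

class CorrectSum {
    // Built-in sum, no identity to get wrong
    static int sumOfOnes(int count) {
        return IntStream.range(0, count).map(i -> 1).parallel().sum();
    }

    // Explicit reduce with 0, the identity for addition
    static int sumOfOnesWithReduce(int count) {
        return IntStream.range(0, count).map(i -> 1).parallel().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumOfOnes(1000));           // 1000
        System.out.println(sumOfOnesWithReduce(1000)); // 1000
    }
}
```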


What's the identity of a problem?


Yes, you are breaking the contract of the combiner function. The identity, which is the first argument of reduce, must satisfy combiner(identity, u) == u. Quoting the Javadoc of Stream.reduce:

The identity value must be an identity for the combiner function. This means that for all u, combiner(identity, u) is equal to u.

However, your combiner function performs an addition and 1 is not the identity element for addition; 0 is.

  • Change the identity to 0 and there will be no surprise: the result will be 72 for both options.

  • For your own amusement, change your combiner function to perform a multiplication (keeping the identity at 1) and you will also get the same result for both options.
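
Both suggestions can be checked with a small sketch that runs the same reduction sequentially and in parallel. The list {12, 32, 10, 18} is borrowed from the other answer on this page, and the class and method names are made up for the illustration.

```java
import java.util.List;
import java.util.function.BinaryOperator;

class ValidIdentityCheck {
    // Runs the same reduction on a sequential or a parallel stream
    static int reduce(List<Integer> values, int identity, BinaryOperator<Integer> op, boolean parallel) {
        return (parallel ? values.parallelStream() : values.stream()).reduce(identity, op);
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(12, 32, 10, 18);
        // 0 is the identity for addition: both runs print 72
        System.out.println(reduce(values, 0, Integer::sum, false));
        System.out.println(reduce(values, 0, Integer::sum, true));
        // 1 is the identity for multiplication: both runs print 69120
        System.out.println(reduce(values, 1, (a, b) -> a * b, false));
        System.out.println(reduce(values, 1, (a, b) -> a * b, true));
    }
}
```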

Let's build an example where the identity is neither 0 nor 1. Given your own domain class, consider:

System.out.println(Person.getPersons().stream()
                    .reduce("", 
                            (acc, p) -> acc.length() > p.name.length() ? acc : p.name,
                            (n1, n2) -> n1.length() > n2.length() ? n1 : n2));

This will reduce the stream of Person to the longest person name.
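
For a version that can be run on its own, here is a sketch with a hypothetical stand-in for the Person class, since the real one comes from the question. The field name and getPersons() follow the snippet above; the sample names are invented.

```java
import java.util.List;

class LongestName {
    // Hypothetical stand-in for the asker's Person class
    static class Person {
        final String name;

        Person(String name) {
            this.name = name;
        }

        static List<Person> getPersons() {
            return List.of(new Person("Ann"), new Person("Christopher"), new Person("Bob"));
        }
    }

    // "" is the identity: no name is shorter than the empty string
    static String longestName(List<Person> persons) {
        return persons.stream()
                .reduce("",
                        (acc, p) -> acc.length() > p.name.length() ? acc : p.name,
                        (n1, n2) -> n1.length() > n2.length() ? n1 : n2);
    }

    public static void main(String[] args) {
        System.out.println(longestName(Person.getPersons())); // Christopher
    }
}
```

For an empty list the reduction returns the empty string, which is exactly the identity.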