Java 8 Streams - collect vs reduce

reduce is a "fold" operation: it applies a binary operator to each element in the stream, where the first argument to the operator is the return value of the previous application and the second argument is the current stream element.
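A minimal sketch of that fold, assuming a simple list of integers (the class and method names are made up for illustration): the lambda passed to reduce receives (result so far, current element), which is exactly what the explicit loop does by hand.

```java
import java.util.Arrays;
import java.util.List;

public class ReduceAsFold {
    // reduce: the operator receives (result of the previous application, current element)
    public static int sumWithReduce(List<Integer> xs) {
        return xs.stream().reduce(0, (acc, x) -> acc + x);
    }

    // The same left fold written as an explicit loop
    public static int sumWithLoop(List<Integer> xs) {
        int acc = 0; // plays the role of reduce's identity value
        for (int x : xs) {
            acc = acc + x; // previous result combined with the current element
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumWithReduce(xs) + " " + sumWithLoop(xs)); // prints "10 10"
    }
}
```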

collect is an aggregation operation where a "collection" (a mutable result container) is created and each element is "added" to that collection. When the stream is split into parts, for example in parallel, the collections built for the different parts are then combined.
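A minimal sketch of the three-argument Stream.collect, modelled on the ArrayList example in the JDK documentation (the squares method is just for illustration). In a parallel run each sub-task fills its own ArrayList, and addAll merges the partial lists in encounter order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class CollectIntoList {
    public static List<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .boxed()
                .map(i -> i * i)
                .collect(
                        ArrayList::new,     // supplier: a fresh, empty container per sub-task
                        ArrayList::add,     // accumulator: "add" one element to a container in place
                        ArrayList::addAll); // combiner: merge one partial container into another
    }

    public static void main(String[] args) {
        System.out.println(squares(5)); // prints "[1, 4, 9, 16, 25]"
    }
}
```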

The document you linked gives the reason for having two different approaches:

If we wanted to take a stream of strings and concatenate them into a single long string, we could achieve this with ordinary reduction:

 String concatenated = strings.reduce("", String::concat);

We would get the desired result, and it would even work in parallel. However, we might not be happy about the performance! Such an implementation would do a great deal of string copying, and the run time would be O(n^2) in the number of characters. A more performant approach would be to accumulate the results into a StringBuilder, which is a mutable container for accumulating strings. We can use the same technique to parallelize mutable reduction as we do with ordinary reduction.
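To make the comparison in the quote concrete, here is a small sketch (class and method names are hypothetical) of the same concatenation done three ways: ordinary reduction with String::concat, mutable reduction into a StringBuilder (the form shown in the Stream.collect Javadoc), and the ready-made Collectors.joining().

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ConcatExamples {
    // Ordinary reduction: every String::concat call copies both operands into a new String
    public static String withReduce(List<String> strings) {
        return strings.stream().reduce("", String::concat);
    }

    // Mutable reduction: one StringBuilder per sub-task, appended to in place
    public static String withCollect(List<String> strings) {
        return strings.stream()
                .collect(StringBuilder::new,    // supplier: an empty builder
                         StringBuilder::append, // accumulator: sb.append(s)
                         StringBuilder::append) // combiner: sb.append(otherSb)
                .toString();
    }

    // The JDK's built-in collector for the same job
    public static String withJoining(List<String> strings) {
        return strings.stream().collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("a", "b", "c");
        // prints "abc abc abc"
        System.out.println(withReduce(strings) + " " + withCollect(strings) + " " + withJoining(strings));
    }
}
```

All three give the same string; the difference is that only the reduce version allocates a new String for every element.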

So the point is that the parallelisation is the same in both cases, but in the reduce case the function is applied to the stream elements themselves and returns a new value each time, whereas in the collect case the function is applied to a mutable container, which is updated in place.


The reason is simply that:

  • collect() can only work with mutable result objects.
  • reduce() is designed to work with immutable result objects.
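One way to see why reduce() expects immutable results is to break the rule on purpose. The sketch below (a hypothetical anti-pattern, not a recommended technique) passes a mutable StringBuilder as the identity of the three-argument reduce. Because StringBuilder::append mutates and returns its receiver, the "identity" itself ends up holding the elements, so reusing it gives a different answer; in a parallel stream the same identity would also be shared between threads.

```java
import java.util.stream.Stream;

public class MutableIdentityPitfall {
    // Anti-pattern: a mutable StringBuilder used as reduce()'s identity.
    // append() modifies the builder and returns the same object, so no new
    // result object is ever created and the identity does not stay empty.
    public static StringBuilder misuse(StringBuilder identity) {
        return Stream.of("a", "b")
                .reduce(identity,
                        StringBuilder::append,  // accumulator: (sb, s) -> sb.append(s)
                        StringBuilder::append); // combiner: only used for parallel streams
    }

    public static void main(String[] args) {
        StringBuilder identity = new StringBuilder();
        StringBuilder first = misuse(identity);
        System.out.println(first == identity); // prints "true": the result IS the identity object
        System.out.println(identity);          // prints "ab": the identity has been modified
        System.out.println(misuse(identity));  // prints "abab": a second run starts from the polluted identity
    }
}
```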

"reduce() with immutable" example

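import static org.junit.Assert.assertEquals;

import java.util.LinkedList;
import java.util.List;
import org.junit.Test;

public class Employee {
  private Integer salary;
  public Employee(String aSalary){
    this.salary = Integer.valueOf(aSalary); // new Integer(String) is deprecated since Java 9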
  }
  public Integer getSalary(){
    return this.salary;
  }
}

@Test
public void testReduceWithImmutable(){
  List<Employee> list = new LinkedList<>();
  list.add(new Employee("1"));
  list.add(new Employee("2"));
  list.add(new Employee("3"));

  Integer sum = list
  .stream()
  .map(Employee::getSalary)
  // each application returns a new Integer; nothing is modified in place
  .reduce(0, Integer::sum);

  assertEquals(Integer.valueOf(6), sum);
}
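reduce also has a three-argument form that folds the Employee objects directly into an Integer without the map() step. This is a sketch; the nested Employee is a minimal stand-in for the class above. Because Integer is immutable, both the accumulator and the combiner simply return new values, which is what makes it safe on a parallel stream.

```java
import java.util.Arrays;
import java.util.List;

public class ReduceWithCombiner {
    // Minimal stand-in for the Employee class above (immutable Integer salary)
    public static final class Employee {
        private final Integer salary;
        public Employee(String aSalary) { this.salary = Integer.valueOf(aSalary); }
        public Integer getSalary() { return salary; }
    }

    // Three-argument reduce: identity, accumulator (Integer, Employee) -> Integer, combiner
    public static Integer totalSalary(List<Employee> employees) {
        return employees.parallelStream().reduce(
                0,                               // identity
                (sum, e) -> sum + e.getSalary(), // accumulator: returns a NEW Integer
                Integer::sum);                   // combiner: merges two partial sums
    }

    public static void main(String[] args) {
        List<Employee> list = Arrays.asList(new Employee("1"), new Employee("2"), new Employee("3"));
        System.out.println(totalSalary(list)); // prints "6"
    }
}
```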

"collect() with mutable" example

For example, if you want to calculate a sum manually with collect(), you cannot use BigDecimal as the container; you need a mutable type such as MutableInt from org.apache.commons.lang.mutable. See:

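import static org.junit.Assert.assertEquals;

import java.util.LinkedList;
import java.util.List;
import org.apache.commons.lang.mutable.MutableInt;
import org.junit.Test;

public class Employee {
  private MutableInt salary;
  public Employee(String aSalary){
    this.salary = new MutableInt(aSalary);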
  }
  public MutableInt getSalary(){
    return this.salary;
  }
}

@Test
public void testCollectWithMutable(){
  List<Employee> list = new LinkedList<>();
  list.add(new Employee("1"));
  list.add(new Employee("2"));

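  MutableInt sum = list.stream().collect(
    MutableInt::new,                              // supplier: a fresh container with value 0
    (MutableInt container, Employee employee) ->  // accumulator: changes the container in place
      container.add(employee.getSalary().intValue()),
    MutableInt::add);                             // combiner: adds one partial container into another
  assertEquals(new MutableInt(3), sum);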
}

This works because the accumulator container.add(employee.getSalary().intValue()) does not return a new object holding the result; it changes the state of the mutable MutableInt container.
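MutableInt needs Apache Commons Lang, but any mutable holder works the same way. As a sketch using only the JDK, a one-element int[] can serve as the container; the nested Employee with a plain int salary is a hypothetical stand-in for the class above.

```java
import java.util.Arrays;
import java.util.List;

public class CollectWithJdkContainer {
    // Minimal stand-in for the Employee class above, with a plain int salary
    public static final class Employee {
        private final int salary;
        public Employee(int salary) { this.salary = salary; }
        public int getSalary() { return salary; }
    }

    public static int totalSalary(List<Employee> employees) {
        int[] box = employees.parallelStream().collect(
                () -> new int[1],                      // supplier: a fresh one-slot container
                (acc, e) -> acc[0] += e.getSalary(),   // accumulator: mutate the container in place
                (left, right) -> left[0] += right[0]); // combiner: fold right's total into left
        return box[0];
    }

    public static void main(String[] args) {
        List<Employee> list = Arrays.asList(new Employee(1), new Employee(2));
        System.out.println(totalSalary(list)); // prints "3"
    }
}
```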

If you wanted to use BigDecimal as the container instead, collect() would not work: the accumulator container.add(employee.getSalary()) compiles, but because BigDecimal is immutable, add() returns a new BigDecimal that the accumulator discards, so the container never changes. (Apart from this, BigDecimal::new would not work as the supplier, because BigDecimal has no no-argument constructor.)
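A small sketch contrasting the two approaches for BigDecimal (class and method names are illustrative). reduce() works because it keeps the new object that add() returns; the collect() version compiles but silently throws every result away, so it returns the initial zero.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

public class BigDecimalSums {
    // Works: BigDecimal is immutable, and reduce() carries forward the NEW object add() returns
    public static BigDecimal withReduce(List<BigDecimal> values) {
        return values.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    // Broken on purpose: compiles, but each add() result is discarded by the BiConsumer,
    // so the container is never changed
    public static BigDecimal withCollect(List<BigDecimal> values) {
        return values.stream().collect(
                () -> BigDecimal.ZERO,              // BigDecimal::new is impossible: no no-arg constructor
                (container, v) -> container.add(v), // result discarded
                (left, right) -> left.add(right));  // result discarded
    }

    public static void main(String[] args) {
        List<BigDecimal> values = Arrays.asList(new BigDecimal("1"), new BigDecimal("2"));
        System.out.println(withReduce(values));  // prints "3"
        System.out.println(withCollect(values)); // prints "0"
    }
}
```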