Is there any advantage of calling map after mapToInt, where ever required

When in doubt, test! Using jmh, I get the following results on a list of 100k elements (in microseconds, smaller is better):

Benchmark                        Mode  Samples     Score    Error  Units
c.a.p.SO32462798.for_loop        avgt       10   119.110    0.921  us/op
c.a.p.SO32462798.mapToInt        avgt       10   129.702    1.040  us/op
c.a.p.SO32462798.mapToInt_map    avgt       10   129.753    1.516  us/op
c.a.p.SO32462798.map_reduce      avgt       10  1262.802   12.197  us/op
c.a.p.SO32462798.summingInt      avgt       10   134.821    1.203  us/op

So you have, from faster to slower:

  • for(int i : list) sum += i*i;
  • mapToInt(x -> x * x).sum() and mapToInt(x -> x).map(x -> x * x).sum()
  • collect(Collectors.summingInt(x -> x * x))
  • map(x -> x * x).reduce((x, y) -> x + y).get()

Note that the results are very much dependent on the JIT optimisations. If the logic in the mapping is more complex, some of the optimisations may be unavailable (longer code = less inlining) in which case the streams versions may take 4-5x more time than the for loop - but if that logic is CPU heavy the difference will reduce again. Profiling your actual application will give you more information.


Benchmark code for reference:

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
public class SO32462798 {

  List<Integer> list;

  @Setup public void setup() {
    list = new Random().ints(100_000).boxed().collect(toList());
  }

  @Benchmark public int for_loop() {
    int sum = 0;
    for (int i : list) sum += i * i;
    return sum;
  }

  @Benchmark public int summingInt() {
    return list.stream().collect(Collectors.summingInt(x -> x * x));
  }

  @Benchmark public int mapToInt() {
    return list.stream().mapToInt(x -> x * x).sum();
  }

  @Benchmark public int mapToInt_map() {
    return list.stream().mapToInt(x -> x).map(x -> x * x).sum();
  }

  @Benchmark public int map_reduce() {
    return list.stream().map(x -> x * x).reduce((x, y) -> x + y).get();
  }
}

I expect the second one to be the fastest.

There is boxing in neither the second nor the third example (if the list contains already boxed elements). But, there is unboxing.

Your second example might have two unboxing (one for every x in x*x), while the third does unboxing only once. However, unboxing is fast and I think it does not worth to optimize that as a longer pipeline with an additional function call will certainly slow it down.

Sidenote: in general, you should not expect Streams to be faster than regular iterations on arrays or lists. When doing mathematical calculations where speed matters (like this) it's better to go the other way: simply iterate through the elements. If your output is an aggregated value, then aggregate it, if it is a mapping, then allocate a new array or list of the same size and fill it with the calculated values.