Efficient stable sum of ordered numbers

I guess I can have some numerical stability problem

So test for it. Currently you have a hypothetical problem, which is to say, no problem at all.

If you test, and the hypothetical materializes into an actual problem, then you should worry about actually fixing it.

That is - floating-point precision can cause issues, but you can confirm whether it really does for your data, before prioritizing that over everything else.

... I will have more cache missing?

One thousand floats is 4Kb - it'll fit in cache on a modern mass-market system (if you have another platform in mind, tell us what it is).

The only risk is that the prefetcher won't help you when iterating backwards, but of course your vector may already be in cache. You can't really determine this until you profile in the context of your full program, so there's no use worrying about it until you have a full program.

Is there any other smart solution?

Don't worry about things that might become problems, until they actually become problems. At most it's worth noting possible issues, and structuring your code so that you can replace the simplest-possible solution with a carefully-optimized one later, without re-writing everything else.


I bench-marked your use case and the results (see attached image) point to the direction that it does not make any performance difference to loop forward or backward.

You might want to measure on your hardware + compiler as well.


Using STL to perform the sum it's as fast as manual looping over data but much more expressive.

use the following for reverse accumulation:

std::accumulate(rbegin(data), rend(data), 0.0f);

while for forward accumulation:

std::accumulate(begin(data), end(data), 0.0f);

enter image description here


The easiest solution would be to traverse the vector in reverse order. My question is: is that efficient as well as the forward case? I will have more cache missing?

Yes it is efficient. Branch prediction and smart cache strategy from your hardware is tuned for sequential access. You can safely accumulate your vector:

#include <numeric>

auto const sum = std::accumulate(crbegin(v), crend(v), 0.f);