How LongAdder performs better than AtomicLong

Does that mean LongAdder aggregates the values internally and updates the total later?

Yes, if I understand your statement correctly.

Each Cell in a LongAdder is a variant of an AtomicLong. Having multiple such cells is a way of spreading out the contention and thus increasing throughput.

When the final result is retrieved with sum(), it just adds together the values of all the cells (plus a base field that is used while there is no contention).
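To make the idea concrete, here is a minimal sketch of a striped counter. It is not the actual Striped64 code: the class name StripedCounterSketch, the fixed cell count, and picking a cell at random are all assumptions made for illustration. The real class picks a cell from a per-thread probe value, grows its table on demand, and pads its cells.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical, simplified illustration of striping; NOT the actual Striped64 code.
class StripedCounterSketch {
    // Fixed number of cells (an assumption); the real class grows its table on demand.
    private final AtomicLongArray cells;

    StripedCounterSketch(int cellCount) {
        this.cells = new AtomicLongArray(cellCount);
    }

    // Each update touches only one cell, so concurrent updates are spread across cells.
    void add(long delta) {
        int index = ThreadLocalRandom.current().nextInt(cells.length());
        cells.addAndGet(index, delta);
    }

    // Reading the total means walking all cells and adding them up.
    long sum() {
        long total = 0;
        for (int i = 0; i < cells.length(); i++) {
            total += cells.get(i);
        }
        return total;
    }

    // Starts `threads` workers that each add 1 `perThread` times, then returns the sum.
    static long demo(int threads, int perThread) {
        StripedCounterSketch counter = new StripedCounterSketch(8);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.add(1);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 10_000)); // prints 40000
    }
}
```

Note that the elements of an AtomicLongArray sit next to each other in memory, so this sketch still suffers from false sharing. It only shows the "many cells, one sum" structure, not the performance characteristics.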

Much of the logic around how the cells are organized and allocated can be seen in the source: http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/f398670f3da7/src/java.base/share/classes/java/util/concurrent/atomic/Striped64.java

In particular, the number of cells is bounded by the number of CPUs:

/** Number of CPUS, to place bound on table size */
static final int NCPU = Runtime.getRuntime().availableProcessors();

The primary reason it is "faster" is its contended performance. This is important because:

Under low update contention, the two classes have similar characteristics.

You'd use a LongAdder for very frequent updates, where atomic CAS operations and native calls to Unsafe on a single value would cause contention (see the source and its volatile reads). Cache misses and false sharing across multiple AtomicLongs are a further concern: although I have not looked at the class layout yet, there doesn't appear to be sufficient memory padding before the actual long field.

under high contention, expected throughput of this class is significantly higher, at the expense of higher space consumption.
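The cost of contended CAS updates can be pictured as a retry loop in which every failed attempt is wasted work. The sketch below is only an illustration: the class CasRetrySketch and its methods are hypothetical names, and the explicit loop is an assumption about how to picture the update, since the actual AtomicLong.incrementAndGet() may be compiled to a single hardware instruction on some platforms.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of a CAS retry loop; not how the JDK necessarily implements it.
class CasRetrySketch {
    // Increments `counter` with an explicit CAS loop and returns the number of failed attempts.
    static long incrementWithCas(AtomicLong counter) {
        long failures = 0;
        while (true) {
            long current = counter.get();
            if (counter.compareAndSet(current, current + 1)) {
                return failures;
            }
            failures++; // another thread changed the value first; retry
        }
    }

    // Returns { final counter value, total failed CAS attempts across all threads }.
    static long[] demo(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        long[] failuresPerThread = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long failures = 0;
                for (int i = 0; i < perThread; i++) {
                    failures += incrementWithCas(counter);
                }
                failuresPerThread[id] = failures;
            });
            workers[t].start();
        }
        long totalFailures = 0;
        for (int t = 0; t < threads; t++) {
            try {
                workers[t].join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            totalFailures += failuresPerThread[t];
        }
        return new long[] { counter.get(), totalFailures };
    }

    public static void main(String[] args) {
        long[] result = demo(4, 100_000);
        System.out.println("value = " + result[0] + ", failed CAS attempts = " + result[1]);
    }
}
```

The final value is always threads * perThread. The number of failed attempts varies from run to run and from machine to machine, and with a single thread it is zero.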

The implementation extends Striped64, which is a data holder for 64-bit values. The values are held in cells, which are padded (or striped), hence the name. Each operation on the LongAdder modifies the collection of values held in the Striped64. When contention occurs, a new cell is created and modified, so that the old thread can finish concurrently with the contending one. When you need the final value, the values of all the cells are simply added up.

Unfortunately, performance comes at a cost, which here, as so often, is memory. The Striped64 can grow very large if a lot of threads and updates are thrown at it.

Quote source: Javadoc for LongAdder


AtomicLong uses CAS, which under heavy contention can lead to many wasted CPU cycles. LongAdder, on the other hand, uses a very clever trick to reduce contention between the threads that are incrementing it. When we call increment(), behind the scenes LongAdder maintains an array of counters that can grow on demand, so the more threads are contending on increment(), the longer the array becomes. Each entry in the array can be updated separately, which reduces the contention. Because of that, LongAdder is a very efficient way to increment a counter from multiple threads. The combined result is not available until we call the sum() method, and if other threads are still updating at that moment, the returned value is not an atomic snapshot.
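A short usage sketch of the pattern described above: several threads call increment(), and the total is read with sum() once they have finished. The class name LongAdderUsage, the helper method, and the thread and iteration counts are illustrative assumptions. Only LongAdder, increment() and sum() come from the JDK API under discussion.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical usage example; names and sizes are chosen for illustration only.
class LongAdderUsage {
    // Increments one shared LongAdder from `threads` pool threads and returns the total.
    static long countWithLongAdder(int threads, int incrementsPerThread) {
        LongAdder adder = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    adder.increment();
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for all submitted tasks so that sum() sees every increment.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return adder.sum();
    }

    public static void main(String[] args) {
        System.out.println(countWithLongAdder(8, 100_000)); // prints 800000
    }
}
```

Because sum() is called only after all workers have terminated, the result here is exact. Calling it while updates are still running would give a value that may miss some in-flight increments.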