Could the JIT collapse two volatile reads as one in certain expressions?

On one hand the very purpose of a volatile read is that it should always be fresh from memory.

That is not how the Java Language Specification defines volatile. The JLS simply says:

A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).

Therefore, a write to a volatile variable happens-before (and is visible to) any subsequent reads of that same variable.

This constraint is trivially satisfied for a read that is not subsequent. That is, volatile only ensures visibility of a write if the read is known to occur after the write.

This is not the case in your program. For every well formed execution that observes a to be 1, I can construct another well formed execution where a is observed to be 0, simply be moving the read after the write. This is possible because the happens-before relation looks as follows:

write 1   -->   read 1                    write 1   -->   read 1
   |              |                          |              |
   |              v                          v              |
   v      -->   read 1                    write 0           v
write 0           |             vs.          |      -->   read 0
   |              |                          |              |
   v              v                          v              v
write 1   -->   read 1                    write 1   -->   read 1                 

That is, all the JMM guarantees for your program is that a+a will yield 0, 1 or 2. That is satisfied if a+a always yields 0. Just as the operating system is permitted to execute this program on a single core, and always interrupt thread 1 before the same instruction of the loop, the JVM is permitted to reuse the value - after all, the observable behavior remains the same.

In general, moving the read across the write violates happens-before consistency, because some other synchronization action is "in the way". In the absence of such intermediary synchronization actions, a volatile read can be satisfied from a cache.


In my original answer, I argued against the legality of the suggested optimization. I backed this mainly from information of the JSR-133 cookbook where it states that a volatile read must not be reordered with another volatile read and where it further states that a cached read is to be treated as a reordering. The latter statement is however formulated with some ambiguouity which is why I went through the formal definition of the JMM where I did not find such indication. Therefore, I would now argue that the optimization is allowed. However, the JMM is quite complex and the discussion on this page indicates that this corner case might be decided differently by someone with a more thorough understanding of the formalism.

Denoting thread 1 to execute

while (true) {
  System.out.println(a // r_1 
    + a); // r_2
} 

and thread 2 to execute:

while (true) {
  a = 0; // w_1
  a = 1; // w_2
}

The two reads r_i and two writes w_i of a are synchronization actions as a is volatile (JSR 17.4.2). They are external actions as variable a is used in several threads. These actions are contained in the set of all actions A. There exists a total order of all synchronization actions, the synchronization order which is consistent with program order for thread 1 and thread 2 (JSR 17.4.4). From the definition of the synchronizes-with partial order, there is no edge defined for this order in the above code. As a consequence, the happens-before order only reflects the intra-thread semantics of each thread (JSR 17.4.5).

With this, we define W as a write-seen function where W(r_i) = w_2 and a value-written function V(w_i) = w_2 (JLS 17.4.6). I took some freedom and eliminated w_1 as it makes this outline of a formal proof even simpler. The question is of this proposed execution E is well-formed (JLS 17.5.7). The proposed execution E obeys intra-thread semantics, is happens-before consistent, obeys the synchronized-with order and each read observes a consistent write. Checking the causality requirements is trivial (JSR 17.4.8). I do neither see why the rules for non-terminating executions would be relevant as the loop covers the entire discussed code (JLS 17.4.9) and we do not need to distinguish observable actions.

For all this, I cannot find any indication of why this optimization would be forbidden. Nevertheless, it is not applied for volatile reads by the HotSpot VM as one can observe using -XX:+PrintAssembly. I assume that the performance benefits are however minor and this pattern is not normally observed.

Remark: After watching the Java memory model pragmatics (multiple times), I am pretty sure, this reasoning is correct.


Short answer:

Yes, this optimization is allowed. Collapsing two sequential read operations produes the observable behavior of the sequence being atomic, but does not appear as a reordering of operations. Any sequence of actions performed on a single thread of execution can be executed as an atomic unit. In general, it is difficult to ensure a sequence of operations executes atomically, and it rarely results in a performance gain because most execution environments introduce overhead to execute items atomically.

In the example given by the original question, the sequence of operations in question is the following:

read(a)
read(a)

Performing these operations atomically guarantees that the value read on the first line is equal to the value read on the second line. Furthermore, it means the value read on the second line is the value contained in a at the time the first read was executed (and vice versa, because atomic both read operations occurred at the same time according to the observable execution state of the program). The optimization in question, which is reusing the value of the first read for the second read, is equivalent to the compiler and/or JIT executing the sequence atomically, and is thus valid.


Original longer answer:

The Java Memory Model describes operations using a happens-before partial ordering. In order to express the restriction that the first read r1 and second read r2 of a cannot be collapsed, you need to show that some operation is semantically required to appear between them.

The operations on the thread with r1 and r2 is the following:

--> r(a) --> r(a) --> add -->

To express the requirement that something (say y) lie between r1 and r2, you need to require that r1 happens-before y and y happens-before r2. As it happens, there is no rule where a read operation appears on the left side of a happens-before relationship. The closest you could get is saying y happens-before r2, but the partial order would allow y to also occur before r1, thus collapsing the read operations.

If no scenario exists which requires an operation to fall between r1 and r2, then you can declare that no operation ever appears between r1 and r2 and not violate the required semantics of the language. Using a single read operation would be equivalent to this claim.

Edit My answer is getting voted down, so I'm going to go into additional details.

Here are some related questions:

  • Is the Java compiler or JVM required to collapse these read operations?

    No. The expressions a and a used in the add expression are not constant expressions, so there is no requirement that they be collapsed.

  • Does the JVM collapse these read operations?

    To this, I'm not sure of the answer. By compiling a program and using javap -c, it's easy to see that the Java compiler does not collapse these read operations. Unfortunately it's not as easy to prove the JVM does not collapse the operations (or even tougher, the processor itself).

  • Should the JVM collapse these read operations?

    Probably not. Each optimization takes time to execute, so there is a balance between the time it takes to analyze the code and the benefit you expect to gain. Some optimizations, such as array bounds check elimination or checking for null references, have proven to have extensive benefits for real-world applications. The only case where this particular optimization has the possibility of improving performance is cases where two identical read operations appear sequentially.

    Furthermore, as shown by the response to this answer along with the other answers, this particular change would result in an unexpected behavior change for certain applications which users may not desire.

Edit 2: Regarding Rafael's description of a claim that two read operations that cannot be reordered. This statement is designed to highlight the fact that caching the read operation of a in the following sequence could produce an incorrect result:

a1 = read(a)
b1 = read(b)
a2 = read(a)
result = op(a1, b1, a2)

Suppose initially a and b have their default value 0. Then you execute just the first read(a).

Now suppose another thread executes the following sequence:

a = 1
b = 1

Finally, suppose the first thread executes the line read(b). If you were to cache the originally read value of a, you would end up with the following call:

op(0, 1, 0)

This is not correct. Since the updated value of a was stored before writing to b, there is no way to read the value b1 = 1 and then read the value a2 = 0. Without caching, the correct sequence of events leads to the following call.

op(0, 1, 1)

However, if you were to ask the question "Is there any way to allow the read of a to be cached?", the answer is yes. If you can execute all three read operations in the first thread sequence as an atomic unit, then caching the value is allowed. While synchronizing across multiple variables is difficult and rarely provides an opportunistic optimization advantage, it is certainly conceivable to encounter an exception. For example, suppose a and b are each 4 bytes, and they appear sequentially in memory with a aligned on an 8-byte boundary. A 64-bit process could implement the sequence read(a) read(b) as an atomic 64-bit load operation, which would allow the value of a to be cached (effectively treating all three read operations as an atomic operation instead of just the first two).