Python string concatenation internal details

Having another name pointing to the same object kills the optimisation. The optimisation basically works by resizing the string object and appending in place. If you have more than one reference to that object, you can't resize it without affecting the other reference. Since strings are immutable, allowing this would be a serious flaw in the implementation.

temp = result

increased the reference count for the string object named by result thereby prohibiting the optimisation.
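For readers who don't have the question in front of them, the two loops being compared presumably look something like the sketch below. The names foo and foo2 are the ones used in this thread; the function bodies are my reconstruction and an assumption, not the asker's exact code.

```python
def foo(n):
    """Append in a loop; `result` is the only reference to the string."""
    result = ""
    for _ in range(n):
        result += "x"  # CPython may resize the string in place here
    return result


def foo2(n):
    """Same loop, but `temp` keeps a second reference alive."""
    result = ""
    for _ in range(n):
        temp = result  # extra reference to the current string
        result += "x"  # a new string has to be built instead
    return result
```

Both functions return the same string; only the way the intermediate strings are built differs.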

The full list of checks performed in the case of += (which eventually translates to PyUnicode_Append) can be seen in the unicode_modifiable function. Among other things, it checks that the reference count of the object is equal to one, that it isn't interned and that it isn't a string subclass.
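Two of these conditions can be observed from Python itself. The snippet below is only an illustration, assuming CPython: sys.getrefcount reports one more than you might expect because its own argument counts as a reference, and the helper name build is made up for this example.

```python
import sys


def build():
    # Built at runtime, so the result is an ordinary, non-interned str
    return "".join(["ab", "cd"])


s = build()
before = sys.getrefcount(s)  # references to the object, plus the call argument
alias = s                    # a second name for the same object
after = sys.getrefcount(s)   # one higher than before


class MyStr(str):
    pass


# The in-place path is only considered for exact str objects
is_exact_str = type(s) is str
is_exact_subclass = type(MyStr("abcd")) is str
```

Here after is before + 1, is_exact_str is True and is_exact_subclass is False.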

There's a couple more checks in the if statement guarding this optimisation, if you want a more thorough list.

Though it isn't the main issue of your question, future readers might be curious about how to perform string concatenation efficiently. Besides similar questions on SO, the Python FAQ also has an entry on this.
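As a quick sketch of what that usually boils down to: collect the pieces and join them once, or write them into an io.StringIO. The function names concat_join and concat_stringio are just illustrative.

```python
import io


def concat_join(parts):
    """Join all pieces in one pass; the result is allocated once."""
    return "".join(parts)


def concat_stringio(parts):
    """Accumulate pieces in an in-memory text buffer."""
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()
```

Neither variant depends on the reference count of an intermediate string, so both stay linear regardless of how many names refer to the pieces.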

Actually, the behavior you are observing is determined by the behavior of the memory-allocator of the C-runtime on your OS.

CPython has an optimization: if a unicode object has only one reference, it can be changed in place, because nobody would notice that the unicode object lost its immutability for a moment. See my answer to this SO-question for more details.

In foo2, there is another reference to the unicode object (temp), which prevents the in-place-optimization: Changing it in-place would break the immutability, because it could be observed through temp.

However, even with the in-place optimization, it is not obvious why O(n^2) behavior can be avoided: a unicode object doesn't overallocate and thus has to extend the underlying buffer at every addition, which naively would mean copying the whole content (i.e. O(n)) in every step.

However, most of the time realloc (unlike malloc+copy) can be done in O(1), because if the memory directly behind the allocated buffer is free, it can be used to extend the original without copying.

An interesting detail is that there is no guarantee that foo will run in O(n): if the memory is fragmented (e.g. in a long-running process), realloc won't be able to extend the buffer without copying the data, and thus the running time will become O(n^2).
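One rough way to see this on your own machine is to count how often the string ends up at a new address while appending. This is only a sketch: it relies on id() being the object's address, which is a CPython implementation detail, and the helper name count_moves is made up for this example. The number you get depends on the allocator, the OS and the state of the heap, so no particular value should be expected.

```python
def count_moves(n):
    """Count how often the string's address changes while appending n characters."""
    s = ""
    moves = 0
    last = id(s)
    for _ in range(n):
        s += "x"
        cur = id(s)
        if cur != last:
            # Either a new string object was created,
            # or realloc had to move the buffer
            moves += 1
            last = cur
    return moves


moves = count_moves(100_000)
```

If moves is much smaller than the number of appends, most of the resizes were done without relocating the data.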

Thus one should not rely on this optimization to avoid quadratic running time.