Is splitting assignment into two lines still just as efficient?

Here's a comparison:

First case:

%%timeit
def foo():
    return "foo"

def bar(text):
    return text + "bar"

def test():
    x = foo()
    y = bar(x)
    return y

test()
#Output:
'foobar'
529 ns ± 114 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Second case:

%%timeit

def foo():
    return "foo"

def bar(text):
    return text + "bar"

def test():   
    x = bar(foo())
    return x

test()
#Output:
'foobar'
447 ns ± 34.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

But that is just one %%timeit run for each case. The following are the times (in ns) from 20 such runs for each case:

import pandas as pd

df = pd.DataFrame({'First Case(time in ns)': [623,828,634,668,715,659,703,687,614,623,697,634,686,822,671,894,752,742,721,742],
                   'Second Case(time in ns)': [901,786,686,670,677,683,685,638,628,670,695,657,698,707,726,796,868,703,609,852]})

df.plot(kind='density', figsize=(8,8))

[Density plot of the 20 timing samples for each case]

With more runs, the differences diminished. The plot shows that the performance difference isn't significant, and from a readability perspective the second case looks better.

In the first case, two statements are executed: the first assigns the return value of foo() to x, and the second calls bar() on that value; the extra store and load of x add a little overhead. In the second case, a single statement calls both functions and returns the result.
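If you want to collect per-run samples like the 20 above programmatically rather than rerunning %%timeit by hand, here's a minimal sketch using timeit.repeat (the setup string, variable names, and the number/repeat values are illustrative assumptions, not how the numbers above were actually gathered):

import timeit

# Illustrative setup; assumes the same foo/bar as above.
setup = '''
def foo():
    return "foo"

def bar(text):
    return text + "bar"
'''

number, repeat = 1_000_000, 20

# timeit.repeat returns the total time (in seconds) for `number` executions,
# once per run; divide by `number` and scale to get ns per call.
first_case = [t / number * 1e9 for t in
              timeit.repeat("x = foo()\ny = bar(x)", setup=setup,
                            repeat=repeat, number=number)]
second_case = [t / number * 1e9 for t in
               timeit.repeat("x = bar(foo())", setup=setup,
                             repeat=repeat, number=number)]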


It matters a tiny bit, but not meaningfully. amanb's test timed the definition of the functions in only one of the tests, and so had to do more work in the first test, skewing the results. Tested properly, the results differ only by the slimmest of margins. Using the same IPython %%timeit magic (IPython 7.3.0, CPython 3.7.2 on Linux x86-64), but removing the definition of the functions from the per-loop tests:

>>> def foo():
...     return "foo"
... def bar(text):
...     return text + "bar"
... def inline():
...     x = bar(foo())
...     return x
... def outofline():
...     x = foo()
...     x = bar(x)
...     return x
...

>>> %%timeit -r5 test = inline
... test()
...
...
332 ns ± 1.01 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)


>>> %%timeit -r5 test = outofline
... test()
...
...
341 ns ± 5.62 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

The inline code was faster, but the difference was under 10 ns (about 3%). Inlining further (making the body just return bar(foo())) saves a tiny bit more, but again, it's pretty meaningless.
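For reference, that fully inlined variant (the function name is hypothetical) is just:

def fullyinline():
    # No local name at all; foo()'s result feeds straight into bar().
    return bar(foo())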

This is what you'd expect, too: storing and loading function-local names is about the cheapest thing the CPython interpreter can do. The only difference between the functions is that outofline requires an extra STORE_FAST and LOAD_FAST (one immediately following the other). Those instructions are implemented internally as nothing more than an assignment to, and a read from, a compile-time-determined slot in a C array, plus a single integer increment to adjust reference counts. You pay the CPython interpreter overhead required by each bytecode, but the cost of the actual work is trivial.
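You can see the extra pair of instructions directly with the dis module; a minimal sketch (the exact opcode names and offsets vary by CPython version):

import dis

def foo():
    return "foo"

def bar(text):
    return text + "bar"

def inline():
    x = bar(foo())
    return x

def outofline():
    x = foo()
    x = bar(x)
    return x

dis.dis(inline)      # foo()'s result stays on the stack and feeds the call to bar()
dis.dis(outofline)   # same calls, plus an extra STORE_FAST/LOAD_FAST pair for x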

Point is: don't worry about the speed; write whichever version of the code is more readable/maintainable. In this case, all the names are garbage, but if the output from foo can be given a useful name, then passed to bar whose output is given a different useful name, and without those names the relationship between foo and bar is non-obvious, don't inline. If the relationship is obvious and foo's output doesn't benefit from being named, inline it. Avoiding stores and loads of local variables is the most micro of micro-optimizations; it won't be the cause of meaningful performance loss in almost any scenario, so don't base code design decisions on it.
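A small made-up illustration of that trade-off (every name here is hypothetical, standing in for foo and bar):

# Hypothetical helpers standing in for foo/bar.
def fetch_profile(user_id):
    return {"id": user_id, "name": "Alice"}

def render_summary(profile):
    return "{name} (#{id})".format(**profile)

# The named intermediate documents what flows from one call to the next:
def describe_user(user_id):
    profile = fetch_profile(user_id)
    return render_summary(profile)

# When the intermediate value doesn't benefit from a name, inlining reads fine:
def describe_user_inline(user_id):
    return render_summary(fetch_profile(user_id))

print(describe_user(1))         # Alice (#1)
print(describe_user_inline(1))  # Alice (#1)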