avr-gcc: (seemingly) unneeded prologue/epilogue in simple function

I am not sure this is a good answer, but it is the best I can give. The assembly for the f_u64() function reserves 72 bytes on the stack and then releases them again. Because this is done through registers r28 and r29 (the Y pointer, which is call-saved), those two registers are saved at the beginning and restored at the end.

If you compile without optimization (I also dropped the C++11 flag; I do not think it makes any difference), you will see that f_u64() starts by allocating 80 bytes on the stack. The sequence is the same as the opening statements in the optimized code, just with 80 bytes instead of 72:

    in r28,__SP_L__
    in r29,__SP_H__
    subi r28,80
    sbc r29,__zero_reg__
    in __tmp_reg__,__SREG__
    cli
    out __SP_H__,r29
    out __SREG__,__tmp_reg__
    out __SP_L__,r28

These 80 bytes are all actually used. First the value of the argument x is stored (8 bytes), and then a lot of data is moved around within the remaining 72 bytes.

After that, the 80 bytes are released again, much like the closing statements in the optimized code:

    subi r28,-80
    sbci r29,-1
    in __tmp_reg__,__SREG__
    cli
    out __SP_H__,r29
    out __SREG__,__tmp_reg__
    out __SP_L__,r28
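
As a side note on reading these two sequences: AVR has no add-immediate instruction, so adding 80 to the r29:r28 pair is written as subtracting -80 from the low byte and then subtracting -1 with carry from the high byte. The sketch below emulates that arithmetic in plain C to show that the prologue and epilogue cancel out. The function names (`subi_sbci`, `reserve_frame`, `release_frame`, `check_round_trip`) are made up for illustration and are not part of avr-gcc or avr-libc; only the carry behaviour relevant to these two instructions is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates `subi lo,k_lo` followed by `sbci hi,k_hi` (or `sbc hi,reg`)
 * on a 16-bit register pair such as r29:r28. Hypothetical helper. */
uint16_t subi_sbci(uint16_t pair, uint8_t k_lo, uint8_t k_hi)
{
    uint8_t lo = (uint8_t)(pair & 0xFF);
    uint8_t hi = (uint8_t)(pair >> 8);

    /* subi: the carry flag is set when the subtraction borrows */
    uint8_t carry = lo < k_lo;
    lo = (uint8_t)(lo - k_lo);

    /* sbci / sbc: subtract the constant and the borrow from the high byte */
    hi = (uint8_t)(hi - k_hi - carry);

    return (uint16_t)((hi << 8) | lo);
}

/* Prologue: subi r28,80 ; sbc r29,__zero_reg__   =>  Y - 80 */
uint16_t reserve_frame(uint16_t y)
{
    return subi_sbci(y, 80, 0);
}

/* Epilogue: subi r28,-80 ; sbci r29,-1           =>  Y + 80 */
uint16_t release_frame(uint16_t y)
{
    return subi_sbci(y, (uint8_t)-80, (uint8_t)-1);
}

/* Sanity check: reserving and then releasing the frame restores Y. */
void check_round_trip(uint16_t y)
{
    assert(release_frame(reserve_frame(y)) == y);
}
```

Calling `check_round_trip` with any 16-bit stack address should pass, which is just another way of saying that the epilogue undoes exactly what the prologue did.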

My guess is that the optimizer first concludes that the 8 bytes for storing the argument are not needed, which leaves 72 bytes. It then concludes that all the moving around of data is not needed either. However, it fails to figure out that this makes the 72-byte stack frame itself unnecessary.

Hence my best bet is that this is a limitation or a bug in the optimizer (whichever you prefer to call it). In that case the only "solution" is to shuffle the real code around until you find a workaround, or to report it as a bug against the compiler.