How does the ELF loader determine the initial stack size?

I don't believe this question really has to do with ELF. As far as I know, ELF defines a way to "flat pack" a program image into files and then re-assemble it ready for first execution. The definition of what the stack is and how it's implemented sits somewhere between CPU-specific and OS-specific, if the OS behaviour hasn't been elevated to POSIX. That said, the ELF specification no doubt makes some demands about what it needs on the stack.

Minimum stack allocation

From your question:

I am aware that the page just below the stack is a "guard page" that automatically becomes writable and "grows down the stack" if I write to it (presumably so that naive stack handling "just works"), but if I allocate a huge stack frame then I could overshoot the guard page and segfault, so I want to determine how much space is already properly allocated to me right at process start.

I'm struggling to find an authoritative reference for this, but I have found enough non-authoritative references to suggest it is incorrect.

From what I've read, the guard page is used to catch accesses outside the maximum stack allocation, not for "normal" stack growth. The actual memory allocation (mapping pages to memory addresses) is done on demand. That is, when unmapped addresses between stack-base and stack-base - max-stack-size + 1 are accessed, the CPU may raise an exception, but the kernel handles it by mapping a page of memory rather than delivering a segmentation fault.

So accessing the stack inside the maximum allocation shouldn't cause a segmentation fault, as you've discovered.

Maximum stack allocation

The place to look is the Linux documentation on thread creation and image loading (fork(2), clone(2), execve(2)). The documentation of execve mentions something interesting:

Limits on size of arguments and environment

...snip...

On kernel 2.6.23 and later, most architectures support a size limit derived from the soft RLIMIT_STACK resource limit (see getrlimit(2))

...snip...

This confirms that the limit depends on architecture support, and it also points to where the limit is defined (getrlimit(2)).

RLIMIT_STACK

This is the maximum size of the process stack, in bytes. Upon reaching this limit, a SIGSEGV signal is generated. To handle this signal, a process must employ an alternate signal stack (sigaltstack(2)).

Since Linux 2.6.23, this limit also determines the amount of space used for the process's command-line arguments and environment variables; for details, see execve(2).
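To see what limit actually applies to a given process, the soft and hard values can be read back at run time. The following is only a minimal sketch using Python's standard `resource` module, which wraps getrlimit(2); the function names `stack_limits` and `describe` are my own and do not come from any of the man pages quoted above.

```python
import resource


def stack_limits():
    """Return (soft, hard) RLIMIT_STACK in bytes; None stands for 'unlimited'."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def to_bytes(value):
        # RLIM_INFINITY is the sentinel getrlimit uses for "no limit".
        return None if value == resource.RLIM_INFINITY else value

    return to_bytes(soft), to_bytes(hard)


def describe(value):
    """Format a limit for printing (hypothetical helper, for illustration only)."""
    return "unlimited" if value is None else f"{value} bytes ({value // 1024} KiB)"


if __name__ == "__main__":
    soft, hard = stack_limits()
    print("soft RLIMIT_STACK:", describe(soft))
    print("hard RLIMIT_STACK:", describe(hard))
```

The soft value is the one that matters for stack growth; it is inherited from the parent process (for example, from the shell's `ulimit -s` setting), so the numbers printed will vary between systems and sessions.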

Growing the stack by changing the RSP register

I don't know x86 assembler, but I'll draw your attention to the "Stack Fault Exception", which x86 CPUs can raise when the SS register is changed. Please do correct me if I'm wrong, but I believe on x86-64 SS:SP has just become "RSP". So, if I understand correctly, a Stack Fault Exception can be triggered by decrementing RSP (subq $0x7fe000,%rsp).

See page 222 here: https://xem.github.io/minix86/manual/intel-x86-and-64-manual-vol3/o_fe12b1e2a880e0ce.html


Every process memory region (e.g. code, static data, heap, stack, etc.) has boundaries, and a memory access outside of any region, or a write access to a read-only region, generates a CPU exception. The kernel maintains these memory regions. An access outside of a region propagates up to user space in the form of a segmentation fault signal.

Not all exceptions are generated by accessing memory outside the regions. An in-region access can also generate an exception. For example, if the page is not mapped to physical memory, the page fault handler handles this transparently to the running process.

The process main stack region initially has only a small number of page frames mapped to it, but grows automatically when more data is pushed to it via the stack pointer. The exception handler checks that the access is still within the region reserved for the stack, and allocates a new page frame if it is. This happens automatically from the point of view of the user level code.
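The difference between what is currently mapped and what the stack may grow to can be observed from user space on Linux. The sketch below assumes `/proc/self/maps` is available and that the main stack appears there with the `[stack]` label; the function name `main_stack_mapping` is hypothetical. It reports the current extent of the stack mapping next to the soft RLIMIT_STACK, and the former is usually expected to be much smaller than the latter.

```python
import resource


def main_stack_mapping(maps_path="/proc/self/maps"):
    """Return (low, high) addresses of the [stack] mapping, or None if not found."""
    try:
        with open(maps_path) as maps:
            for line in maps:
                fields = line.split()
                # The pathname column, when present, is the last field.
                if fields and fields[-1] == "[stack]":
                    low, high = (int(addr, 16) for addr in fields[0].split("-"))
                    return low, high
    except OSError:
        # No procfs (non-Linux system or restricted environment).
        return None
    return None


if __name__ == "__main__":
    mapping = main_stack_mapping()
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if mapping is None:
        print("no [stack] mapping found (not Linux, or /proc is unavailable)")
    else:
        low, high = mapping
        print(f"[stack] currently spans {(high - low) // 1024} KiB")
        if soft == resource.RLIM_INFINITY:
            print("soft RLIMIT_STACK: unlimited")
        else:
            print(f"soft RLIMIT_STACK: {soft // 1024} KiB")
```

Running this before and after a deep recursion should show the `[stack]` mapping getting larger, which is the automatic growth described above.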

A guard page is placed right after the end of the stack region, to detect an overrun of the stack region. Recently (in 2017) some people realized that a single guard page is not sufficient, because a program can potentially be tricked to decrement the stack pointer by a large amount, which may make the stack pointer point to some other region that permits writes. The "solution" to this problem was to replace the 4 kB guard page with a 1 MB guard region. See this LWN article.

It should be noted that this vulnerability is not entirely trivial to exploit. It requires, for example, that the user can control the amount of memory a program allocates via a call to alloca. Robust programs should check the parameter passed to alloca, especially if it is derived from user input.