Since some confusion was expressed to me about the recent lib/csu/*/crt0.c changes, I figured I should explain more broadly what that code does (or is supposed to do) and why.
It's a bit obscure, but we can use more people understanding this stuff. For example, it would be nice if someone looked at teaching crt0 to relocate static PIE executables.

Anyway...

When we execve() an executable, the kernel has to do a bunch of things. Many of them deal with purely kernel data structures and can be described in nice machine-independent (MI) ways, such as the closing of FD_CLOEXEC fds in the process. Even the setup of the process's new address space is mostly MI, with the layout of the stack being the one big exception. For most archs, the stack grows down and we lay it out like this:

	struct ps_strings
	stack gap (random size + alignment)
	string of environment assignment n
	...
	string of environment assignment 1
	string of environment assignment 0
	string of argument n
	...
	string of argument 1
	string of argument 0
	auxv[n] (AUX_null)
	...
	auxv[1] (AUX_phent)
	auxv[0] (AUX_phdr)
	NULL
	environ[n]
	...
	environ[1]
	environ[0]
	NULL
	argv[n]
	...
	argv[1]
	argv[0]
	word holding argc

For hppa, where the stack grows upwards, ps_strings and the gap are at the bottom, but the rest is mostly unchanged. The kernel (kern/kern_exec.c and kern/exec_elf.c) handles the writing out of all that information.

Okay, we've got the memory image; what should the registers be? How do we set the registers so that, from the C perspective, we end up with a call to main(argc, argv, env)? Note that the location of main() varies from program to program and from invocation to invocation. And how do we get that to work in both the statically and dynamically linked cases?

The good news is that the ELF people have specs for this, but it's still a bunch of work. For each arch, the ELF "processor-specific ABI" specifies some of the initial register values for the process, such as the floating point state and the stack pointer. But what's the initial program counter / instruction pointer?

For static executables, that comes from the ELF header for the process: the e_entry member is the address of the entry point of the executable.
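As a rough illustration of where that value lives, here is a minimal sketch that pulls e_entry out of an in-memory copy of an ELF64 file header. The struct demo_elf64_ehdr and the function demo_elf64_entry() are made-up names for this example, not anything from the kernel or crt0. Real code would use the system's own Elf64_Ehdr definition, and the sketch assumes the file's byte order matches the host's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the ELF64 file header.  The field order and
 * sizes follow the ELF64 format; the struct name is invented for this demo.
 */
struct demo_elf64_ehdr {
	unsigned char	e_ident[16];	/* magic, class, data encoding, ... */
	uint16_t	e_type;
	uint16_t	e_machine;
	uint32_t	e_version;
	uint64_t	e_entry;	/* address of the entry point */
	uint64_t	e_phoff;
	uint64_t	e_shoff;
	uint32_t	e_flags;
	uint16_t	e_ehsize;
	uint16_t	e_phentsize;
	uint16_t	e_phnum;
	uint16_t	e_shentsize;
	uint16_t	e_shnum;
	uint16_t	e_shstrndx;
};

/*
 * Copy the header out of the image and report its e_entry.
 * Returns 0 on success, or -1 if the buffer is too short, lacks the
 * ELF magic, or is not a 64-bit ELF file.
 * Assumption: the image uses the host's byte order.
 */
int
demo_elf64_entry(const void *image, size_t len, uint64_t *entry)
{
	struct demo_elf64_ehdr eh;

	assert(image != NULL && entry != NULL);

	if (len < sizeof(eh))
		return -1;
	memcpy(&eh, image, sizeof(eh));

	if (memcmp(eh.e_ident, "\177ELF", 4) != 0)
		return -1;
	if (eh.e_ident[4] != 2)		/* EI_CLASS: 2 means ELFCLASS64 */
		return -1;

	*entry = eh.e_entry;
	return 0;
}
```

Called on the start of a 64-bit executable image, it stores the file's e_entry value in *entry.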
The kernel's MD setregs() routine initializes the process's initial thread to start with that e_entry value as its program counter. The e_entry value is set by the linker to the value of the symbol passed to it via the -e option, or via the ENTRY() declaration in the linker script. For almost all our archs that's "__start", with vax and *ppc using "_start" instead. (Yes, we would like to use "__start" everywhere.)

The __start (or _start) routine is defined by /usr/lib/crt0.o, which is automatically included in the link by gcc. That's currently compiled from /usr/src/lib/csu/${ARCH}/crt0.c and generally consists of three chunks:

 1) on most platforms, an ASM stub for __start that maps the registers specified by the ELF processor-specific ABI doc into arguments that can be handled by the C routine in (2)

 2) a C routine that
	- sets the environ and __progname globals from the environment and arguments on the stack
	- optionally registers a cleanup function with atexit
	- in profiling builds:
		- registers a call to _mcleanup with atexit to finish profiling
		- enables profiling
	- calls the executable's own initializer (constructor) functions
	- invokes main() and passes the return value to exit()

 3) the environ and __progname globals, and storage for __progname to point to

For dynamic executables, it's a bit more complex: the executable specifies an "interpreter" by including a PT_INTERP segment. The kernel sees that and *also* loads the interpreter into memory, and then instead of starting the process at the e_entry of the executable, it starts it at the e_entry *of the interpreter*. The interpreter then looks at the auxinfo entries on the stack to find the real program and, after doing whatever setup was requested, jumps to the e_entry of the executable.

Now comes the tricky part: what if the interpreter wants to do something on process *exit*? For example, it may need to call shared library destructors. How can it get back control then?
The good answer is that it should pass the e_entry routine of the executable the function pointer of a callback to be invoked on process exit. And indeed, that's what the ELF processor ABIs specify. For example, the amd64 ABI says that on process startup, register %rdx contains either zero or the address of a function to be invoked at process exit. It's the responsibility of the code in crt0.c to pass that pointer to atexit() when it's non-zero, hence the line item above saying that the code in crt0.c should "optionally register a cleanup function with atexit".

Unfortunately, we don't actually do all that right now. Our crt0s didn't follow the ABI by registering the cleanup pointer passed to them, and so our ld.so didn't rely on passing one. Instead, our ld.so directly peeks into the process's link map and looks for an atexit() function, and then itself calls that function with its callback for invoking the shared library destructors. That has three downsides:

 1) it's not ABI compliant
 2) it fails if you built a dynamic executable that linked libc statically and the atexit() function wasn't pulled in during the link
 3) it's fucking gross

So, last year, kettenis fixed ld.so on each arch to set up the registers when calling the executable's e_entry to be as specified by the ELF ABI docs, but passing NULL for the cleanup pointer. (alpha got missed, but that's been fixed.)

So now that ld.so and the kernel are both passing NULL for the cleanup pointer, we've been fixing crt0 to follow the ABI and pass that pointer to atexit() when it's non-NULL (i.e., never). Once all the supported binaries are built with a crt0 supporting that (remember that crt0.o is statically linked into every executable), we can change ld.so to stop calling atexit() itself and instead pass its cleanup function to the executable's e_entry routine as specified by the ABI.

Philip Guenther