module_init() vs. core_initcall() vs. early_initcall()

It seems that no one has focused on how the linker script provides the function pointers used for initialization to the kernel code, so let's look at how elegantly the Linux kernel builds its linker script for initcalls.

The other answers already show how the C code creates and manages initcalls: how a function is registered as an initcall, which global variables give access to the registered functions, and which functions actually invoke them during initialization. I won't revisit those points.

Instead, the focus here is on how each element of the global array initcall_levels[] is defined, what it means, and what the memory it points to contains.

First, let's find where these variables are defined in the Linux kernel repository. In init/main.c, the elements of the initcall_levels array are not defined in the file itself; they are only declared extern, so they must come from somewhere else.

extern initcall_t __initcall_start[];
extern initcall_t __initcall0_start[];
extern initcall_t __initcall1_start[];
extern initcall_t __initcall2_start[];
extern initcall_t __initcall3_start[];
extern initcall_t __initcall4_start[];
extern initcall_t __initcall5_start[];
extern initcall_t __initcall6_start[];
extern initcall_t __initcall7_start[];
extern initcall_t __initcall_end[];

However, these symbols are not defined in any C source file of the Linux repository. So where do they come from? From the linker script!

Linux provides many helper macros that help programmers generate the architecture-specific linker script file. They are defined in linux/include/asm-generic/vmlinux.lds.h, which also provides the helpers for initcalls.

#ifdef CONFIG_HAVE_UNDERSCORE_SYMBOL_PREFIX
#define __VMLINUX_SYMBOL(x) _##x
#define __VMLINUX_SYMBOL_STR(x) "_" #x
#else
#define __VMLINUX_SYMBOL(x) x
#define __VMLINUX_SYMBOL_STR(x) #x
#endif

/* Indirect, so macros are expanded before pasting. */
#define VMLINUX_SYMBOL(x) __VMLINUX_SYMBOL(x)

#define INIT_CALLS_LEVEL(level)                     \
        VMLINUX_SYMBOL(__initcall##level##_start) = .;      \
        KEEP(*(.initcall##level##.init))            \
        KEEP(*(.initcall##level##s.init))           \

#define INIT_CALLS                          \
        VMLINUX_SYMBOL(__initcall_start) = .;           \
        KEEP(*(.initcallearly.init))                \
        INIT_CALLS_LEVEL(0)                 \
        INIT_CALLS_LEVEL(1)                 \
        INIT_CALLS_LEVEL(2)                 \
        INIT_CALLS_LEVEL(3)                 \
        INIT_CALLS_LEVEL(4)                 \
        INIT_CALLS_LEVEL(5)                 \
        INIT_CALLS_LEVEL(rootfs)                \
        INIT_CALLS_LEVEL(6)                 \
        INIT_CALLS_LEVEL(7)                 \
        VMLINUX_SYMBOL(__initcall_end) = .;

Several macros are defined for initcalls. The most important one is INIT_CALLS, which emits linker script statements that define symbols accessible from plain C code and lay out the initcall input sections.

In detail, each invocation of the INIT_CALLS_LEVEL(level) macro defines a new symbol named __initcall##level##_start (## is the C preprocessor's token-pasting operator). The symbol is generated by VMLINUX_SYMBOL(__initcall##level##_start) = .;, which assigns it the linker's current location. For example, INIT_CALLS_LEVEL(1) defines a linker script symbol named __initcall1_start.
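To see what the pasting produces, here is a small user-space sketch. START_SYMBOL, STR, and the helper functions are hypothetical names introduced only for illustration; the only thing taken from the kernel is the __initcall##level##_start pattern. The result is turned into a string so that nothing is actually declared under a reserved name.

```c
#include <assert.h>
#include <string.h>

/* Two-step stringification, so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical helper: same token pasting as in INIT_CALLS_LEVEL. */
#define START_SYMBOL(level) __initcall##level##_start

/* Return the pasted identifier as text. */
const char *level1_symbol(void)
{
    return STR(START_SYMBOL(1));
}

const char *rootfs_symbol(void)
{
    return STR(START_SYMBOL(rootfs));
}

/* Sanity check that both helpers paste and stringify as described. */
void check_symbol_names(void)
{
    assert(strcmp(level1_symbol(), "__initcall1_start") == 0);
    assert(strcmp(rootfs_symbol(), "__initcallrootfs_start") == 0);
}
```

The indirection through STR_ plays the same role as the "Indirect, so macros are expanded before pasting" comment next to VMLINUX_SYMBOL.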

As a result, the symbols __initcall0_start to __initcall7_start are defined in the linker script and can be referenced from C code by declaring them with the extern keyword.

The INIT_CALLS_LEVEL macro also collects the input sections named .initcallN.init (and their .initcallNs.init variants), where N is 0 to 7 or rootfs. Each of these sections contains pointers to all the functions registered with a macro such as __define_initcall, as specified by its section attribute.

#define __define_initcall(fn, id) \
    static initcall_t __initcall_##fn##id __used \
    __attribute__((__section__(".initcall" #id ".init"))) = fn

The linker script must place these symbols and sections together in a single output section, .init.data. The INIT_DATA_SECTION macro does this, and we can see that it invokes the INIT_CALLS macro discussed above.

#define INIT_DATA_SECTION(initsetup_align)              \
    .init.data : AT(ADDR(.init.data) - LOAD_OFFSET) {       \
        INIT_DATA                       \
        INIT_SETUP(initsetup_align)             \
        INIT_CALLS                      \
        CON_INITCALL                        \
        SECURITY_INITCALL                   \
        INIT_RAM_FS                     \
    }

Therefore, by invoking the INIT_CALLS macro, the linker places the __initcall0_start to __initcall7_start symbols and the .initcall0.init to .initcall7.init sections back to back inside the .init.data section. Note that the symbols themselves don't contain any data; they only mark where each section starts and ends.
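The same idea can be tried in user space. This is only a sketch, and it assumes GCC or Clang with GNU ld on an ELF target, where the linker automatically provides __start_<section> and __stop_<section> symbols for any section whose name is a valid C identifier (the kernel instead defines its markers explicitly in the linker script). The section name demo_initcalls and every demo_* identifier are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Hypothetical stand-in for the kernel's registration macros. */
#define DEMO_INITCALL(fn)                                   \
    static initcall_t demo_entry_##fn                       \
    __attribute__((used, section("demo_initcalls"))) = fn

static int demo_calls;

static int demo_init_a(void) { demo_calls++; return 0; }
static int demo_init_b(void) { demo_calls++; return 0; }

DEMO_INITCALL(demo_init_a);
DEMO_INITCALL(demo_init_b);

/* Marker symbols supplied by the linker: they occupy no storage themselves. */
extern initcall_t __start_demo_initcalls[];
extern initcall_t __stop_demo_initcalls[];

/* Number of pointers the linker placed between the two markers. */
size_t demo_initcall_count(void)
{
    assert(__stop_demo_initcalls >= __start_demo_initcalls);
    return (size_t)(__stop_demo_initcalls - __start_demo_initcalls);
}

/* Walk the section from start marker to stop marker; return how many ran. */
int demo_run_initcalls(void)
{
    initcall_t *fn;

    demo_calls = 0;
    for (fn = __start_demo_initcalls; fn < __stop_demo_initcalls; fn++)
        (*fn)();
    return demo_calls;
}
```

The two extern arrays are never defined in C, just like __initcall0_start and friends; the only thing the program ever uses is their addresses.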

Now let's check whether the compiled Linux kernel actually contains the generated symbols, sections, and function pointers. After building the kernel, the nm tool can list all the symbols defined in the resulting image, vmlinux.

# sort the nm output numerically by address
$ nm -n vmlinux > symbol
$ vi symbol


ffffffff828ab1c8 T __initcall0_start
ffffffff828ab1c8 t __initcall_ipc_ns_init0
ffffffff828ab1d0 t __initcall_init_mmap_min_addr0
ffffffff828ab1d8 t __initcall_evm_display_config0
ffffffff828ab1e0 t __initcall_init_cpufreq_transition_notifier_list0
ffffffff828ab1e8 t __initcall_jit_init0
ffffffff828ab1f0 t __initcall_net_ns_init0
ffffffff828ab1f8 T __initcall1_start
ffffffff828ab1f8 t __initcall_xen_pvh_gnttab_setup1
ffffffff828ab200 t __initcall_e820__register_nvs_regions1
ffffffff828ab208 t __initcall_cpufreq_register_tsc_scaling1
......
ffffffff828ab3a8 t __initcall___gnttab_init1s
ffffffff828ab3b0 T __initcall2_start
ffffffff828ab3b0 t __initcall_irq_sysfs_init2
ffffffff828ab3b8 t __initcall_audit_init2
ffffffff828ab3c0 t __initcall_bdi_class_init2

As shown above, all the entries registered with the pure_initcall macro sit between the __initcall0_start and __initcall1_start symbols. For example, let's look at the ipc_ns_init function defined in ipc/shm.c:

static int __init ipc_ns_init(void)
{
    const int err = shm_init_ns(&init_ipc_ns);
    WARN(err, "ipc: sysv shm_init_ns failed: %d\n", err);
    return err;
}

pure_initcall(ipc_ns_init); 

As shown above, the pure_initcall macro places a pointer to ipc_ns_init in the .initcall0.init section, whose start is marked by the __initcall0_start symbol. Therefore, as the code below shows, all the functions registered in the .initcallN.init sections are invoked one by one, in order.

for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
    do_one_initcall(*fn);

They determine the initialization order of built-in modules. Drivers will use device_initcall (or module_init; see below) most of the time. Early initialization (early_initcall) is normally used by architecture-specific code to initialize hardware subsystems (power management, DMAs, etc.) before any real driver gets initialized.

Technical stuff for understanding below

Look at init/main.c. After some architecture-specific initialization done by code in arch/<arch>/boot and arch/<arch>/kernel, the portable start_kernel function is called. Eventually, in the same file, do_basic_setup is called:

/*
 * Ok, the machine is now initialized. None of the devices
 * have been touched yet, but the CPU subsystem is up and
 * running, and memory and process management works.
 *
 * Now we can finally start doing some real work..
 */
static void __init do_basic_setup(void)
{
    cpuset_init_smp();
    usermodehelper_init();
    shmem_init();
    driver_init();
    init_irq_proc();
    do_ctors();
    usermodehelper_enable();
    do_initcalls();
}

which ends with a call to do_initcalls:

static initcall_t *initcall_levels[] __initdata = {
    __initcall0_start,
    __initcall1_start,
    __initcall2_start,
    __initcall3_start,
    __initcall4_start,
    __initcall5_start,
    __initcall6_start,
    __initcall7_start,
    __initcall_end,
};

/* Keep these in sync with initcalls in include/linux/init.h */
static char *initcall_level_names[] __initdata = {
    "early",
    "core",
    "postcore",
    "arch",
    "subsys",
    "fs",
    "device",
    "late",
};

static void __init do_initcall_level(int level)
{
    extern const struct kernel_param __start___param[], __stop___param[];
    initcall_t *fn;

    strcpy(static_command_line, saved_command_line);
    parse_args(initcall_level_names[level],
           static_command_line, __start___param,
           __stop___param - __start___param,
           level, level,
           &repair_env_string);

    for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
        do_one_initcall(*fn);
}

static void __init do_initcalls(void)
{
    int level;

    for (level = 0; level < ARRAY_SIZE(initcall_levels) - 1; level++)
        do_initcall_level(level);
}

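You can see the names above with their associated index: early is 0, core is 1, etc. (Despite that name, level 0 holds the functions registered with pure_initcall; functions registered with early_initcall go into .initcallearly.init, which the linker script places before __initcall0_start.) Each of those __initcall*_start entries points to an array of function pointers, which get called one after the other. Those function pointers are the actual module and built-in initialization functions, the ones you specify with module_init, early_initcall, etc.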
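To make the level boundaries concrete, here is a portable toy model of the two loops. It is a sketch, not kernel code: the flat table stands in for the back-to-back initcall sections, toy_levels stands in for initcall_levels, and all function names and IDs are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Hypothetical IDs recorded by the toy init functions below. */
enum { CORE_A = 10, DEVICE_A = 60, DEVICE_B = 61 };

#define TRACE_MAX 8
static int trace[TRACE_MAX];
static size_t trace_len;

static int record(int id)
{
    assert(trace_len < TRACE_MAX);
    trace[trace_len++] = id;
    return 0;
}

static int core_a(void)   { return record(CORE_A); }
static int device_a(void) { return record(DEVICE_A); }
static int device_b(void) { return record(DEVICE_B); }

/* One flat table standing in for the back-to-back initcall sections. */
static initcall_t table[] = { core_a, device_a, device_b };

/* Start of each toy level, plus an end marker (like __initcall_end). */
static initcall_t *toy_levels[] = {
    table,      /* level 0: core_a */
    table + 1,  /* level 1: empty, starts where level 2 starts */
    table + 1,  /* level 2: device_a, device_b */
    table + 3,  /* end marker */
};

/* Same loop shape as do_initcalls() and do_initcall_level(). */
size_t run_toy_initcalls(void)
{
    size_t level, nlevels = sizeof(toy_levels) / sizeof(toy_levels[0]) - 1;
    initcall_t *fn;

    trace_len = 0;
    for (level = 0; level < nlevels; level++)
        for (fn = toy_levels[level]; fn < toy_levels[level + 1]; fn++)
            (*fn)();
    return trace_len;
}

/* ID of the i-th function that ran, or -1 if fewer than i+1 ran. */
int toy_called_at(size_t i)
{
    return i < trace_len ? trace[i] : -1;
}
```

An empty level needs no special case: its start pointer equals the next level's start pointer, so the inner loop body never runs.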

What determines which function pointer gets into which __initcall*_start array? The linker does this, using hints from the module_init and *_initcall macros. Those macros, for built-in modules, assign the function pointers to a specific ELF section.

Example with module_init

Considering a built-in module (configured with y in .config), module_init simply expands like this (include/linux/init.h):

#define module_init(x)  __initcall(x);

and then we follow this:

#define __initcall(fn) device_initcall(fn)
#define device_initcall(fn)             __define_initcall(fn, 6)

So, now, module_init(my_func) means __define_initcall(my_func, 6). This is __define_initcall:

#define __define_initcall(fn, id) \
    static initcall_t __initcall_##fn##id __used \
    __attribute__((__section__(".initcall" #id ".init"))) = fn

which means, so far, we have:

static initcall_t __initcall_my_func6 __used
__attribute__((__section__(".initcall6.init"))) = my_func;

Wow, lots of GCC stuff, but it only means that a new symbol is created, __initcall_my_func6, that's put in the ELF section named .initcall6.init, and as you can see, points to the specified function (my_func). Adding all the functions to this section eventually creates the complete array of function pointers, all stored within the .initcall6.init ELF section.
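If you want to check the expansion yourself, a user-space copy of the pattern compiles fine. The sketch below assumes GCC or Clang on an ELF target; DEMO_DEFINE_INITCALL, the .demo_initcall6.init section name, and call_through_entry are hypothetical and only mirror the shape of the kernel macro (the __used annotation is spelled out as the used attribute).

```c
#include <assert.h>

typedef int (*initcall_t)(void);

/* Hypothetical user-space copy of the __define_initcall pattern. */
#define DEMO_DEFINE_INITCALL(fn, id)                                    \
    static initcall_t demo_initcall_##fn##id                            \
    __attribute__((used, section(".demo_initcall" #id ".init"))) = fn

static int my_func(void)
{
    return 42;
}

/* Expands to: static initcall_t demo_initcall_my_func6 ... = my_func; */
DEMO_DEFINE_INITCALL(my_func, 6);

/* The generated variable is an ordinary function pointer to my_func. */
int call_through_entry(void)
{
    assert(demo_initcall_my_func6 == my_func);
    return demo_initcall_my_func6();
}
```

Running objdump -h or readelf -S on the resulting object file should list a .demo_initcall6.init section holding that single pointer.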

Initialization example

Look again at this chunk:

for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
    do_one_initcall(*fn);

Let's take level 6, which represents all the built-in modules initialized with module_init. The loop starts from __initcall6_start, whose value is the address of the first function pointer registered in the .initcall6.init section, and ends at __initcall7_start (excluded), advancing each time by the size of *fn (an initcall_t, which is a pointer to a function returning int, so 32 or 64 bits wide depending on the architecture).
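The stride of fn++ is easy to measure. A minimal sketch, with noop and initcall_stride being names invented for the example and initcall_t declared the way the loop above uses it:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

static int noop(void)
{
    return 0;
}

/* Byte distance covered by a single fn++ step over an initcall_t array. */
size_t initcall_stride(void)
{
    static initcall_t entries[2] = { noop, noop };
    initcall_t *fn = entries;
    initcall_t *next = fn + 1;

    assert(next > fn);
    return (size_t)((char *)next - (char *)fn);
}
```

On an x86-64 build this comes out as 8, consistent with the 8-byte gaps between consecutive __initcall_* addresses in the nm listing earlier on this page.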

do_one_initcall will simply call the function pointed to by the current entry.

Within a given initialization level, what determines whether one initialization function is called before another is simply the order of the files in the Makefiles, since the linker concatenates the __initcall_* symbols one after the other in their respective ELF .init sections.

This fact is actually used in the kernel, e.g. with device drivers (drivers/Makefile):

# GPIO must come after pinctrl as gpios may need to mux pins etc
obj-y                           += pinctrl/
obj-y                           += gpio/

tl;dr: the Linux kernel initialization mechanism is really beautiful, albeit highly GCC-dependent.


module_init is used to mark a function to be used as the entry-point of a Linux device-driver.
It is called

  • during do_initcalls() (for a builtin driver)
    or
  • at module insertion time (for a *.ko module)

There can be ONLY 1 module_init() per driver module.


The *_initcall() macros are usually used to register the functions that initialise various subsystems.

do_initcalls() in the Linux kernel source code invokes the registered initcalls; during kernel boot-up the levels run in the following relative order:

  1. early_initcall()
  2. core_initcall()
  3. postcore_initcall()
  4. arch_initcall()
  5. subsys_initcall()
  6. fs_initcall()
  7. device_initcall()
  8. late_initcall()
    end of built-in modules
  9. modprobe or insmod of *.ko modules.
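For reference, each of these macros passes a level id to __define_initcall, and the id ends up in the section name (.initcall<id>.init). The table below is a sketch of that mapping as I remember it from include/linux/init.h in kernels of the same era as the snippets on this page (the _sync variants, which append an s to the id, are left out); check your own tree before relying on it. The struct and the two lookup functions are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct initcall_level {
    const char *macro;  /* registration macro name */
    const char *id;     /* id passed to __define_initcall */
};

/* Listed in boot order, non-_sync variants only. */
static const struct initcall_level levels[] = {
    { "early_initcall",    "early"  },
    { "pure_initcall",     "0"      },
    { "core_initcall",     "1"      },
    { "postcore_initcall", "2"      },
    { "arch_initcall",     "3"      },
    { "subsys_initcall",   "4"      },
    { "fs_initcall",       "5"      },
    { "rootfs_initcall",   "rootfs" },
    { "device_initcall",   "6"      },
    { "late_initcall",     "7"      },
};

/* Index of the macro in boot order, or -1 if it is not in the table. */
int boot_position(const char *macro)
{
    size_t i;

    assert(macro != NULL);
    for (i = 0; i < sizeof(levels) / sizeof(levels[0]); i++)
        if (strcmp(levels[i].macro, macro) == 0)
            return (int)i;
    return -1;
}

/* Level id used in the section name, or NULL for an unknown macro. */
const char *initcall_id(const char *macro)
{
    int pos = boot_position(macro);

    return pos < 0 ? NULL : levels[pos].id;
}
```

For instance, initcall_id("device_initcall") gives "6", which is why built-in module_init functions land in .initcall6.init.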

For a built-in driver, using module_init() is equivalent to registering a device_initcall().

Keep in mind that the order in which the various driver object files (*.o) are linked into the Linux kernel is significant; it determines the order in which their init functions are called at runtime.

*_initcall functions of the same level will be called during boot in the order they are linked.

For example, changing the link order of SCSI drivers in drivers/scsi/Makefile will change the order in which the SCSI controllers are detected, and thus the numbering of the disks.