Get names and addresses of exported functions in linux

I do get quite annoyed when I see questions asking how to do something in operating system X that you do in Y.

In most cases, it is not an useful approach, because each operating system (family) tends to have their own approach to issues, so trying to apply something that works in X in Y is like stuffing a cube into a round hole.

Please note: the text here is intended as harsh, not condesceding; my command of the English language is not as good as I'd like. Harshness combined with actual help and pointers to known working solutions seems to work best in overcoming nontechnical limitations, in my experience.

In Linux, a test environment should use something like

LC_ALL=C LANG=C readelf -s FILE

to list all the symbols in FILE. readelf is part of the binutils package, and is installed if you intend to build new binaries on the system. This leads to portable, robust code. Do not forget that Linux encompasses multiple hardware architectures that do have real differences.

To build binaries in Linux, you normally use some of the tools provided in binutils. If binutils provided a library, or there was an ELF library based on the code used in binutils, it would be much better to use that, rather than parse the output of the human utilities. However, there is no such library (the libbfd library binutils uses internally is not ELF-specific). The [URL=http://www.mr511.de/software/english.html]libelf[/URL] library is good, but it is completely separate work by chiefly a single author. Bugs in it have been reported to binutils, which is unproductive, as the two are not related. Simply put, there are no guarantees that it handles the ELF files on a given architecture the same way binutils does. Therefore, for robustness and reliability, you'll definitely want to use binutils.

If you have a test application, it should use a script, say /usr/lib/yourapp/list-test-functions, to list the test-related functions:

#!/bin/bash
export LC_ALL=C LANG=C
for file in "$@" ; do
    readelf -s "$file" | while read num value size type bind vix index name dummy ; do
        [ "$type" = "FUNC" ] || continue
        [ "$bind" = "GLOBAL" ] || continue
        [ "$num" = "$[$num]" ] || continue
        [ "$index" = "$[$index]" ] || continue
        case "$name" in
            test_*) printf '%s\n' "$name"
                    ;;
        esac
    done
done

This way, if there is an architecture that has quirks (in the binutils' readelf output format in particular), you only need to modify the script. Modifying such a simple script is not difficult, and it is easy to verify the script works correctly -- just compare the raw readelf output to the script output; anybody can do that.

A subroutine that constructs a pipe, fork()s a child process, executes the script in the child process, and uses e.g. getline() in the parent process to read the list of names, is quite simple and extremely robust. Since this is also the one fragile spot, we've made it very easy to fix any quirks or problems here by using that external script (that is customizable/extensible to cover those quirks, and easy to debug). Remember, if binutils itself has bugs (other than output formatting bugs), any binaries built will almost certainly exhibit those same bugs also.

Being a Microsoft-oriented person, you probably will have trouble grasping the benefits of such a modular approach. (It is not specific to Microsoft, but specific to a single-vendor controlled ecosystem where the vendor-pushed approach is via overarching frameworks, and black boxes with clean but very limited interfaces. I think it as the framework limitation, or vendor-enforced walled garden, or prison garden. Looks good, but getting out is difficult. For description and history on the modular approach I'm trying to describe, see for example the Unix philosophy article at Wikipedia.)

The following shows that your approach is indeed possible in Linux, too -- although clunky and fragile; this stuff is intended to be done using the standard tools instead. It's just not the right approach in general.

The interface, symbols.h, is easiest to implement using a callback function that gets called for each symbol found:

#ifndef  SYMBOLS_H
#ifndef _GNU_SOURCE
#error You must define _GNU_SOURCE!
#endif
#define  SYMBOLS_H
#include <stdlib.h>

typedef enum {
    LOCAL_SYMBOL  = 1,
    GLOBAL_SYMBOL = 2,
    WEAK_SYMBOL   = 3,
} symbol_bind;

typedef enum {
    FUNC_SYMBOL   = 4,
    OBJECT_SYMBOL = 5,
    COMMON_SYMBOL = 6,
    THREAD_SYMBOL = 7,
} symbol_type;

int symbols(int (*callback)(const char *libpath, const char *libname, const char *objname,
                            const void *addr, const size_t size,
                            const symbol_bind binding, const symbol_type type,
                            void *custom),
            void *custom);

#endif /* SYMBOLS_H */

The ELF symbol binding and type macros are word-size specific, so to avoid the hassle, I declared the enum types above. I omitted some uninteresting types (STT_NOTYPE, STT_SECTION, STT_FILE), however.

The implementation, symbols.c:

#define _GNU_SOURCE
#include <stdlib.h>
#include <limits.h>
#include <string.h>
#include <stdio.h>
#include <fnmatch.h>
#include <dlfcn.h>
#include <link.h>
#include <errno.h>
#include "symbols.h"

#define UINTS_PER_WORD (__WORDSIZE / (CHAR_BIT * sizeof (unsigned int)))

static ElfW(Word) gnu_hashtab_symbol_count(const unsigned int *const table)
{
    const unsigned int *const bucket = table + 4 + table[2] * (unsigned int)(UINTS_PER_WORD);
    unsigned int              b = table[0];
    unsigned int              max = 0U;

    while (b-->0U)
        if (bucket[b] > max)
            max = bucket[b];

    return (ElfW(Word))max;
}

static symbol_bind elf_symbol_binding(const unsigned char st_info)
{
#if __WORDSIZE == 32
    switch (ELF32_ST_BIND(st_info)) {
#elif __WORDSIZE == 64
    switch (ELF64_ST_BIND(st_info)) {
#else
    switch (ELF_ST_BIND(st_info)) {
#endif
    case STB_LOCAL:  return LOCAL_SYMBOL;
    case STB_GLOBAL: return GLOBAL_SYMBOL;
    case STB_WEAK:   return WEAK_SYMBOL;
    default:         return 0;
    }
}

static symbol_type elf_symbol_type(const unsigned char st_info)
{
#if __WORDSIZE == 32
    switch (ELF32_ST_TYPE(st_info)) {
#elif __WORDSIZE == 64
    switch (ELF64_ST_TYPE(st_info)) {
#else
    switch (ELF_ST_TYPE(st_info)) {
#endif
    case STT_OBJECT: return OBJECT_SYMBOL;
    case STT_FUNC:   return FUNC_SYMBOL;
    case STT_COMMON: return COMMON_SYMBOL;
    case STT_TLS:    return THREAD_SYMBOL;
    default:         return 0;
    }
}

static void *dynamic_pointer(const ElfW(Addr) addr,
                             const ElfW(Addr) base, const ElfW(Phdr) *const header, const ElfW(Half) headers)
{
    if (addr) {
        ElfW(Half) h;

        for (h = 0; h < headers; h++)
            if (header[h].p_type == PT_LOAD)
                if (addr >= base + header[h].p_vaddr &&
                    addr <  base + header[h].p_vaddr + header[h].p_memsz)
                    return (void *)addr;
    }

    return NULL;
}

struct phdr_iterator_data {
    int  (*callback)(const char *libpath, const char *libname,
                     const char *objname, const void *addr, const size_t size,
                     const symbol_bind binding, const symbol_type type,
                     void *custom);
    void  *custom;
};

static int iterate_phdr(struct dl_phdr_info *info, size_t size, void *dataref)
{
    struct phdr_iterator_data *const data = dataref;
    const ElfW(Addr)                 base = info->dlpi_addr;
    const ElfW(Phdr) *const          header = info->dlpi_phdr;
    const ElfW(Half)                 headers = info->dlpi_phnum;
    const char                      *libpath, *libname;
    ElfW(Half)                       h;

    if (!data->callback)
        return 0;

    if (info->dlpi_name && info->dlpi_name[0])
        libpath = info->dlpi_name;
    else
        libpath = "";

    libname = strrchr(libpath, '/');
    if (libname && libname[0] == '/' && libname[1])
        libname++;
    else
        libname = libpath;

    for (h = 0; h < headers; h++)
        if (header[h].p_type == PT_DYNAMIC) {
            const ElfW(Dyn)  *entry = (const ElfW(Dyn) *)(base + header[h].p_vaddr);
            const ElfW(Word) *hashtab;
            const ElfW(Sym)  *symtab = NULL;
            const char       *strtab = NULL;
            ElfW(Word)        symbol_count = 0;

            for (; entry->d_tag != DT_NULL; entry++)
                switch (entry->d_tag) {
                case DT_HASH:
                    hashtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
                    if (hashtab)
                        symbol_count = hashtab[1];
                    break;
                case DT_GNU_HASH:
                    hashtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
                    if (hashtab) {
                        ElfW(Word) count = gnu_hashtab_symbol_count(hashtab);
                        if (count > symbol_count)
                            symbol_count = count;
                    }
                    break;
                case DT_STRTAB:
                    strtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
                    break;
                case DT_SYMTAB:
                    symtab = dynamic_pointer(entry->d_un.d_ptr, base, header, headers);
                    break;
                }

            if (symtab && strtab && symbol_count > 0) {
                ElfW(Word)  s;

                for (s = 0; s < symbol_count; s++) {
                    const char *name;
                    void *const ptr = dynamic_pointer(base + symtab[s].st_value, base, header, headers);
                    symbol_bind bind;
                    symbol_type type;
                    int         result;

                    if (!ptr)
                        continue;

                    type = elf_symbol_type(symtab[s].st_info);
                    bind = elf_symbol_binding(symtab[s].st_info);
                    if (symtab[s].st_name)
                        name = strtab + symtab[s].st_name;
                    else
                        name = "";

                    result = data->callback(libpath, libname, name, ptr, symtab[s].st_size, bind, type, data->custom);
                    if (result)
                        return result;
                }
            }
        }

    return 0;
}

int symbols(int (*callback)(const char *libpath, const char *libname, const char *objname,
                            const void *addr, const size_t size,
                            const symbol_bind binding, const symbol_type type,
                            void *custom),
            void *custom)
{
    struct phdr_iterator_data data;

    if (!callback)
        return errno = EINVAL;

    data.callback = callback;
    data.custom = custom;

    return errno = dl_iterate_phdr(iterate_phdr, &data);
}

When compiling the above, remember to link against the dl library.

You may find the gnu_hashtab_symbol_count() function above interesting; the format of the table is not well documented anywhere that I can find. This is tested to work on both i386 and x86-64 architectures, but it should be vetted against the GNU sources before relying on it in production code. Again, the better option is to just use those tools directly via a helper script, as they will be installed on any development machine.

Technically, a DT_GNU_HASH table tells us the first dynamic symbol, and the highest index in any hash bucket tells us the last dynamic symbol, but since the entries in the DT_SYMTAB symbol table always begin at 0 (actually, the 0 entry is "none"), I only consider the upper limit.

To match library and function names, I recommend using strncmp() for a prefix match for libraries (match at the start of the library name, up to the first .). Of course, you can use fnmatch() if you prefer glob patterns, or regcomp()+regexec() if you prefer regular expressions (they are built-in to the GNU C library, no external libraries are needed).

Here is an example program, example.c, that just prints out all the symbols:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include <errno.h>
#include "symbols.h"

static int my_func(const char *libpath, const char *libname, const char *objname,
                   const void *addr, const size_t size,
                   const symbol_bind binding, const symbol_type type,
                   void *custom __attribute__((unused)))
{
    printf("%s (%s):", libpath, libname);

    if (*objname)
        printf(" %s:", objname);
    else
        printf(" unnamed");

    if (size > 0)
        printf(" %zu-byte", size);

    if (binding == LOCAL_SYMBOL)
        printf(" local");
    else
    if (binding == GLOBAL_SYMBOL)
        printf(" global");
    else
    if (binding == WEAK_SYMBOL)
        printf(" weak");

    if (type == FUNC_SYMBOL)
        printf(" function");
    else
    if (type == OBJECT_SYMBOL || type == COMMON_SYMBOL)
        printf(" variable");
    else
    if (type == THREAD_SYMBOL)
        printf(" thread-local variable");

    printf(" at %p\n", addr);
    fflush(stdout);

    return 0;
}

int main(int argc, char *argv[])
{
    int  arg;

    for (arg = 1; arg < argc; arg++) {
        void *handle = dlopen(argv[arg], RTLD_NOW);
        if (!handle) {
            fprintf(stderr, "%s: %s.\n", argv[arg], dlerror());
            return EXIT_FAILURE;
        }

        fprintf(stderr, "%s: Loaded.\n", argv[arg]);
    }

    fflush(stderr);

    if (symbols(my_func, NULL))
        return EXIT_FAILURE;

    return EXIT_SUCCESS;
}

To compile and run the above, use for example

gcc -Wall -O2 -c symbols.c
gcc -Wall -O2 -c example.c
gcc -Wall -O2 example.o symbols.o -ldl -o example
./example | less

To see the symbols in the program itself, use the -rdynamic flag at link time to add all symbols to the dynamic symbol table:

gcc -Wall -O2 -c symbols.c
gcc -Wall -O2 -c example.c
gcc -Wall -O2 -rdynamic example.o symbols.o -ldl -o example
./example | less

On my system, the latter prints out

 (): stdout: 8-byte global variable at 0x602080
 (): _edata: global at 0x602078
 (): __data_start: global at 0x602068
 (): data_start: weak at 0x602068
 (): symbols: 70-byte global function at 0x401080
 (): _IO_stdin_used: 4-byte global variable at 0x401150
 (): __libc_csu_init: 101-byte global function at 0x4010d0
 (): _start: global function at 0x400a57
 (): __bss_start: global at 0x602078
 (): main: 167-byte global function at 0x4009b0
 (): _init: global function at 0x4008d8
 (): stderr: 8-byte global variable at 0x602088
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): unnamed local at 0x7fc652097000
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): unnamed local at 0x7fc652097da0
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): __asprintf: global function at 0x7fc652097000
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): free: global function at 0x7fc652097000
...
/lib/x86_64-linux-gnu/libdl.so.2 (libdl.so.2): dlvsym: 118-byte weak function at 0x7fc6520981f0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc651cf14a0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): unnamed local at 0x7fc65208c740
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _rtld_global: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): __libc_enable_secure: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): __tls_get_addr: global function at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _rtld_global_ro: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_find_dso_for_object: global function at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_starting_up: weak at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): _dl_argv: global variable at 0x7fc651cd2000
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): putwchar: 292-byte global function at 0x7fc651d4a210
...
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): vwarn: 224-byte global function at 0x7fc651dc8ef0
/lib/x86_64-linux-gnu/libc.so.6 (libc.so.6): wcpcpy: 39-byte weak function at 0x7fc651d75900
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): unnamed local at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): unnamed local at 0x7fc65229bae0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_get_tls_static_info: 21-byte global function at 0x7fc6522adaa0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_PRIVATE: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_2.3: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): GLIBC_2.4: global variable at 0x7fc65229b000
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): free: 42-byte weak function at 0x7fc6522b2c40
...
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): malloc: 13-byte weak function at 0x7fc6522b2bf0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_allocate_tls_init: 557-byte global function at 0x7fc6522adc00
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _rtld_global_ro: 304-byte global variable at 0x7fc6524bdcc0
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): __libc_enable_secure: 4-byte global variable at 0x7fc6524bde68
/lib64/ld-linux-x86-64.so.2 (ld-linux-x86-64.so.2): _dl_rtld_di_serinfo: 1620-byte global function at 0x7fc6522a4710

I used ... to mark where I removed lots of lines.

Questions?


To get a list of exported symbols from a shared library (a .so) under Linux, there are two ways: the easy one and a slightly harder one.

The easy one is to use the console tools already available: objdump (included in GNU binutils):

$ objdump -T /usr/lib/libid3tag.so.0
00009c15 g    DF .text  0000012e  Base        id3_tag_findframe
00003fac g    DF .text  00000053  Base        id3_ucs4_utf16duplicate
00008288 g    DF .text  000001f2  Base        id3_frame_new
00007b73 g    DF .text  000003c5  Base        id3_compat_fixup
...

The slightly harder way is to use libelf and write a C/C++ program to list the symbols yourself. Have a look at the elfutils package, which is also built from the libelf source. There is a program called eu-readelf (the elfutils version of readelf, not to be confused with the binutils readelf). eu-readelf -s $LIB lists exported symbols using libelf, so you should be able to use that as a starting point.