Why does ld need -rpath-link when linking an executable against a so that needs another so?

Why is it that ld MUST be able to locate liba.so when linking test? To me it doesn't seem like ld is doing much more than confirming liba.so's existence. For instance, running readelf --dynamic ./test only lists libb.so as needed, so I guess the dynamic linker must discover the libb.so -> liba.so dependency on its own, and make its own search for liba.so.

Well, if I understand the linking process correctly, ld does not actually need to locate even libb.so. It could just ignore all unresolved references in test, hoping that the dynamic linker would resolve them when loading libb.so at runtime. But if ld worked this way, many "undefined reference" errors would not be detected at link time; they would only show up when trying to load test at runtime. So ld does an additional check that every symbol not found in test itself can really be found in the shared libraries that test depends on. If the test program has an "undefined reference" error (some variable or function found neither in test itself nor in libb.so), this becomes obvious at link time, not just at runtime. This behavior is just an additional sanity check.

But ld goes even further. When you link test, ld also checks that all unresolved references in libb.so are found in the shared libraries that libb.so depends on (in our case libb.so depends on liba.so, so liba.so has to be located at link time). Well, actually ld has already done this check when it was linking libb.so. Why does it do the check a second time? Maybe the developers of ld found this double check useful for detecting broken dependencies: you may be linking your program against an outdated library that could be loaded at the time it was linked, but can no longer be loaded because the libraries it depends on have been updated (for example, liba.so was later reworked and some functions were removed from it).

UPD

I just did a few experiments. It seems my assumption "actually ld has already done this checking, when it was linking libb.so" is wrong.

Let us suppose liba.c has the following content:

int liba_func(int i)
{
    return i + 1;
}

and libb.c has the following:

int liba_func(int i);
int liba_nonexistent_func(int i);

int libb_func(int i)
{
    return liba_func(i + 1) + liba_nonexistent_func(i + 2);
}

and test.c

#include <stdio.h>

int libb_func(int i);

int main(int argc, char *argv[])
{
    fprintf(stdout, "%d\n", libb_func(argc));
    return 0;
}

When linking libb.so:

gcc -o libb.so -fPIC -shared libb.c liba.so

the linker doesn't generate any error message saying that liba_nonexistent_func cannot be resolved; instead it silently generates a broken shared library libb.so. The behavior is the same as when you make a static library (libb.a) with ar, which doesn't resolve the symbols of the generated library either.

But when you try to link test:

gcc -o test -Wl,-rpath-link=./ test.c libb.so

you get the error:

libb.so: undefined reference to `liba_nonexistent_func'
collect2: ld returned 1 exit status

Detecting such an error would not be possible if ld didn't recursively scan all the shared libraries. So it seems that the answer to the question is the same as I gave above: ld needs -rpath-link in order to make sure that the linked executable can later be loaded by the dynamic loader. Just a sanity check.

UPD2

It would make sense to check for unresolved references as early as possible (when linking libb.so), but ld for some reason doesn't do this. It's probably to allow cyclic dependencies between shared libraries.

liba.c can have the following implementation:

int libb_func(int i);

int liba_func(int i)
{
    int (*func_ptr)(int) = libb_func;
    return i + (int)func_ptr;
}

So liba.so uses libb.so and libb.so uses liba.so (better never do such a thing). This successfully compiles and works:

$ gcc -o liba.so -fPIC -shared liba.c
$ gcc -o libb.so -fPIC -shared libb.c liba.so
$ gcc -o test test.c -Wl,-rpath=./ libb.so
$ ./test
-1217026998

Though readelf says that liba.so doesn't need libb.so:

$ readelf -d liba.so | grep NEEDED
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
$ readelf -d libb.so | grep NEEDED
 0x00000001 (NEEDED)                     Shared library: [liba.so]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]

If ld checked for unresolved symbols during the linking of a shared library, the linking of liba.so would not be possible.

Note that I used the -rpath option instead of -rpath-link. The difference is that -rpath-link is used only at link time, for checking that all symbols in the final executable can be resolved, whereas -rpath actually embeds the path you specify into the ELF file:

$ readelf -d test | grep RPATH
 0x0000000f (RPATH)                      Library rpath: [./]

So it's now possible to run test if the shared libraries (liba.so and libb.so) are located in your current working directory (./). If you had used only -rpath-link, there would be no such entry in the test ELF, and you would have to add the path to the shared libraries to the /etc/ld.so.conf file or to the LD_LIBRARY_PATH environment variable.

UPD3

It is actually possible to check for unresolved symbols when linking a shared library; the --no-undefined option must be used for that:

$ gcc -Wl,--no-undefined -o libb.so -fPIC -shared libb.c liba.so
/tmp/cc1D6uiS.o: In function `libb_func':
libb.c:(.text+0x2d): undefined reference to `liba_nonexistent_func'
collect2: ld returned 1 exit status

Also I found a good article that clarifies many aspects of linking shared libraries that depend on other shared libraries: Better understanding Linux secondary dependencies solving with examples.


I guess you need to know when to use the -rpath option and when to use the -rpath-link option. First, I quote what man ld says:

  1. The difference between -rpath and -rpath-link is that directories specified by -rpath options are included in the executable and used at runtime, whereas the -rpath-link option is only effective at link time. Searching -rpath in this way is only supported by native linkers and cross linkers which have been configured with the --with-sysroot option.

You must distinguish between link time and runtime. According to the accepted answer by anton_rh, checking for undefined symbols is not enabled when compiling and linking shared libraries or static libraries, but it IS enabled when compiling and linking executables. (However, note that some files are both shared libraries and executables, for example ld.so. Type man ld.so to explore this. I don't know whether checking for undefined symbols is enabled when compiling files of this "dual" kind.)

So -rpath-link is used for link-time checking only, and -rpath is used at both link time and runtime because the rpath is embedded into the ELF file. But be careful: if both are specified, the directories given with -rpath-link are searched before those given with -rpath at link time.

But still, why have both the -rpath-link option and the -rpath option? I think they are used for eliminating "overlinking". See Better understanding Linux secondary dependencies solving with examples, and use Ctrl+F to jump to the content related to "overlinking". Focus on why "overlinking" is bad. Given the method we adopt to avoid it, the existence of the ld options -rpath-link and -rpath is reasonable: we deliberately omit some libraries from the compile and link commands to avoid "overlinking", and because of that omission, ld needs -rpath-link or -rpath to locate the omitted libraries.


Your system, through ld.so.conf, ld.so.conf.d, and the environment (LD_LIBRARY_PATH, etc.), provides the system-wide library search paths, which are supplemented by installed libraries through pkg-config information and the like when you build against standard libraries. When a library resides in a defined search path, the standard library search paths are followed automatically, allowing all required libraries to be found.

There is no standard run-time library search path for custom shared libraries you create yourself. You specify the search path to your libraries through the -L/path/to/lib designation during compile and link. For libraries in non-standard locations, the library search path can be optionally placed in the header of your executable (ELF header) at compile-time so that your executable can find the needed libraries.

rpath provides a way of embedding your custom run-time library search path in the ELF header so that your custom libraries can be found as well without having to specify the search path each time it is used. This applies to libraries that depend on libraries as well. As you have found, not only is the order you specify the libraries on the command line important, you also must provide the run-time library search path, or rpath, information for each dependent library you are linking against as well so that the header contains the location of all libraries needed to run.

Addendum from Comments

My question is primarily why ld must "automatically try to locate the shared library" (liba.so) and "include it in the link".

That is simply the way ld works. From man ld: "The -rpath option is also used when locating shared objects which are needed by shared objects explicitly included in the link ... If -rpath is not used when linking an ELF executable, the contents of the environment variable "LD_RUN_PATH" will be used if it is defined." In your case liba isn't located in the LD_RUN_PATH, so ld will need a way of locating liba during the link of your executable, either with rpath (described above) or by providing an explicit search path to it.

Secondarily, what "include it in the link" really means. To me it seems that it just means "confirm its existence" (liba.so's), since libb.so's ELF headers are not modified (they already had a NEEDED tag against liba.so), and the exec's headers only declare libb.so as NEEDED. Why does ld care about finding liba.so? Can it not just leave the task to the run-time linker?

No, back to the semantics of ld. In order to produce a "good link", ld must be able to locate all dependent libraries; it cannot ensure a good link otherwise. The runtime linker must find and load the shared libraries needed by a program, not just find them. ld cannot guarantee that will happen unless ld itself can locate all needed shared libraries at the time the program is linked.