The story about Dynamic Linking
In the previous article, we saw how an ELF file is loaded into memory and starts running, but that discussion assumed static linking. What changes when the ELF file uses dynamic linking? This article walks through the differences.
What’s dynamic linking?
The core idea of dynamic linking is to delay the linking process until runtime.
Suppose program 1 and program 2 both need lib. When we run program 1, lib is mapped into memory as well. When we then run program 2, lib does not need to be loaded again; program 2 only needs to link against the copy of lib that is already in memory. This saves the memory that duplicate copies would occupy and also reduces the cost of swapping pages in and out.
When a program is loaded, the dynamic linker (instead of ld, which is used for static linking) loads every required lib.so into the virtual address space of the process and then performs symbol resolution and relocation.
On Linux, dynamically linked libraries use the extension .so; on Windows, they use the extension .dll.
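As a small illustration of a symbol being resolved at runtime rather than at link time, the sketch below asks the dynamic loader for a function by name. It assumes a Unix-like system where ctypes.CDLL(None) behaves like dlopen(NULL) and exposes the C library that is already mapped into the process; the helper name libc_abs is made up for this example. This is explicit runtime loading, which goes through the same dynamic linker but is not identical to the load-time linking described above.

```python
import ctypes


def libc_abs(value: int) -> int:
    """Call abs() from the C library that is already mapped into this process."""
    # CDLL(None) wraps dlopen(NULL): a handle to the running program and the
    # shared objects loaded with it (assumption: Unix-like system).
    process = ctypes.CDLL(None)
    # Attribute access performs a dlsym()-style lookup by symbol name.
    c_abs = process.abs
    c_abs.argtypes = [ctypes.c_int]
    c_abs.restype = ctypes.c_int
    return c_abs(value)


if __name__ == "__main__":
    print(libc_abs(-5))
```

Nothing about abs() was known when this script was written or "linked"; the address is looked up only when the call is about to happen.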
A brief review of some examples
Following the previous article, we know how to create static libraries and dynamic libraries. With static linking, we compile the source to .o files and package them into a .a archive, which is the static library. With dynamic linking, we also compile the source to .o first, then use the -shared option to output the shared object.
To dig into more detail, consider an example: a program p (built from p.c) that uses the foobar() function in lib.so, which is linked at runtime. When p.c is compiled to p.o, the compiler still does not know the address of foobar(). When the linker turns p.o into an ELF executable, it has to determine what kind of symbol foobar() is.
- If foobar() is defined in a statically linked module, the linker follows the rules of static linking and relocates the reference at link time.
- If foobar() is defined in a dynamically linked module, the linker marks foobar() as a dynamic symbol and defers the relocation to runtime.
But how does the linker know whether foobar is defined in a static or a dynamic module? The linker needs the symbol information from the shared object itself.
Load time relocation
It is easy to picture how a statically linked ELF is loaded into the virtual address space, because only one file needs to be loaded. With dynamic linking, however, the program is split into several modules, so how are they all arranged in memory?
Take a look at /proc/[pid]/maps. In addition to p itself and lib.so, we can also find /lib/ld-x.x.x.so, which is Linux's dynamic linker. It is worth noticing that the load address of a shared object is not fixed at compile time.
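To see which modules are mapped into a process without reading the listing by eye, the maps text can be parsed. The sketch below assumes the usual Linux layout of each line (address, permissions, offset, device, inode, optional pathname); the function name mapped_files is made up for this example.

```python
def mapped_files(maps_text):
    """Return the distinct file paths found in /proc/[pid]/maps text, in order."""
    seen = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        # Anonymous mappings have no pathname; [heap], [stack] etc. are not files.
        if len(fields) == 6 and fields[5].startswith("/"):
            path = fields[5].strip()
            if path not in seen:
                seen.append(path)
    return seen


if __name__ == "__main__":
    with open("/proc/self/maps") as f:  # Linux only
        for path in mapped_files(f.read()):
            print(path)
```

Run against a dynamically linked process, the list should contain the executable, its shared objects, and the dynamic linker.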
So, back to our topic. How do we determine the virtual addresses at runtime? Let's start by understanding the difficulty. Assume there are three shared objects A, B, and C, and that we statically assign (0x1000~0x2000) to A and (0x2000~0x3000) to B. If some process needs B but not A, then (0x1000~0x2000) looks free and can be assigned to C. Now A and C have conflicting addresses, which means that we can never use A and C together in the same process.
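The conflict in this scenario can be checked mechanically. The following is a minimal sketch that treats each pre-assigned range as half-open (start, end), which is an assumption about how to read the ranges above; the function names are made up for this example.

```python
def overlaps(a, b):
    """True if half-open address ranges a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def conflicts(fixed_ranges, needed):
    """Return pairs of needed modules whose pre-assigned ranges collide."""
    names = sorted(needed)
    return [
        (m, n)
        for i, m in enumerate(names)
        for n in names[i + 1:]
        if overlaps(fixed_ranges[m], fixed_ranges[n])
    ]


if __name__ == "__main__":
    fixed = {"A": (0x1000, 0x2000), "B": (0x2000, 0x3000), "C": (0x1000, 0x2000)}
    print(conflicts(fixed, {"B", "C"}))  # B and C fit side by side
    print(conflicts(fixed, {"A", "C"}))  # A and C claim the same range
```

Any scheme that fixes addresses ahead of time has to keep such a table conflict-free for every combination of modules, which is what makes it impractical.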
To avoid address conflicts, shared objects must not assume their virtual addresses at compile time, while the executable itself can still be placed at a fixed address (e.g., 0x08048000 for 32-bit x86 Linux or 0x00400000 for Windows) because it is the first file loaded. For example, suppose foobar sits at offset 0x100 within its segment. If the segment is loaded at 0x10000000, the address of foobar is relocated to 0x10000100. This is load-time relocation, in contrast to the link-time relocation of static linking. The -shared option also implies that the shared object uses load-time relocation.
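The arithmetic of load-time relocation is simple enough to write down. The sketch below is a toy model, not the real relocation format: it assumes an image is a list of words and that the relocation list names the slots that hold segment-relative offsets. Both function names are made up for this example.

```python
def load_time_address(load_base, offset):
    """Runtime address of a symbol = segment load base + offset inside the segment."""
    return load_base + offset


def apply_relocations(image_words, reloc_slots, load_base):
    """Patch each listed slot (holding a segment-relative offset) into an absolute address."""
    patched = list(image_words)  # leave the on-disk image untouched
    for slot in reloc_slots:
        patched[slot] = load_time_address(load_base, patched[slot])
    return patched


if __name__ == "__main__":
    # foobar at offset 0x100, segment loaded at 0x10000000
    print(hex(load_time_address(0x10000000, 0x100)))
    print([hex(w) for w in apply_relocations([0x100, 7, 0x200], [0, 2], 0x10000000)])
```

Because the patched words depend on the load base, two processes that load the module at different bases end up with different copies of the patched pages.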
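Besides the -shared option, the command that creates a shared object also uses the -fPIC option. What is its effect? Load-time relocation patches addresses in the text, which would prevent the text from being shared, so the solution for sharing the text among multiple processes is PIC (Position-Independent Code). PIC has to handle four kinds of references: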
- Intra-module call or jmp
The caller and the callee are in the same module, so their relative offset is fixed. The call can therefore be encoded with a relative offset, computed as (destination address - address of the next instruction).
- Intra-module data access
The offset from the accessing instruction to the data is also fixed. The address of the next instruction can be obtained with get_pc_thunk.cx on x86 or through %rip on x86-64, so the address of the data is the address of the next instruction plus the relative offset.
- Inter-module data access
The load address is not decided until load time (load-time relocation). ELF's solution is to build an array of pointers to such data, called the GOT (Global Offset Table). When the process needs to access data in another module, it looks up the address in the GOT, whose entries are filled in by the dynamic linker at load time. Because the GOT lives in the data section, the process is allowed to modify it at load time (from a hacker's point of view, it can also be modified for malicious purposes). As in the second case above, the relative offset from the code to the GOT is known at compile time, so the code can reach the GOT in the same way. The position of each entry inside the GOT is fixed as well, so finding the address of the data is then easy. More importantly, the data section (including the GOT) has a separate copy in the virtual address space of each process, so we can guarantee that a change made by one process does not affect the data seen by other processes.
- Inter-module call or jmp
In principle this works like inter-module data access: the function address is filled into the GOT by the dynamic linker at load time. (In practice it is not exactly the same; the details of lazy binding are covered later.)
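The inter-module cases can be modeled in a few lines. The sketch below is a toy model under stated assumptions: SharedText stands for the position-independent code, which only knows slot numbers, and load() plays the role of the dynamic linker filling a per-process GOT. All names (SharedText, load, ext_var, ext_func) are hypothetical.

```python
class SharedText:
    """Position-independent code: it stores GOT slot numbers, never absolute addresses."""

    def __init__(self, got_index):
        self.got_index = got_index  # symbol name -> slot number, fixed at link time

    def address_of(self, symbol, got):
        # The code reaches the GOT by a fixed relative offset, then reads the slot.
        return got[self.got_index[symbol]]


def load(text, symbol_offsets, module_base):
    """Build a per-process GOT the way a dynamic linker would at load time."""
    got = [0] * len(text.got_index)
    for symbol, slot in text.got_index.items():
        got[slot] = module_base + symbol_offsets[symbol]
    return got


if __name__ == "__main__":
    text = SharedText({"ext_var": 0, "ext_func": 1})  # one copy, shared by everyone
    offsets = {"ext_var": 0x100, "ext_func": 0x200}
    got_process_1 = load(text, offsets, 0x10000000)
    got_process_2 = load(text, offsets, 0x20000000)
    print(hex(text.address_of("ext_var", got_process_1)))
    print(hex(text.address_of("ext_var", got_process_2)))
```

The same SharedText object serves both processes unchanged; only the GOT, which belongs to each process's data, differs.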
OK, back to the topic of relocation. What is the difference between relocation in static linking and in dynamic linking?
- Static linking
Relocation is done at link time.
- Dynamic linking
Relocation is done at load time.
Now, I am curious about the relocation table.
- Static linking
- for text section -
- for data section -
- Dynamic linking
- for data section and
- for resolving function (
Here comes a classic example for a glibc library
There are three types of relocation entries:
Takeaway: PIC allows a shared object's text to be shared by multiple processes, and -shared makes the object use load-time relocation.
PIC only makes the text section position-independent and shareable across processes; it does not do the same for the data section. Therefore, each process gets its own copy of the data.
Lazy binding with PLT
Following the ideas above, the dynamic linker has to resolve a large number of symbols and perform their relocations every time before the program starts, which hurts the performance of dynamic linking. To optimize this, we apply lazy binding.
The motivation is that most inter-module functions are never called during a run of the process, and there is no need to resolve a function we never use. With lazy binding, a function is not resolved until it is called for the first time.
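The saving can be expressed as a simple count. This is only a back-of-the-envelope sketch: it assumes every resolution costs the same and ignores everything else the dynamic linker does. The function name and the symbol lists are made up for this example.

```python
def symbols_resolved(imported, called, lazy):
    """Count how many imported functions get resolved during one run."""
    if not lazy:
        return len(set(imported))            # everything is resolved before main runs
    return len(set(imported) & set(called))  # only functions that are actually called


if __name__ == "__main__":
    imported = ["puts", "printf", "malloc", "free", "qsort"]
    called = ["puts", "puts", "malloc"]      # repeated calls resolve only once
    print(symbols_resolved(imported, called, lazy=False))
    print(symbols_resolved(imported, called, lazy=True))
```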
To implement it, we need the PLT (Procedure Linkage Table). First, consider what information the dynamic linker needs in order to resolve an inter-module function: it must know the target module and the target function. Therefore, _dl_runtime_resolve(), the glibc function that resolves functions for us, takes the module ID and the index of the function in .rel.plt as parameters. Here is the workflow for resolving an inter-module function.
Follow the steps when we want to call inter-module
- jmp to the address saved in
- The address is filled with the location of next instruction, which is
- jmp to
- jmp to
- fill the address of
The next time we call bar@plt and jump through bar@GOT, the address stored in bar@GOT is already the address of bar(). Following is the disassembly of section
You might wonder why there is also an entry <puts@plt-0x10>. In fact, this entry holds what would otherwise be the last two instructions of each xx@plt
- jmp to
These two instructions would be the same in every xx@plt. To avoid duplicating the code, it is placed in a common entry, which is <puts@plt-0x10>. Looking at <puts@plt>, its last instruction, jmp 400550 <_init+0x20>, jumps to this common entry.
For more detail on _dl_runtime_resolve, I would like to recommend a Chinese-language article. It traces into the
Takeaway: _dl_runtime_resolve is the one that resolves not-yet-resolved functions for you at runtime.
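The lazy-binding flow described in this section can be modeled as a small state machine. This is a toy sketch, not glibc's implementation: the class LazyProcess and its methods are hypothetical, a dictionary stands in for lib.so, and an empty GOT slot stands in for a slot that still points back into the PLT stub.

```python
class LazyProcess:
    """Toy model of one process calling an external function through PLT and GOT."""

    def __init__(self, library):
        self.library = library  # symbol name -> real function (stands in for lib.so)
        self.got = {}           # per-process GOT slots, filled on first use
        self.trace = []         # which steps ran, for inspection

    def _plt_stub_tail(self, name):
        # Rest of name@plt: push the relocation index, jump to the common entry.
        self.trace.append("push index of " + name)
        return self._plt0(name)

    def _plt0(self, name):
        # Common entry shared by all stubs: push the module ID, jump to the resolver.
        self.trace.append("PLT0: push module id")
        return self._resolve(name)

    def _resolve(self, name):
        # Stand-in for _dl_runtime_resolve(): find the symbol, patch the GOT slot.
        self.trace.append("resolve " + name)
        self.got[name] = self.library[name]
        return self.got[name]

    def call(self, name, *args):
        # First instruction of name@plt: jump to whatever the GOT slot holds.
        self.trace.append("jmp *" + name + "@GOT")
        target = self.got.get(name)
        if target is None:  # slot not patched yet: fall through into the stub
            target = self._plt_stub_tail(name)
        return target(*args)


if __name__ == "__main__":
    process = LazyProcess({"bar": lambda x: x + 1})
    process.call("bar", 1)
    print(process.trace)  # first call goes through the resolver
    process.trace.clear()
    process.call("bar", 2)
    print(process.trace)  # later calls take only the GOT jump
```

Only the first call pays for the resolver; afterwards the patched GOT slot sends the jump straight to the function.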
Who is our dynamic linker?
The dynamic linker is not chosen by system settings or environment variables; it is specified by the ELF file itself! A section called .interp stores the path of the dynamic linker as a string. We can see it with the command objdump -s ELF.
The command readelf -l ELF | grep interpreter shows the same result.
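The same string can be read programmatically from the program header of type PT_INTERP, which is what the readelf output above is based on. The sketch below is deliberately limited: it assumes a 64-bit little-endian ELF image and does no further validation; the function name read_interp is made up for this example.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def read_interp(data):
    """Return the PT_INTERP path of a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS = ELFCLASS64, EI_DATA = ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, off)
        (p_filesz,) = struct.unpack_from("<Q", data, off + 0x20)
        if p_type == PT_INTERP:
            # The segment content is the NUL-terminated path of the dynamic linker.
            return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None  # no PT_INTERP: the file does not request a dynamic linker


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(read_interp(f.read()))
```

Passing the path of a dynamically linked 64-bit executable as the first argument should print the same path that readelf reports as the requested program interpreter.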
An important section that holds information for the dynamic linker
This section is .dynamic. With the command readelf -d ELF, we can display its content.
In the picture,
SYMTAB shows the address of symbol table;
JMPREL shows the address of
Some more stories about dynamic linker
In the previous article, I already described how an ELF file is loaded and starts running. After the kernel loads the ELF file, it transfers control to the ELF entry point (the address stored in e_entry). With dynamic linking this is slightly different: control is handed to the dynamic linker first (the program named by .interp). In other words, if there is no .interp, execution starts at the e_entry of the ELF file itself; if there is one, execution starts at the entry point of the dynamic linker named in .interp, which later transfers control to the program's e_entry.
Now, let's take a look at the dynamic linker itself (/lib/ld-2.23.so on my system). It is a shared object, but it is also an ELF file that can be executed! You can even try to run it. The reason is that execve() does not care whether the ELF file is marked as an executable or as a shared object. No matter what, it will try to find
Here comes another interesting question. If a dynamically linked ELF file needs the dynamic linker to be loaded before it can run, is the dynamic linker itself dynamically linked or statically linked? The answer is statically linked, since no one else could resolve the dynamic linker's dependencies for it. You can check this by running ldd on the dynamic linker of your system.