The story about Dynamic Linking

In the previous article, we saw how an ELF file is loaded into memory and starts running, but that discussion assumed static linking. What if the ELF uses dynamic linking? Let's work out the differences together in this article.

What’s dynamic linking?

Delaying the linking process until runtime is the core idea of dynamic linking.
Suppose program 1 and program 2 both need lib. When we start program 1, lib is mapped into memory as well. Later, when we start program 2, lib doesn't need to be loaded again; program 2 only needs to link against the copy of lib already in memory. This not only avoids keeping duplicate copies in memory but also reduces paging overhead.
When a program is loaded, the dynamic linker (rather than ld, which does this work for static linking) loads all the necessary shared objects into the process's virtual address space and performs symbol resolution and relocation.

On Linux, dynamically linked libraries (shared objects) have the extension .so; on Windows, they use .dll.

A brief review and an example

As covered in the previous article, we know how to create static and dynamic link libraries. For static linking, we compile the sources to .o files and package them into an .a file, which is the static library. For dynamic linking, we also compile the sources to .o files first, then use the -shared option to output an .so file.
To dig into the details, let me introduce an example: program p needs the foobar() function in, which is linked at runtime. When p.c is compiled to p.o, the compiler does not yet know the address of foobar(). When the linker turns p.o into an ELF executable, it has to determine where foobar() is defined:

  • if foobar() is defined in a static linking module
    follow the static linking rules: relocate foobar() at link time.
  • if foobar() is defined in a dynamic linking module
    the linker marks foobar() as a dynamically linked symbol and defers its relocation to runtime.
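The decision in the two bullets can be modeled in a few lines of C. This is a toy sketch, not how ld is implemented: the enums, the symbol table, and the names lookup() and classify() are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: where the linker found a symbol's definition. */
enum symbol_origin { FROM_STATIC_MODULE, FROM_SHARED_OBJECT, UNDEFINED };

/* What the toy linker decides to do with a reference to that symbol. */
enum link_action { RELOCATE_AT_LINK_TIME, DEFER_TO_RUNTIME, LINK_ERROR };

struct symbol_entry {
    const char *name;
    enum symbol_origin origin;
};

/* Look up `name` in the table of symbols the linker has seen. */
static enum symbol_origin lookup(const struct symbol_entry *table, size_t n,
                                 const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].origin;
    return UNDEFINED;
}

/* Static definition: relocate now. Definition in a shared object:
 * mark the symbol as dynamic and leave the relocation for runtime. */
enum link_action classify(const struct symbol_entry *table, size_t n,
                          const char *name) {
    switch (lookup(table, n, name)) {
    case FROM_STATIC_MODULE: return RELOCATE_AT_LINK_TIME;
    case FROM_SHARED_OBJECT: return DEFER_TO_RUNTIME;
    default:                 return LINK_ERROR;
    }
}
```

With a table containing {"foobar", FROM_SHARED_OBJECT}, classify() returns DEFER_TO_RUNTIME for foobar, while a name found in no module becomes a link error.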

But how does the linker know whether foobar is defined in a static or a dynamic linking module? The linker needs the symbol information from

Load time relocation

It is easy to imagine how a statically linked ELF is loaded into the virtual address space, because there is only one file to load. With dynamic linking, however, the program is split into several modules, so how are they all arranged in memory?
Take a look at /proc/[pid]/maps. In addition to p itself and, we can also find /lib/, which is Linux's dynamic linker. It is worth noting that the load address of a shared object is not fixed at compile time.
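A program can inspect these mappings itself by reading /proc/self/maps, the current process's view of /proc/[pid]/maps. The sketch below is Linux-only and the function name is hypothetical; it prints every mapping whose path contains ".so", which in a dynamically linked program should include the dynamic linker and the C library.

```c
#include <stdio.h>
#include <string.h>

/* Print each line of /proc/self/maps that maps a shared object (a path
 * containing ".so") and return how many such lines were found.
 * Returns -1 if the file cannot be opened (e.g. not running on Linux). */
int print_shared_object_mappings(void) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, ".so") != NULL) {
            fputs(line, stdout);
            count++;
        }
    }
    fclose(maps);
    return count;
}
```

Running it twice usually shows different start addresses for the same shared object, because ASLR randomizes where shared objects are mapped.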
So, back to our topic: how is the virtual address decided at runtime? Let's start by understanding the difficulty. Assume there are three shared objects, A, B, and C, and that we statically assign 0x1000~0x2000 to A and 0x2000~0x3000 to B. If a process needs B but not A, then 0x1000~0x2000 looks free and could be assigned to C. But that creates an address conflict between A and C: the two could never be used in the same process.
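The conflict boils down to a range-overlap check. This sketch assumes each module is handed a fixed half-open range [start, end); the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A fixed, half-open address range [start, end) reserved for one module. */
struct addr_range {
    const char *module;
    uintptr_t start;
    uintptr_t end;
};

/* Two half-open ranges conflict if each one starts before the other ends. */
bool ranges_conflict(struct addr_range a, struct addr_range b) {
    return a.start < b.end && b.start < a.end;
}
```

With A = [0x1000, 0x2000), B = [0x2000, 0x3000), and C reusing [0x1000, 0x2000), ranges_conflict(A, C) is true while ranges_conflict(A, B) and ranges_conflict(B, C) are false, so only A and C can never coexist.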
To avoid such conflicts, shared objects should not assume their virtual addresses at compile time. A non-PIE executable can still assume a fixed address (e.g., 0x08048000 on 32-bit Linux or 0x00400000 on Windows) because it is the first file loaded. For example, suppose foobar is at relative offset 0x100 within its module. If that segment is loaded at 0x10000000, foobar's address is relocated to 0x10000100. This is load time relocation, as opposed to the link time relocation done by static linking. Building with -shared alone (without -fPIC) produces a shared object that relies on load time relocation.
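The arithmetic behind load time relocation is just base plus offset. Here is a minimal sketch, assuming each slot to be patched initially holds a module-relative offset (similar to REL-style entries, which keep the addend in the patched location); relocate() and relocate_slots() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* An address recorded as an offset from the module's start becomes
 * load_base + offset once the loader knows where the module was mapped. */
uintptr_t relocate(uintptr_t load_base, uintptr_t offset_in_module) {
    return load_base + offset_in_module;
}

/* Apply the same fix-up to every slot listed in a (hypothetical)
 * relocation table; each slot starts out holding a module-relative offset. */
void relocate_slots(uintptr_t *slots, size_t count, uintptr_t load_base) {
    for (size_t i = 0; i < count; i++)
        slots[i] = relocate(load_base, slots[i]);
}
```

relocate(0x10000000, 0x100) gives 0x10000100, matching the foobar example.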
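Besides the -shared option, we also often see the -fPIC option when building shared objects. What does it do? Load time relocation patches addresses inside the code itself, so the text section differs from process to process and cannot be shared. To make the text shareable among multiple processes, the solution is PIC (Position-Independent Code). PIC handles four kinds of address references: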

  1. Inner-module call or jmp
    The caller and callee are in the same module, so their relative offset is fixed. The call can therefore be encoded as a relative offset: (destination address - address of the next instruction).
  2. Inner-module data access
    The offset from an instruction to the data it uses is also fixed. The address of the next instruction can be obtained by in x86, or directly from %rip in x86-64, so the data address is the next instruction's address plus the relative offset.
  3. Inter-module data access
    The load address of another module is not decided until load time (load time relocation). ELF's solution is to build an array of pointers to such data, called the GOT (Global Offset Table). When the code needs data from another module, it reads the data's address from the GOT; the dynamic linker fills in these entries during loading. Because the GOT is placed in the data segment, it can be written at load time (from a hacker's point of view, it is also a tempting target to overwrite). As in point 2, the offset from the code to the GOT is known at compile time, and the position of each entry within the GOT is fixed too, so finding the data's address is easy. More importantly, the data segment, including the GOT, gets a separate copy in each process's virtual address space, so changes made in one process do not affect the others.
  4. Inter-module call or jmp
    This works like inter-module data access: the function's address is filled in by the dynamic linker during loading. (In fact, it is not quite the same; the details of lazy binding are covered later.)
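The address arithmetic in these four cases can be simulated in C. This is a model, not real code generation: it assumes a 5-byte x86 `call rel32` instruction, treats %rip as the next instruction's address, and represents the GOT as a plain array of pointers; all function names are hypothetical.

```c
#include <stdint.h>

/* Case 1, inner-module call: the rel32 operand of `call` (opcode E8) is
 * dest - address of the next instruction; call rel32 is 5 bytes long. */
int32_t call_rel32(uint64_t call_addr, uint64_t dest_addr) {
    int64_t next_insn = (int64_t)(call_addr + 5);
    return (int32_t)((int64_t)dest_addr - next_insn);
}

/* Case 2, inner-module data (x86-64): %rip equals the next instruction's
 * address, so the data lives at next instruction + 32-bit displacement. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp) {
    return next_insn_addr + (uint64_t)(int64_t)disp;
}

/* Case 3, inter-module data: the code only knows where its GOT slot is;
 * the slot's content is filled in at load time. */
int read_via_got(int **got, int slot) {
    return *got[slot];
}

/* Case 4, inter-module call: same idea, but the slot holds a function address. */
typedef int (*fn_ptr)(int);
int call_via_got(fn_ptr *got, int slot, int arg) {
    return got[slot](arg);
}
```

For a call at 0x1000 to a function at 0x1100, call_rel32() returns 0xFB (0x1100 - 0x1005). Only that difference is encoded, so the instruction stays valid wherever the module is loaded.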

OK! Back to relocation. What's the difference between relocation in static linking and in dynamic linking?

  • Static linking
    Relocation is done at link time.
  • Dynamic linking
    Relocation is done at load time.

Now, let's look at the relocation tables.

  • Static linking
    • for text section - .rel.text
    • for data section -
  • Dynamic linking
    • for data references and .got - .rel.dyn (.rela.dyn on x86-64)
    • for resolving functions (.got.plt) - .rel.plt (.rela.plt on x86-64)

Here is a classic example from a glibc library.
There are three types of relocation entries: R_X86_64_RELATIVE, R_X86_64_GLOB_DAT, and R_X86_64_JUMP_SLOT.
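The three types differ in the value written to the relocated slot. The sketch below follows the formulas in the x86-64 psABI (B = load base, A = addend, S = symbol value) and uses the Elf64_Rela type and R_X86_64_* constants from <elf.h>; relocation_value() and its symbol_value parameter are hypothetical, with symbol_value standing in for a real symbol lookup.

```c
#include <elf.h>
#include <stdint.h>

/* Value a dynamic linker would store at r_offset for one RELA entry:
 *   R_X86_64_RELATIVE : B + A  (module load base + addend)
 *   R_X86_64_GLOB_DAT : S      (address of the resolved data symbol)
 *   R_X86_64_JUMP_SLOT: S      (address of the resolved function)
 * Returns 0 for relocation types this sketch does not handle. */
uint64_t relocation_value(const Elf64_Rela *rela, uint64_t load_base,
                          uint64_t symbol_value) {
    switch (ELF64_R_TYPE(rela->r_info)) {
    case R_X86_64_RELATIVE:
        return load_base + (uint64_t)rela->r_addend;
    case R_X86_64_GLOB_DAT:
    case R_X86_64_JUMP_SLOT:
        return symbol_value;
    default:
        return 0;
    }
}
```

RELATIVE entries need no symbol lookup at all, which is why they are the cheapest to process; GLOB_DAT and JUMP_SLOT both need the symbol's address, and JUMP_SLOT entries are the ones lazy binding can postpone.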

Takeaway: PIC makes an object shareable by multiple processes, and -shared enables load time relocation.
PIC only makes the text section position-independent and shareable among processes; the data section is not shared, so each process has its own copy of the data.

Lazy binding with PLT

Following the ideas above, we would need to resolve a large number of symbols and perform many relocations every time a program starts, which hurts the performance of dynamic linking. To optimize this, we apply lazy binding.
The motivation is that many inter-module functions are never called during a given run of the process, so there is no need to resolve them. With lazy binding, a function is not resolved until it is called for the first time.
To implement it, we need the PLT (Procedure Linkage Table). First, think about what information the dynamic linker needs to resolve an inter-module function: it must know the target module and the function. Therefore, the glibc function _dl_runtime_resolve(), which resolves functions, takes a module ID and the index of the function's entry in .rel.plt as parameters. Here is the workflow for resolving an inter-module function.
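These are the steps when we call the inter-module function bar() for the first time:

  1. call bar@plt
  2. jmp to the address saved in bar@GOT
  3. On the first call, bar@GOT holds the address of the next instruction in bar@plt, which is push n
  4. So the jmp lands on push n, which pushes n (the index of bar's entry in .rel.plt)
  5. push moduleID
  6. jmp to _dl_runtime_resolve
  7. _dl_runtime_resolve fills the address of bar() into bar@GOT and then jumps to bar()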
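The steps above can be simulated in plain C with function pointers. This is a toy model with hypothetical names: got[] stands in for .got.plt, bar_stub() for the push n / jmp sequence, and toy_runtime_resolve() for _dl_runtime_resolve (it takes only the index n, since there is a single module).

```c
/* Toy simulation of lazy binding; all names are hypothetical. */
typedef int (*fn_t)(int);

static int bar_impl(int x) { return x * 2; }  /* the "real" bar() in another module */

static int resolve_count = 0;                 /* how many times the resolver ran */

static int bar_stub(int x);                   /* plays the role of "push n; jmp PLT0" */
static fn_t got[1] = { bar_stub };            /* bar@GOT initially points at the stub */

/* Stand-in for _dl_runtime_resolve: find function n, patch its GOT slot,
 * and return the real address so the stub can finish the original call. */
static fn_t toy_runtime_resolve(int n) {
    static const fn_t symbol_table[] = { bar_impl };
    resolve_count++;
    got[n] = symbol_table[n];
    return got[n];
}

static int bar_stub(int x) { return toy_runtime_resolve(0)(x); }

/* bar@plt: always jump through the GOT slot. */
int call_bar_via_plt(int x) { return got[0](x); }

int resolver_calls(void) { return resolve_count; }
```

The first call_bar_via_plt() goes through the resolver and patches got[0]; every later call jumps straight to bar_impl(), so resolver_calls() stays at 1.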

The next time we call bar@plt, the jump through bar@GOT goes straight to bar(), because bar@GOT already holds its address. Following is the disassembly of section .plt.
You might wonder why there is also an entry called <puts@plt-0x10>. This entry holds the last two instructions that every PLT entry needs:

  1. push moduleID
  2. jmp to _dl_runtime_resolve

These two instructions are identical for every xx@plt, so to avoid duplicating code they are placed in a common entry, <puts@plt-0x10>. Look at <puts@plt>: its last instruction, jmp 400550 <_init+0x20>, jumps to that common entry.

For more details on _dl_runtime_resolve, I recommend a Chinese source-code walkthrough that traces into _dl_fixup().

Takeaway: _dl_runtime_resolve is what resolves an unresolved function at runtime.

Who is our dynamic linker?

The dynamic linker is not chosen by system settings or environment variables; it is decided by the ELF file itself! A section called .interp stores the path of the dynamic linker as a string. Running objdump -s ELF (or objdump -s -j .interp ELF to show only that section) reveals it.
We can also find it with readelf -l ELF | grep interpreter:

[Requesting program interpreter: /lib64/]
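A program can also find its own interpreter path at runtime. The sketch below uses dl_iterate_phdr() (a glibc/Linux API whose first reported object is the main program) to locate the PT_INTERP program header, which covers the .interp string; program_interpreter() and find_interp() are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>

/* Callback for dl_iterate_phdr(): the first object reported is the main
 * program. Look for its PT_INTERP segment, which holds the .interp string. */
static int find_interp(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    const char **result = data;
    for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_INTERP) {
            /* p_vaddr is a link-time address; add the load bias. */
            *result = (const char *)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr);
            break;
        }
    }
    return 1; /* non-zero stops the iteration after the main program */
}

/* Return the dynamic linker path recorded in this program, or NULL if none. */
const char *program_interpreter(void) {
    const char *interp = NULL;
    dl_iterate_phdr(find_interp, &interp);
    return interp;
}
```

In a dynamically linked program this returns the same path readelf reports; in a statically linked one there is no PT_INTERP, so it returns NULL.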

An important section that holds information for the dynamic linker

That section is .dynamic. The command readelf -d ELF prints its contents.
In the picture, SYMTAB shows the address of the symbol table, and JMPREL shows the address of .rel.plt.
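The same entries can be read from inside a running program through the _DYNAMIC symbol, which the linker defines at the start of .dynamic. This is a Linux-only sketch; dump_dynamic() is a hypothetical name, and it only reports a few of the tags readelf -d shows. Depending on the platform, the printed pointer values may or may not already include the load bias, so treat them as illustrative.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Defined by the linker at the start of this program's .dynamic section. */
extern ElfW(Dyn) _DYNAMIC[];

/* Walk .dynamic until DT_NULL, printing SYMTAB, JMPREL and NEEDED entries.
 * Returns the number of entries and reports whether SYMTAB/JMPREL exist. */
size_t dump_dynamic(int *has_symtab, int *has_jmprel) {
    size_t n = 0;
    *has_symtab = *has_jmprel = 0;
    for (const ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++, n++) {
        switch (d->d_tag) {
        case DT_SYMTAB:
            *has_symtab = 1;
            printf("SYMTAB  %#lx\n", (unsigned long)d->d_un.d_ptr);
            break;
        case DT_JMPREL:
            *has_jmprel = 1;
            printf("JMPREL  %#lx\n", (unsigned long)d->d_un.d_ptr);
            break;
        case DT_NEEDED:
            printf("NEEDED  (string table offset %lu)\n", (unsigned long)d->d_un.d_val);
            break;
        default:
            break;
        }
    }
    return n;
}
```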

In the previous article, I described how an ELF is loaded and starts running. After the kernel loads the ELF, it transfers control to the ELF's entry point (the address in e_entry). With dynamic linking, things are a little different: control must first go to the dynamic linker (the one named in .interp). In other words, if there is no .interp, execution starts at the ELF's e_entry; if there is one, execution starts in the dynamic linker.
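The handover is visible in the auxiliary vector the kernel passes to every new process. This sketch uses glibc's getauxval() (the helper name show_entry_points() is hypothetical): AT_BASE is where the kernel mapped the dynamic linker, which runs first, and AT_ENTRY is the program's own entry point (adjusted by the load bias for PIE), which the dynamic linker jumps to when it is done.

```c
#include <elf.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Report where the dynamic linker was mapped and where the program's
 * own entry point is, as recorded by the kernel in the auxiliary vector. */
void show_entry_points(unsigned long *interp_base, unsigned long *program_entry) {
    *interp_base = getauxval(AT_BASE);     /* 0 if there was no interpreter */
    *program_entry = getauxval(AT_ENTRY);  /* the program's e_entry at runtime */
    printf("dynamic linker mapped at %#lx\n", *interp_base);
    printf("program entry (e_entry)  %#lx\n", *program_entry);
}
```

For a statically linked program AT_BASE is 0, since the kernel jumps straight to e_entry.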
Now, let's look at the dynamic linker itself (/lib/ on my system). It is a shared object, yet it is also an executable ELF! You can even try running it directly. The reason is that execve() does not care whether the file is an executable or a shared object; either way, it looks for .interp and otherwise starts at e_entry.
Here is another interesting question: if a dynamically linked ELF needs the dynamic linker to be loaded first, is the dynamic linker itself dynamically or statically linked? The answer is statically linked: no one else could resolve its dependencies for it. You can check this by running ldd on your system's dynamic linker.
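Besides ldd, another way to check this is to look for PT_INTERP directly in the file. The sketch below reads the program headers of an ELF file on disk; it assumes a 64-bit ELF in the host's byte order, and read_interp() is a hypothetical helper.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copy the PT_INTERP string of the ELF file at `path` into buf (buflen > 0).
 * Returns 1 if found, 0 if the file has no PT_INTERP (as for the dynamic
 * linker itself or a static binary), -1 on error.
 * Assumption: a 64-bit ELF in the host's byte order. */
int read_interp(const char *path, char *buf, size_t buflen) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;
    }

    int result = 0;
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long where = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, where, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type != PT_INTERP)
            continue;

        /* Truncate if the path does not fit; always NUL-terminate. */
        size_t len = ph.p_filesz < buflen ? (size_t)ph.p_filesz : buflen - 1;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 || fread(buf, 1, len, f) != len) {
            result = -1;
            break;
        }
        buf[len] = '\0';
        result = 1;
        break;
    }
    fclose(f);
    return result;
}
```

Calling read_interp("/proc/self/exe", ...) yields the dynamic linker's path; calling it again on that path returns 0, because the dynamic linker has no interpreter of its own.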