How to load ELF into memory?
In this article, the loading process described is limited to ELF on Linux; it does not cover how PE is loaded on Windows. But don’t worry: many of the concepts (as well as much of the terminology) are similar on Windows.
Why not just map to the physical memory address?
Why is a program always loaded at a virtual memory address? We already know that a program runs as a process, and each process behaves as if it has all of the memory to itself. Long ago, a process ran directly on physical memory, which means the process was mapped to physical memory addresses. However, this method has several risks: 1. the address spaces of processes are not separated from each other, so a malicious process can visit the address space of others; 2. it’s not efficient to map the whole program into memory; 3. a program can only access data at fixed addresses, but the memory addresses it is allocated are not fixed.
There were two earlier methods of mapping to physical memory addresses: fixed partitioning and dynamic partitioning. With fixed partitioning, each partition has a fixed size (the sizes may be unequal to each other); however, the fixed sizes cause internal fragmentation. With dynamic partitioning, each partition is sized to fit its process; however, this suffers from external fragmentation and needs a strategy for picking partitions, even when compaction is used.
The solution is to make processes access virtual memory addresses and map those virtual addresses to physical addresses. Non-consecutive physical addresses can still be consecutive in the virtual address space, which means a process doesn’t need to care about non-contiguity in the physical address space! Each process can only access its own virtual address space (which solves problem 1). As for how to map from virtual address space to physical address space, I would like to introduce segmentation, used in the past, and paging, which is widely used now.
Segmentation maps the virtual address space to the physical address space byte by byte. For example, if process A needs 10 MB of memory, assume A is given 0x00000000 ~ 0x00A00000 in its virtual address space. In the physical address space, 10 MB of memory is allocated and mapped from the virtual address space. For example,

The memory accessed through the virtual address space and the physical address space have the same size. Segmentation solves the first issue (A and B map to different physical regions) and the third issue (no matter which physical region is allocated, the program just follows the fixed mapping rule, e.g. from 0x00100000). However, mapping the whole process into memory is still inefficient. To solve this, we first need to know the principle of locality: a process only visits a small part of its data and code frequently, which means a more fine-grained method can achieve higher efficiency. This method is called paging.
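Before moving on to paging, the byte-to-byte segmentation mapping above can be sketched in a few lines. This is a toy model, not real MMU code; the physical base address 0x00300000 is an arbitrary value the OS might have chosen:

```python
# Minimal sketch of segmentation-style translation (illustrative only).
# Each process gets one contiguous region: a physical base plus a limit.

def make_segment(base: int, limit: int):
    """Return a translator for one segment with the given physical base/limit."""
    def translate(virtual_addr: int) -> int:
        if not (0 <= virtual_addr < limit):
            # Accessing outside the segment is a fault.
            raise MemoryError("segmentation fault: address outside segment")
        return base + virtual_addr
    return translate

# Process A's 10 MB segment, placed at a physical base chosen by the OS.
translate_a = make_segment(base=0x00300000, limit=10 * 1024 * 1024)

print(hex(translate_a(0x00000000)))   # start of A's virtual space
print(hex(translate_a(0x00100000)))   # same fixed offset, wherever the base is
```

The key point is that the program always uses the same virtual addresses; only the base chosen by the OS changes.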
Paging separates the address space into pages: all memory is broken up into small fixed-size partitions called frames. Each process is given a number of pages, and each frame can hold exactly one page. The page size is decided by the hardware. The data and code that are accessed frequently are loaded into memory, and the rest is kept on disk. With paging, the memory wasted to internal fragmentation is never more than (page_size - 1) bytes.

Here comes an example. Assume process A uses virtual address space VAS_A and process B uses virtual address space VAS_B. VAS_A and VAS_B can be separated into pages (VP_A1, VP_A2, ... VP_A8) and (VP_B1, VP_B2, ... VP_B8); there are also physical pages (PP1, PP2, ... PP6) and pages kept on disk. Some of A's pages, such as VP_A5, are kept on disk at first. When process A needs to access those pages, the hardware catches a page fault and the OS reads them from disk and loads them into physical memory. The details of page faults are described in the next section.

Pages such as VP_A3 of process A and VP_B1 of process B are each mapped to physical memory page by page, and some pages (for example, VP_B7 and a page of process A) can even share the same physical memory! Mapping by pages solves the second problem and also makes it convenient to build up protection: each page can be given different privileges that limit how processes access it.
To summarize the mapping from virtual address space to physical address space: almost every CPU has a component, the Memory Management Unit (MMU), to handle it. This work is called dynamic address translation.
Take a look at the picture above. After we swap out process B, we can still put D in non-consecutive physical memory. Apply dynamic address translation (take the frame from page_table[page number], then add the offset) to consecutive virtual addresses, and we find that the resulting physical addresses are actually non-consecutive.
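A toy version of this translation, assuming 4 KiB pages and a single-level page table (a real MMU walks multi-level tables in hardware; the frame numbers here are arbitrary):

```python
# Toy dynamic address translation with a flat page table.

PAGE_SIZE = 4096

# page_table[virtual page number] -> physical frame number (absent = not present)
page_table = {0: 5, 1: 2, 2: 7}   # consecutive virtual pages, scattered frames

def translate(virtual_addr: int) -> int:
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)   # split into page number + offset
    if vpn not in page_table:
        raise RuntimeError("page fault")            # handled by the OS in reality
    return page_table[vpn] * PAGE_SIZE + offset     # frame base + offset

# Consecutive virtual addresses land in non-consecutive physical frames:
print(hex(translate(0x0000)))   # frame 5
print(hex(translate(0x1000)))   # frame 2
print(hex(translate(0x2000)))   # frame 7
```

Notice that the offset within a page is never translated; only the page number is looked up.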
Takeaway: this section explains why paging is used rather than segmentation.
How to map by pages?
Continuing from the previous section: how is a process built up, loaded, and run?
- Build up an individual virtual address space.
- Read the ELF header and build up the mapping between the program and the virtual address space.
- Set the CPU's registers to the program's entry point and run.
Step 1 doesn't actually create any space (it is virtual, after all); it just builds a data structure for mapping from the virtual address space to the physical address space.
Step 2 maps the ELF into the virtual address space. For example, if the page size is 4096 bytes, each mapping unit takes 4096 bytes.

Mapping by page, the first page occupies 0x08048000 ~ 0x08049000 in the virtual address space, as shown above. This mapping work, again, is just a data structure, and it also helps locate the position of a faulting page within the ELF.
In step 3, the OS hands control of execution to the process.
What’s a page fault?
The three steps above load no instructions or data into memory; only the mapping data structure is built. Assume the entry point of the program is 0x08048000, the start address of .text. When the CPU tries to execute the instruction at this address, it finds that the page (0x08048000 ~ 0x08049000) is an empty page, so a page fault is raised and control passes to the OS. The exception handler in the OS searches the data structure built in step 2, which maps the ELF to the virtual address space. That data structure tells the OS which part of the ELF the empty page maps to; the OS allocates a physical page in physical memory, loads the contents, and hands control back to the process, which resumes at the address that faulted. Therefore, as the process executes, page faults happen again and again.
Briefly, it works like the following steps:
- The process executes page by page.
- It asks for a page that is not held in memory.
- No frame currently holds that page.
- A page fault is raised.
- The OS loads the desired page from disk into a frame in physical memory.
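The steps above can be simulated with a tiny sketch. The disk contents, the two-frame memory, and the evict-frame-0 policy are all made up for illustration; the real kernel does none of this in Python, of course:

```python
# Demand-paging simulation: pages start on "disk" and are loaded into a
# physical frame on first access.

disk = {0: b"text", 1: b"data", 2: b"heap"}  # page number -> contents on disk
frames = [None, None]                        # two physical frames
page_table = {}                              # page number -> frame index
fault_count = 0

def access(page: int) -> bytes:
    """Return the page's contents, faulting it in from disk if needed."""
    global fault_count
    if page not in page_table:                               # no frame holds the page
        fault_count += 1                                     # page fault!
        frame = frames.index(None) if None in frames else 0  # naive eviction: frame 0
        for victim, f in list(page_table.items()):
            if f == frame:
                del page_table[victim]                       # evict previous owner
        frames[frame] = disk[page]                           # load page from disk
        page_table[page] = frame
    return frames[page_table[page]]

access(0)   # fault: load page 0
access(0)   # hit
access(1)   # fault: load page 1
access(2)   # fault: both frames full, evict page 0
print("page faults:", fault_count)
```

Even this toy version shows the pattern: faults cluster around first touches, and a hot page that stays resident costs nothing after the first access.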
Takeaway: this section explains how the mapping works together with page faults.
Section or Segment in memory?
If we mapped each section to its own pages, the waste would be annoying, since a binary has several sections. Therefore, an ELF is mapped into pages by segments instead of sections. What's a segment? A segment is a combination of sections, grouped in memory to reduce scattered pages.
An ELF has a section header table, which we can inspect with readelf's -S option; it also has a program header table, which we can inspect with the -l option to see the section-to-segment mapping. However, we still find that a segment's size is not a multiple of the page size, which means pages would still be scattered. For example, suppose an ELF has three segments (seg1, seg2, seg3), where seg2 even needs three pages to hold it. If each segment is mapped to pages individually, we need five pages of physical memory. To reduce the number of pages needed, a page that contains parts of multiple segments is mapped twice in virtual memory. For example,
From the picture above, we can imagine that if we put all the segments together, three pages are enough. That is close to the solution to the waste. First, what is the difficulty if we use only three pages for all the segments?
It seems we can still locate the start address of each segment if we load them into continuous virtual address space. However, if we want to swap out page 1 after loading seg1, then part of seg2 would also be swapped out. Therefore, we want to avoid having the starting addresses of two segments fall in the same page. If we map the page containing mixed segments twice,

Now all the segments' starting addresses are in different pages. For seg2, we can still load continuous memory across three pages. Here we learn that a segment's starting address does not have to be a multiple of the page size.
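To make the arithmetic concrete, here is a sketch with hypothetical segment sizes (the sizes are invented; only the five-versus-three page count mirrors the example above):

```python
# Page-count arithmetic for the segment-packing example, with made-up sizes.
import math

PAGE = 4096
seg_sizes = {"seg1": 2000, "seg2": 9000, "seg3": 1000}   # seg2 spans three pages

# Mapping each segment to its own page-aligned region:
individual = sum(math.ceil(size / PAGE) for size in seg_sizes.values())
print("pages if mapped individually:", individual)   # 1 + 3 + 1 = 5

# Packing segments back to back (shared boundary pages mapped twice):
packed = math.ceil(sum(seg_sizes.values()) / PAGE)
print("pages if packed together:", packed)           # 12000 bytes -> 3 pages
```

The physical saving comes entirely from the boundary pages: each page shared by two segments is one frame that would otherwise be duplicated.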
Takeaway: this section explains how segments are loaded into memory. The problem is simpler on Windows: the starting addresses of all segments are multiples of the page size. Don't worry about the waste there; PE doesn't have as many sections as ELF does.
Stack and Heap in memory
In addition to the segments of the ELF, the stack and heap are also allocated in the virtual address space. To see the virtual address space allocated for a process,

In addition to the virtual areas allocated for segments, we can also find [heap], [stack], and [vdso]. These three virtual areas are not mapped to any ELF; they are called Anonymous Virtual Memory Areas.
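One quick way to see these anonymous areas for yourself, assuming a Linux system, is to read the maps file of your own process:

```python
# Peek at our own virtual address space on Linux via /proc/self/maps.
# The [heap], [stack], and [vdso] regions are not backed by any ELF file.

with open("/proc/self/maps") as f:
    maps = f.read()

for line in maps.splitlines():
    if any(tag in line for tag in ("[heap]", "[stack]", "[vdso]")):
        print(line)
```

Each line shows an address range, its permissions, and the backing file; the anonymous areas show a bracketed tag instead of a path.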
How does Linux load ELF?
Back to our big question: how does Linux load an ELF? In a bash process, we input a command to run the ELF; bash calls fork() to create a new process, and the new process calls execve() to start loading the ELF.

do_execve() finds the ELF and reads its first 128 bytes to check the format. For example, the first four bytes are the magic number. Then it calls search_binary_handler(). From the name, we can already guess that this function decides the loading method based on the format: the handler for ELF is load_elf_binary(), while it is load_script() for script-based executables.
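The magic-number dispatch can be sketched as below. The bytes are hand-crafted for illustration; this mimics the idea of search_binary_handler(), not the kernel's actual data structures:

```python
# Sketch of format detection from the first bytes of an executable.

ELF_MAGIC = b"\x7fELF"

def pick_handler(first_bytes: bytes) -> str:
    """Choose a loader name from the leading bytes, like search_binary_handler()."""
    if first_bytes.startswith(ELF_MAGIC):
        return "load_elf_binary"
    if first_bytes.startswith(b"#!"):        # shebang: script executable
        return "load_script"
    return "unknown format"

# A hand-crafted 128-byte "header": ELF magic, 64-bit class, little-endian.
fake_elf_header = b"\x7fELF\x02\x01\x01" + b"\x00" * 121

print(pick_handler(fake_elf_header))    # load_elf_binary
print(pick_handler(b"#!/bin/sh\n"))     # load_script
```

The real kernel keeps a registered list of binfmt handlers and asks each one in turn; the principle is the same as this lookup.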
load_elf_binary() is defined in fs/binfmt_elf.c. The major steps are:
- a simple check of the binary format
- look up the .interp section for dynamic linking
- map the ELF into memory based on the program header
- initialize the process environment
- modify the return address of the syscall to the entry point of the ELF
There are several details here that will be discussed later, in the article on dynamic linking. After load_elf_binary() is done, it returns to do_execve() and then to sys_execve(); but the next step returns to the entry point of the ELF instead of to execve()! Why? Because step 5 above already modified the return address of the syscall made by execve(); therefore, we arrive at the door of the ELF.
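The fork-then-exec flow that bash performs can be reproduced in a few lines. This is a POSIX-only sketch and assumes /bin/echo exists:

```python
# Minimal fork + exec flow, mirroring what a shell does when you run a command.
import os

pid = os.fork()
if pid == 0:
    # Child: execv replaces this process image with a new program.
    # Control "returns" to the new program's entry point, never back here.
    os.execv("/bin/echo", ["/bin/echo", "hello from the new program"])
else:
    # Parent (the "shell") waits for the child, like bash does.
    _, status = os.waitpid(pid, 0)
    print("child exit status:", os.WEXITSTATUS(status))
```

Note that the line after os.execv() is unreachable on success: the ELF loading path described above has already redirected execution to the new program's entry point.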
What’s the next plan?
OK, it really took some time to write this note, but it was interesting. It brought me back to when I first learned about operating systems. What's different is that this time I know why I'm learning it, and I know more about ELF. The ELF discussed in this article is the statically linked case; I'd like to explore the dynamically linked case in the next article.