Saturday, June 7, 2014

Beginning System Programming. (Compilers And Segments)

Jai Shri Bhagavan. (Glory to the Lord.)
To begin with system programming, you must understand the compiler and the architecture you are using. In application programming we can often forget about whether caches are being fully utilized or whether the data we have is the correct one. So in this post I'll talk about how to look at things from the compiler's point of view, since it is essential to understand the tool that does most of the hard work of making code run on bare metal.


A Simple Hello World Disassembly

Let's take an example. In this case we are not even going to print hello world; we only want to see where the variables are located in our program. The basic idea is to understand the memory segments a program gets when it starts to run. Each of these segments is generally described by a section in the executable, except the stack segment, which is not part of the executable. To give an example, let's write a simple function as shown below:

 
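int myfunc()
{
   int a = 5;        /* matches the movl $0x5 in the disassembly below */
   a = a ^ (~a);     /* x ^ ~x sets every bit, so a becomes 0xffffffff */
   return !a;        /* a is non-zero, so this returns 0 */
}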

The disassembly of the above function, as reported by objdump -D, is shown below:

Disassembly of section .text:

00000000 <myfunc>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 10                sub    $0x10,%esp
   6:   c7 45 fc 05 00 00 00    movl   $0x5,-0x4(%ebp)
   d:   c7 45 fc ff ff ff ff    movl   $0xffffffff,-0x4(%ebp)
  14:   83 7d fc 00             cmpl   $0x0,-0x4(%ebp)
  18:   0f 94 c0                sete   %al
  1b:   0f b6 c0                movzbl %al,%eax
  1e:   c9                      leave  
  1f:   c3                      ret    

First, note that the function is located in the .text section of the executable. This is where all the code lives, and it is usually marked read-only since code isn't supposed to change while executing. The second important point is the variable we declared inside the function. Look closely: the variable's name is never mentioned in the function, because the variable is created on the stack and is addressed as -0x4(%ebp). Notice that esp is moved by 16 bytes (sub $0x10,%esp) even though the int needs only 4 bytes on a 32-bit machine; the compiler typically reserves more than it strictly needs, for alignment. The compiler assumes that the stack pointer is always valid and subtracts from the current value of esp to make room for the variable.
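The second movl in the listing stores 0xffffffff directly, with no xor or not instruction in sight. A small sketch shows why the compiler can do that: x ^ ~x has every bit set for any x. The helper names here are hypothetical and only isolate the two expressions used in myfunc.

```c
/* Hypothetical helpers that isolate the two expressions from myfunc. */

/* x ^ ~x: each bit differs from its complement, so every bit ends up set.
 * On a two's complement machine that is the int value -1 (0xffffffff). */
int xor_with_complement(int x)
{
    return x ^ (~x);
}

/* Logical NOT of a non-zero value is 0, which is what myfunc returns. */
int not_of_all_ones(int x)
{
    return !xor_with_complement(x);
}
```

Since the result does not depend on the starting value, the compiler is free to replace the whole expression with a constant, which is what the disassembly suggests happened here.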

All such auto storage class variables are created by moving the stack pointer down (or up, depending on how the stack grows; on x86 and x86_64 it grows down). This is one reason big structures are usually passed as pointers rather than as the structures themselves: it avoids wasting a lot of stack space and copying memory. Now let's see what happens to data that is declared global:
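The point about passing big structures can be made concrete with a small sketch. The structure and function names are hypothetical and exist only for illustration; the two functions compute the same sum, but the first receives a full copy of the structure while the second receives just an address.

```c
#include <assert.h>
#include <stddef.h>

#define RECORD_LEN 1024

/* Hypothetical large structure, used only for illustration. */
struct big_record {
    int values[RECORD_LEN];
};

/* By value: the caller has to copy the whole structure for the call,
 * so every call moves sizeof(struct big_record) bytes. */
long sum_by_value(struct big_record r)
{
    long total = 0;
    for (size_t i = 0; i < RECORD_LEN; i++)
        total += r.values[i];
    return total;
}

/* By pointer: only an address is passed (4 bytes on 32-bit x86). */
long sum_by_pointer(const struct big_record *r)
{
    assert(r != NULL);
    long total = 0;
    for (size_t i = 0; i < RECORD_LEN; i++)
        total += r->values[i];
    return total;
}
```

Compiling this with gcc -c and running objdump -D on the result is a good way to compare how much work each caller has to do before the call instruction.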

 
Disassembly of section .data:

00000000 <my_global_var>:
   0:   01 00                   add    %eax,(%eax)
        ...

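I created a variable named my_global_var and it goes into the .data section. Because objdump -D disassembles every section, the variable's bytes are decoded as if they were code; the add shown above is meaningless, and the bytes 01 00 followed by the elided zeros are simply the variable's initial value. This section occupies space on disk, since the initial value has to be stored in the file. When the executable is loaded, the loader allocates memory for it while parsing through the sections. Therefore this memory is not allocated and destroyed at run time the way the auto storage class variables above are.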
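Here is a minimal sketch of file-scope variables to experiment with. Only my_global_var comes from the example above; the other names are hypothetical. The section named in each comment is where gcc typically places such an object, but the exact placement depends on the compiler, target and flags, so check it with objdump or readelf rather than taking the comments as fact.

```c
/* Initialized global: typically placed in .data, with its initial
 * value stored in the object file. */
int my_global_var = 1;

/* Zero-initialized global (hypothetical name): typically ends up in
 * .bss, which reserves memory but stores no bytes in the file. */
int my_zeroed_var;

/* Constant global (hypothetical name): typically placed in .rodata. */
const int my_const_var = 42;

/* A global keeps its value between calls because its memory exists
 * for the whole life of the program. */
int bump_global(void)
{
    my_global_var++;
    return my_global_var;
}

/* An auto variable is created afresh on the stack for every call,
 * so this always returns 1. */
int bump_local(void)
{
    int local = 0;
    local++;
    return local;
}
```

Calling bump_global twice gives a different result each time, while bump_local gives the same result every time, which is the lifetime difference described above.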

There may be several other sections in the compiled binary, which you can list using readelf; however, not all sections need to be loaded. Some sections are there for information purposes only. There are sections like .rodata, where read-only data such as constant strings or variables declared as constants is stored. Try the readelf command to see the sections, as shown below:

pranay@linux-y7pi:~/pks_modules/test> readelf -S test.o 
There are 13 section headers, starting at offset 0x184:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 00003f 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 000488 000010 08     11   1  4
  [ 3] .data             PROGBITS        00000000 000074 000004 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 000078 000000 00  WA  0   0  4
  [ 5] .rodata           PROGBITS        00000000 000078 00000e 00   A  0   0  1
  [ 6] .comment          PROGBITS        00000000 000086 000043 01  MS  0   0  1
  [ 7] .note.GNU-stack   PROGBITS        00000000 0000c9 000000 00      0   0  1
  [ 8] .eh_frame         PROGBITS        00000000 0000cc 000058 00   A  0   0  4
  [ 9] .rel.eh_frame     REL             00000000 000498 000010 08     11   8  4
  [10] .shstrtab         STRTAB          00000000 000124 00005f 00      0   0  1
  [11] .symtab           SYMTAB          00000000 00038c 0000d0 10     12   9  4
  [12] .strtab           STRTAB          00000000 00045c 000029 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Sections with the A flag require memory to be allocated for them when the file is loaded. Note that there is no stack section, since the stack is allocated by the OS when the executable is loaded.
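The letters in the Flg column are just bits in each section header's flags field. The sketch below decodes the five flags that appear in the table above. The bit values are the ones defined by the ELF specification (SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR, SHF_MERGE, SHF_STRINGS); the SEC_ names and the two functions are hypothetical and not part of readelf.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR, SHF_MERGE and
 * SHF_STRINGS in the ELF specification; the SEC_ names are made up. */
#define SEC_WRITE    0x1u   /* W */
#define SEC_ALLOC    0x2u   /* A */
#define SEC_EXEC     0x4u   /* X */
#define SEC_MERGE    0x10u  /* M */
#define SEC_STRINGS  0x20u  /* S */

#define SEC_FLAG_BUF 6      /* five letters plus the terminating NUL */

/* Build the flag letters for one section, in the order W A X M S. */
void section_flags_string(unsigned long flags, char out[SEC_FLAG_BUF])
{
    size_t n = 0;

    assert(out != NULL);
    if (flags & SEC_WRITE)   out[n++] = 'W';
    if (flags & SEC_ALLOC)   out[n++] = 'A';
    if (flags & SEC_EXEC)    out[n++] = 'X';
    if (flags & SEC_MERGE)   out[n++] = 'M';
    if (flags & SEC_STRINGS) out[n++] = 'S';
    out[n] = '\0';
}

/* A section occupies memory at run time only if the A bit is set. */
int section_needs_memory(unsigned long flags)
{
    return (flags & SEC_ALLOC) != 0;
}
```

With these definitions, .text (A and X) and .data (W and A) need memory, while .comment (M and S) does not.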

One interesting section is .rel.text, which is very useful while loading the executable. As you can see, none of the sections has a meaningful Addr value. Addr is the start address of the section; however, since the loader (or the linker, when it builds the final executable) decides where each section is placed, the compiler has not filled these in yet and everything is laid out relative to address 0.

The problem with this is that a fixup is needed at load time: function calls, data access instructions and so on have to be patched with real addresses. There may also be a .rel.data section for data, but in our case there is only .rel.text. As you can see, this section is not allocated (it has no A flag); it is used by the loader to fix up function call addresses, or any instruction that refers to memory, after the sections have been allocated memory.
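To make the fixup idea concrete, here is a minimal sketch of how one relocation could be applied. It assumes 32-bit x86 REL-style entries, where the addend A is stored in the bytes being patched, and handles only the two common types from the i386 ABI: R_386_32 (S + A) and R_386_PC32 (S + A - P), with S the symbol's final address and P the address of the patched location. The function and macro names are hypothetical; a real loader would also look the symbol up through .symtab and handle more types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type numbers from the i386 ABI; the RELOC_ names are made up. */
#define RELOC_386_32    1   /* R_386_32:   S + A     */
#define RELOC_386_PC32  2   /* R_386_PC32: S + A - P */

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian value regardless of host byte order. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/*
 * Apply one relocation to a section that has been given an address.
 *   section       bytes of the section being patched (e.g. .text)
 *   section_addr  address chosen for that section
 *   offset        offset of the field to patch within the section
 *   type          relocation type
 *   sym_addr      final address of the referenced symbol (S)
 * Returns 0 on success, -1 for a type this sketch does not handle.
 */
int apply_rel386(uint8_t *section, uint32_t section_addr,
                 uint32_t offset, unsigned type, uint32_t sym_addr)
{
    assert(section != NULL);

    uint8_t *where = section + offset;
    uint32_t addend = read_le32(where);       /* A, stored in place */
    uint32_t place = section_addr + offset;   /* P */

    switch (type) {
    case RELOC_386_32:
        write_le32(where, sym_addr + addend);
        return 0;
    case RELOC_386_PC32:
        write_le32(where, sym_addr + addend - place);
        return 0;
    default:
        return -1;
    }
}
```

For example, a call instruction e8 fc ff ff ff placed at address 0x1000 and relocated against a function at 0x2000 gets the displacement 0xffb, which is exactly the distance from the end of the call (0x1005) to the target.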
 

Exercise 0.1

A simple exercise is to declare a static variable inside a function and see what happens to it. Then create a static global variable with the same name and see what happens to that one. Which section does each go to? Is the name of the variable the same as what you wrote in the code?

Exercise 0.2

Try to force a variable into a read-only section without using the const keyword. Hint: see how to use __attribute__((section(......))) with gcc. See if you can create a new section with your own name and put the variable there instead.
 
 
