C Program Compilation & Execution

Building a C program involves four stages:

Preprocessing is the first pass of any C compilation. It processes include files, conditional compilation directives, and macros.
Compilation is the second pass. It takes the output of the preprocessor (the preprocessed source code) and generates assembly source code.
Assembly is the third stage. It takes the assembly source code and translates it into machine code, optionally producing an assembly listing with offsets. The assembler output is stored in an object file.
Linking is the final stage. It takes one or more object files or libraries as input and combines them to produce a single (usually executable) file. In doing so, it resolves references to external symbols, assigns final addresses to procedures/functions and variables, and revises code and data to reflect the new addresses (a process called relocation).
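
As a rough illustration of the four stages, the sketch below is a small translation unit whose macro and conditional block are resolved by the preprocessor. The file name stages.c, the USE_DOUBLE macro, and the second object file main.o are assumptions made for this example. The GCC options in the comment (-E, -S, -c) stop the driver after each stage.

```c
/* stages.c (hypothetical file name): a tiny unit for watching each stage.
 *
 * Assuming GCC, each stage can be run on its own:
 *   gcc -E stages.c -o stages.i    preprocess only
 *   gcc -S stages.i -o stages.s    compile to assembly source
 *   gcc -c stages.s -o stages.o    assemble into an object file
 *   gcc stages.o main.o -o prog    link (main.o is a hypothetical object
 *                                  that provides main())
 */

/* Expanded textually by the preprocessor before compilation starts. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one typedef reaches the compiler. */
#ifdef USE_DOUBLE
typedef double number_t;
#else
typedef int number_t;
#endif

number_t square(number_t n)
{
    return SQUARE(n);
}
```

Comparing stages.c with stages.i shows that the macro and the #ifdef block are already gone before the compiler proper sees the code.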

OBJECT FILES and EXECUTABLE

After the source code has been compiled and assembled, the result is one or more object files (e.g. .o, .obj). These are then linked to produce an executable file.
Object files and executables come in several formats, such as ELF (Executable and Linking Format) and COFF (Common Object File Format). For example, ELF is used on Linux systems, while Windows uses PE (Portable Executable), a format derived from COFF.
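
A file's format can usually be recognized from its first few bytes. The sketch below checks for the ELF magic number (the byte 0x7F followed by the letters E, L, F). The function names has_elf_magic and file_is_elf are invented for this example and are not part of any library.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the buffer starts with the ELF magic bytes, otherwise 0. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}

/* Returns 1 if the file looks like ELF, 0 if not, -1 if it cannot be opened. */
int file_is_elf(const char *path)
{
    unsigned char head[4];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    size_t got = fread(head, 1, sizeof head, fp);
    fclose(fp);
    return has_elf_magic(head, got);
}
```

On a Linux system, calling file_is_elf() on an object file such as main.o would be expected to return 1, while a Windows executable (which begins with the letters MZ) would return 0.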

When we examine the content of these object files, we find areas called sections. Sections can hold executable code, data, dynamic linking information, debugging data, symbol tables, relocation information, comments, string tables, and notes.
Some sections are loaded into the process image, some provide information needed to build a process image, and still others are used only when linking object files.
- The Executable and Linking Format, used by GNU/Linux and other operating systems, defines a number of ‘sections’ in an executable program.
- These sections tell the linker and loader how to handle the binary, and they also allow it to be inspected. Important sections that support function calls and startup include the Global Offset Table (GOT), which stores the addresses of global symbols (such as library functions and variables) once they are resolved; the Procedure Linkage Table (PLT), which holds small stubs that jump indirectly through the GOT; .init/.fini, for internal initialization and shutdown; and .ctors/.dtors, for constructors and destructors.
- The data sections are .rodata for read-only data, .data for initialized data, and .bss for uninitialized data.
- A partial list of ELF sections, ordered from low to high addresses:
1. .init - Startup
2. .text - Code (program instructions)
3. .fini - Shutdown
4. .rodata - Read Only
5. .data - Initialized Data
6. .tdata - Initialized Thread Data
7. .tbss - Uninitialized Thread Data
8. .ctors - Constructors
9. .dtors - Destructors
10. .got - Global Offset Table
11. .bss - Uninitialized Data
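
To connect the section names to source code, the sketch below annotates where a typical ELF toolchain (GCC or Clang is assumed here) would usually place each object. The variable and function names are invented for illustration, and the exact placement can vary with compiler, options, and target.

```c
int initialized_counter = 42;      /* initialized global: typically .data   */
int zeroed_buffer[256];            /* uninitialized global: typically .bss  */
const char banner[] = "hello";     /* read-only data: typically .rodata     */
static int call_count;             /* zero-initialized static: typically .bss */

/* Function bodies are machine code and typically live in .text. */
int bump(void)
{
    call_count++;
    return call_count + initialized_counter;
}
```

After compiling this to an object file, tools such as objdump -h, size, or nm can be used to check where each symbol actually ended up.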

RELOCATION RECORDS

Because the various object files include references to each other's code and data, the locations that hold these references must be fixed up when the files are combined at link time. Relocation records mark those locations.

After combining all of the object files, the linker uses the relocation records to find every address that needs to be filled in.

SYMBOL TABLE

Since assembling to machine code removes all traces of labels from the code, the object file format has to record them separately.

This is the job of the symbol table, which contains a list of names and their corresponding offsets in the text and data segments.
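
The interplay between relocation records and the symbol table can be sketched with a toy model. This is not the real ELF layout (actual entries carry types, sizes, section indexes, and addends); the struct and function names below are invented for the example, and every slot is assumed to be a plain 4-byte address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy symbol table entry: a name and the address assigned to it. */
struct toy_symbol { const char *name; uint32_t address; };

/* Toy relocation record: "the 4 bytes at this offset need that symbol". */
struct toy_reloc { size_t offset; const char *symbol; };

/* Looks a name up in the symbol table. Returns 1 if found, 0 otherwise. */
int toy_lookup(const struct toy_symbol *syms, size_t nsyms,
               const char *name, uint32_t *address)
{
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) == 0) {
            *address = syms[i].address;
            return 1;
        }
    }
    return 0;
}

/* Patches every slot named by a relocation record.
 * Returns the number of records that could not be applied. */
size_t toy_relocate(unsigned char *image, size_t image_len,
                    const struct toy_reloc *relocs, size_t nrelocs,
                    const struct toy_symbol *syms, size_t nsyms)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < nrelocs; i++) {
        uint32_t address;
        int fits = image_len >= sizeof address &&
                   relocs[i].offset <= image_len - sizeof address;
        if (!fits || !toy_lookup(syms, nsyms, relocs[i].symbol, &address)) {
            unresolved++;   /* a real linker would report an undefined symbol */
            continue;
        }
        memcpy(image + relocs[i].offset, &address, sizeof address);
    }
    return unresolved;
}
```

A non-zero return value here plays the role of the familiar "undefined reference" error that a real linker prints.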

LINKING

The linker is what makes separate compilation possible. As shown in Figure w.3, an executable can be built from a number of source files, each of which is compiled and assembled independently into its own object file.
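
A minimal sketch of separate compilation follows. The three file names in the comments are hypothetical. The pieces are shown in one listing so that it compiles as is, but each marked part could live in its own file and be compiled with gcc -c before the object files are linked together.

```c
/* ---- math_utils.h (hypothetical): the interface both files agree on ---- */
int add(int a, int b);

/* ---- app.c (hypothetical): calls add() without seeing its body.
 * In app.o, add is an undefined symbol that the linker must resolve. ---- */
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total = add(total, i);
    return total;
}

/* ---- math_utils.c (hypothetical): the definition the linker finds ---- */
int add(int a, int b)
{
    return a + b;
}
```

If only math_utils.c changes, only math_utils.o needs to be rebuilt before relinking, which is the practical benefit of this arrangement.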

SHARED OBJECTS

Each program relies on a number of functions, some of which will be standard C library functions, like printf(), malloc(), strcpy(), etc.

Since the C library is so widely used, it is better for every program to reference a single shared instance of the library than for each program to contain its own copy.
This is implemented during linking: some objects are linked at link time, whereas others are linked at run time (deferred or dynamic linking).

STATICALLY LINKED

The term ‘statically linked’ means that the program and the particular library that it’s linked against are combined together by the linker at link time.
This means that the binding between the program and the particular library is fixed and known at link time, before the program runs. It also means that we can't change this binding unless we re-link the program with a new version of the library.
Programs that are linked statically are linked against archives of objects (libraries) that typically have the extension .a.
The drawback of this technique is that the executable is quite large, because all the needed library code has to be copied into it.

DYNAMICALLY LINKED

The term ‘dynamically linked’ means that the program and the particular library it references are not combined together by the linker at link time.
Instead, the linker places information into the executable that tells the loader which shared object module the code is in and which runtime linker should be used to find and bind the references.
This means that the binding between the program and the shared object is done at run time: before the program starts, the appropriate shared objects are found and bound.

PROCESS LOADING

In Linux, executables loaded from a file system (using the execve() system call, or a spawn-style function such as posix_spawn()) are in ELF format.
If the file system is on a block-oriented device, the code and data are loaded into main memory.
If the file system is memory mapped (e.g. a ROM/Flash image), the code needn't be loaded into RAM, but may be executed in place.
This approach makes all RAM available for data and stack, leaving the code in ROM or Flash. In all cases, if the same program is loaded more than once, its code will be shared.
Before we can run an executable, we first have to load it into memory.
This is done by the loader, which is generally part of the operating system. The loader does the following things (among others):

Memory and access validation - First, the OS kernel reads the program file's header information and validates its type, its access permissions, its memory requirements, and whether its instructions can run on this machine. It confirms that the file is an executable image and calculates its memory requirements.

Process setup includes:

1. Allocates primary memory for the program's execution.
2. Copies the address space from secondary to primary memory.
3. Copies the .text and .data sections from the executable into primary memory.
4. Copies program arguments (e.g., command line arguments) onto the stack.
5. Initializes registers: sets the stack pointer (esp on 32-bit x86) to point to the top of the stack and clears the rest.
6. Jumps to the start routine, which copies main()'s arguments off the stack and jumps to main().

The address space is the memory that contains the program's code, stack, and data segments; in other words, everything the program uses as it runs.
The memory layout, in simplified form, consists of three segments (text, data, and stack), as shown in Figure w.5.
The dynamic part of the data segment is also referred to as the heap, the place that dynamically allocated memory (such as from malloc() in C and new in C++) comes from. Dynamically allocated memory is memory allocated at run time instead of at compile or link time.
The heap and the stack grow toward each other from opposite ends of the free region, so that region can be divided in any proportion between the heap (allocated explicitly) and the stack (allocated implicitly). This is why, in this layout, the stack grows downward and the heap grows upward.
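
One way to get a feel for this layout is to print one sample address from each region. The sketch below does that. The function and variable names are invented for the example, the printed values differ from run to run on systems that randomize addresses, and the relative order of the regions depends on the platform.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 1;   /* data segment */
int uninitialized_global;     /* BSS */

/* Prints one address per region. Returns 0 on success, -1 if malloc fails. */
int print_layout(void)
{
    int local = 0;                               /* lives on the stack */
    int *heap_cell = malloc(sizeof *heap_cell);  /* lives on the heap */
    if (heap_cell == NULL)
        return -1;

    /* A function pointer is converted through uintptr_t for printing. */
    printf("text  (function) : %p\n", (void *)(uintptr_t)print_layout);
    printf("data  (init)     : %p\n", (void *)&initialized_global);
    printf("bss   (uninit)   : %p\n", (void *)&uninitialized_global);
    printf("heap  (malloc)   : %p\n", (void *)heap_cell);
    printf("stack (local)    : %p\n", (void *)&local);

    free(heap_cell);
    return 0;
}
```

On many systems the stack address is numerically far above the heap address, which matches the picture of the two regions growing toward each other.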

RUNTIME DATA STRUCTURE

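A process is a running program. Typically a process has five different areas of memory allocated to it: the code (text) segment, the initialized data segment, the uninitialized data (BSS) segment, the heap, and the stack.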

THE PROCESS (IMAGE)

In the memory layout of a typical C process, the load segments (the "text" and "data" segments) are placed at the process's base address.
The main stack is located just below and grows downwards. Any additional threads that are created will have their own stacks, located below the main stack.
Each of the stacks is separated from the next by a guard page to detect stack overflows. The heap is located above the process and grows upwards.
In the middle of the process's address space, a region is reserved for shared objects. When a new process is created, the process manager first maps the two segments from the executable into memory.
It then decodes the program's ELF header. If the program header indicates that the executable was linked against a shared library, the process manager will extract the name of the dynamic interpreter from the program header.
The dynamic interpreter points to a shared library that contains the runtime linker code. The process manager will load this shared library in memory and will then pass control to the runtime linker code in this library.

RUNTIME LINKER AND SHARED LIBRARY LOADING

The runtime linker is invoked when a program that was linked against a shared object is started or when a program requests that a shared object be dynamically loaded.

On some systems the runtime linker is contained within the C runtime library; on GNU/Linux it is a separate shared object (ld.so) that ships with the C library. The runtime linker performs several tasks when loading a shared library (.so file).
The library's dynamic section tells the runtime linker which other libraries this library was linked against. It also gives information about the relocations that need to be applied and the external symbols that need to be resolved. The runtime linker will first load any other required shared libraries (which may themselves reference other shared libraries).
It will then process the relocations for each library. Some of these relocations are local to the library, while others require the runtime linker to resolve a global symbol.
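
A program can also ask the runtime linker to load a shared object explicitly, using the POSIX dlopen() and dlsym() functions. The sketch below loads the math library and calls cos() through the returned pointer. The library name "libm.so.6" is the usual glibc name on Linux and is an assumption here, as is the helper name cos_via_dlopen. On older glibc versions the program may also need to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Loads the math library at run time and calls cos() through it.
 * Returns 0 and stores the value in *result on success, -1 on failure.
 * "libm.so.6" is an assumed library name (glibc on Linux). */
int cos_via_dlopen(double x, double *result)
{
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL)
        return -1;               /* dlerror() would describe the problem */

    /* Idiom from the POSIX documentation for storing dlsym()'s result
     * in a function pointer. */
    double (*cosine)(double) = NULL;
    *(void **)(&cosine) = dlsym(handle, "cos");
    if (cosine == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = cosine(x);         /* the symbol was bound at run time */
    dlclose(handle);
    return 0;
}
```

The same steps described above (load the library, load its dependencies, apply relocations, resolve symbols) happen inside dlopen(), only later than at program startup.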

DYNAMIC ADDRESS TRANSLATION

Each process can use addresses starting at 0, even if other processes are running, or even if the same program is running more than once. Address spaces are protected from one another.
A process can even be fooled into thinking it has much more memory than is physically available (virtual memory).
Address translation is normally done by the Memory Management Unit (MMU), which is built into the processor itself.
Virtual addresses are relative to the process. Each process believes that its virtual addresses start from 0. The process does not even know where it is located in physical memory; the code executes entirely in terms of virtual addresses.
The MMU can refuse to translate virtual addresses that are outside the range of memory for the process, for example by raising a fault that the operating system reports as a segmentation fault. This provides protection for each process.
During translation, parts of a process's address space can even be moved between disk and memory as needed (normally called swapping or paging).
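
The idea can be sketched with a toy, single-level page table. Real MMUs use multi-level tables, permission bits, and hardware caches; the page size, table size, and all names below are assumptions chosen to keep the example small.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NUM_PAGES 8u

/* One entry per virtual page: is it mapped, and to which physical frame? */
struct toy_page_entry { int present; uint32_t frame; };

/* Translates a virtual address using one process's page table.
 * Returns 0 and stores the physical address in *paddr, or returns -1
 * to model a fault (address out of range or page not present). */
int toy_translate(const struct toy_page_entry table[TOY_NUM_PAGES],
                  uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page = vaddr / TOY_PAGE_SIZE;
    uint32_t offset = vaddr % TOY_PAGE_SIZE;

    if (page >= TOY_NUM_PAGES || !table[page].present)
        return -1;

    *paddr = table[page].frame * TOY_PAGE_SIZE + offset;
    return 0;
}
```

Giving two processes different tables lets both use virtual address 0 while landing in different physical frames, and a page marked not present is where a paging system would step in to bring the data back from disk.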

STACK AND HEAP

The stack is where memory is allocated for automatic variables within functions. A stack is a Last In First Out (LIFO) storage where new storage is allocated and de-allocated at only one end, called the top of the stack. Every function call creates a stack frame, and when the function exits, its stack frame is destroyed.

When a program begins execution in the function main(), a stack frame is created and space is allocated on the stack for all variables declared within main().
Then, when main() calls a function a(), a new stack frame is created for the variables in a() on top of main()'s frame. Any parameters passed by main() to a() are stored on the stack as well.
If a() were to call additional functions such as b() and c(), new stack frames would be allocated at the new top of the stack. Frames are created in the order in which the calls are made.
When c(), b() and a() return, storage for their local variables is de-allocated, the stack frames are destroyed, and the top of the stack returns to its previous position. Frames are destroyed in the reverse order of their creation.
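
The LIFO order of frame creation and destruction can be made visible by logging each entry and exit. In this sketch, a(), b() and c() mirror the functions named above, while note(), trace, and run_trace() are helpers invented for the example.

```c
#include <string.h>

static char trace[64];

/* Appends an event to the trace without overflowing the buffer. */
static void note(const char *event)
{
    strncat(trace, event, sizeof trace - strlen(trace) - 1);
}

/* ">x" marks entry into x (frame created), "<x" marks return (frame destroyed). */
static void c(void) { note(">c"); note("<c"); }
static void b(void) { note(">b"); c(); note("<b"); }
static void a(void) { note(">a"); b(); note("<a"); }

/* Runs the call chain and returns the recorded trace: ">a>b>c<c<b<a". */
const char *run_trace(void)
{
    trace[0] = '\0';
    a();
    return trace;
}
```

The trace reads the same way the stack behaves: the last frame pushed (c) is the first one popped.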
int *intMaker(void) {
    int a = 3;             // (1)
    return &a;             // (2)
}

int main(void) {
    int *b = intMaker();   // (3)
    return 0;
}

intMaker() returns a pointer to an object on the stack (1), which disappears together with the rest of the function call's state when intMaker() returns (2). Dereferencing b after (3) is therefore undefined behavior.
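
Two common ways to avoid returning the address of a dead stack object are sketched below: allocate the object on the heap, or let the caller supply the storage. The function names int_maker_heap and int_maker_out are invented for this example.

```c
#include <stdlib.h>

/* Heap version: the object outlives the call. The caller must free() it.
 * Returns NULL if the allocation fails. */
int *int_maker_heap(void)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = 3;
    return p;
}

/* Caller-provided storage: the caller decides where the int lives,
 * for example in its own stack frame, which is still alive on return. */
void int_maker_out(int *out)
{
    *out = 3;
}
```

For a plain int, simply returning the value (int intMaker(void) { return 3; }) is the simplest fix of all; the two variants above matter more for larger objects.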

The main Function and Program Execution

The main() function serves as the starting point for program execution. A program usually stops executing at the end of main().

Parameters can be declared for main() so that it can receive arguments from the command line. The implementation declares no prototype for main(), so main() can be defined with zero, two, or (as a common extension) three parameters.

When you want to pass information to the main() function, the parameters are traditionally named argc and argv, although the C compiler does not require these names. The types of argc and argv are defined by the C language: int and char *[] (an array of pointers to char).
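
As a small sketch of how these two parameters are used, the helper below walks an argument vector and counts the entries that look like options. The name count_option_args is invented for this example, and treating anything that starts with '-' as an option is a simplifying assumption.

```c
#include <stddef.h>

/* Counts the arguments that start with '-'.
 * argv[0] is conventionally the program name, so scanning starts at 1. */
int count_option_args(int argc, char *argv[])
{
    int count = 0;
    for (int i = 1; i < argc; i++) {
        if (argv[i] != NULL && argv[i][0] == '-')
            count++;
    }
    return count;
}
```

A main(int argc, char *argv[]) could pass its two parameters straight through to this helper; the helper takes them as arguments so it can also be exercised with a hand-built array.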

To print an object file's contents (disassembly, intermixed with source code when debug information is available):

objdump -S main.o

http://www.tenouk.com/ModuleW.html

http://www.tenouk.com/ModuleY.html

http://www.tenouk.com/ModuleZ.html

http://www.tenouk.com/Bufferoverflowc/Bufferoverflow1c.html

http://www.tenouk.com/Bufferoverflowc/Bufferoverflow2.html
http://www.tenouk.com/Bufferoverflowc/Bufferoverflow2a.html
http://www.tenouk.com/Bufferoverflowc/Bufferoverflow4.html
