Review
We spoke about preprocessing being a text substitution process. The next step is compiling the code, which is actually a two-step process. The compiler first turns the preprocessed text into assembly instructions that are appropriate for the target CPU. x86 has a different instruction set than ARM, and the target may be a 32-bit or 64-bit system. The compiler has to take all of this into consideration. You can see the output of this stage by passing the -S flag to gcc.
After the text file is turned into assembly, the assembly needs to be turned into object code. Object code is the actual machine code, the bit pattern that the CPU can process. If you pass the -c flag to gcc you will get a .o file. You can view the contents of this file using objdump -d "name of file".
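To try each stage yourself, a small file is enough. The sketch below assumes a hypothetical file name, stages.c, and a hypothetical function, add_offset. The comments list the gcc flags that stop after each stage (-E for preprocessing, -S for assembly, -c for object code).

```c
/* stages.c (hypothetical file name used only for this illustration)
 *
 * Stop after each stage and look at what gcc produced:
 *   gcc -E stages.c      preprocessed text, printed to the terminal
 *   gcc -S stages.c      assembly for the target CPU, written to stages.s
 *   gcc -c stages.c      object code, written to stages.o
 *   objdump -d stages.o  disassembly of the object code
 */
#define OFFSET 10 /* the preprocessor replaces OFFSET with the text 10 */

/* Hypothetical function: adds a fixed offset to its argument. */
int add_offset(int value)
{
    return value + OFFSET;
}
```

The file has no main on purpose: all of the commands above stop before the link step, so none of them needs one.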
The final stage is linking the files. The output of the compilation phase is a set of object files, that is, files with a .o extension. In large programs it is common to have many src files that reference functions in other src files. Each src file is considered a "translation unit". Each translation unit gets compiled into an object file.
Every function is located at a certain memory address. Combining all the individual object files into a single shippable executable program, and resolving the addresses for function calls whose targets are in different .o files, is called linking.
The compiler we are going to use is called gcc. gcc takes care of all of the compilation steps with a single one-line command. I have linked a summary of how to use gcc on the main class page. You can always type "man gcc" or "info gcc" to get the full manual on how to use it.
The general form of the gcc command is
gcc -g -Wall -Werror name_of_source_file.c file_2.c file_3.c -o name_of_executable -lLibraries_you_need_to_link
Note that each src file passed to gcc is considered its own translation unit, meaning it gets compiled into an object file separately from the other translation units. You can make a file, e.g. flags.txt, with all your flags and source files and use the @ symbol to pass it to the command:
gcc @flags.txt -o output-name
gcc flags article
I am providing a pdf of our class textbook and a pdf of the gdb info pages. However, you will be working in a shell environment, and I advise you to install the info pages for gdb and gcc in the shell. I also advise you to read about how to navigate the info pages from within the info pages themselves, by typing info info at the command prompt.
The gdb info manual
A debugger is a program, a tool that allows us to inspect the memory of our program while it is running. We can look at registers, variables, instructions, and data in the memory of our running programs. We can read the contents of any memory location within a program's ADDRESS SPACE. We can step through and read each instruction as it happens, and see which memory locations and registers are affected and how their bit patterns change. We can view the disassembled code to understand what our program is doing.
Modern computers use operating system software to start and stop user programs. When you ask the operating system to run your program, it has to know the address of the first instruction of your program. In a running program, a CPU just reads instructions one after another, starting at a certain address. How does the OS know where to start your code? From the programmer's point of view, execution starts at the address associated with the symbol declared as main.
In this example I try to compile a file called entry_point.c that has no main function. gcc gives me a linker error, because a main function is required. Linker errors usually come about when the linker cannot find code that is required by your program.
In the following images we run gdb, the debugger, on our example program a.out with the command below. This program does nothing but place the bit pattern that represents 5 at the address that the symbol x represents. In this demo we are just trying to show that main has an address.
$ gdb ./a.out
So the linker sets up an executable file in a format that a specific operating system can read, so that the operating system knows how to load the program's data and code into memory and how to find the address of the first instruction. We as programmers communicate to the linker where our code should start by creating a function called main. The linker actually sets the entry point to a symbol called _start, where the CPU starts running, and that initial startup code makes the function call to main.
In programming we issue instructions to a cpu. We usually put a sequence of instructions together one after the other to change bit patterns in memory in a way that is useful to us.
Sometimes we would like to execute some sequence of instructions many times from different places in a program. It would be helpful to put that sequence of instructions together in a reusable container that can be initiated from anywhere in the program.
A function, in programming, is a container that holds a sequence of instructions that can be initiated from anywhere in the program. A function is a reusable container.
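As a small sketch of that idea, the hypothetical function square below is written once and then initiated from two different places. All of the names are made up for illustration.

```c
/* A reusable container: the instructions for squaring live in one place. */
int square(int n)
{
    return n * n;
}

/* First caller: area of a square room with the given side length. */
int room_area(int side)
{
    return square(side);
}

/* Second caller: a*a + b*b, reusing the same instructions twice. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```

If the way we square a number ever had to change, only the body of square would need editing; both callers would pick up the change.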
In actuality, a function name is a symbol that represents the memory address where a sequence of instructions resides. When a function is called there is a jump to that memory address. It is a start address in memory with a certain number of instructions following it. Let's create a simple function.
Let's demonstrate with this simple program.
In GDB we use the print command to print the value held in the program counter register, $pc, as an address.
The program counter steps through the instructions one at a time. What happens when it gets to the callq instruction?
This is a function definition. Let's examine it to understand its anatomy.
// function definition
void foo(void)
{
}
The first word below, void in this case, specifies to the compiler whether there will be a return value or not and, if there is a return value, what TYPE (and how big) it is. The word void specifically means this function does not return a value.
void foo()
{
}
In actuality, a return value is a value that is stored in a CPU register. On the Intel x86 architecture, the register where an integer return value is stored is called $EAX or $RAX, depending on the size of the type returned.
Registers are the fastest form of memory available to the CPU. They are located physically on the CPU, in each core. In modern 64-bit CPUs the general-purpose registers are 8 bytes wide, that is, 64 bits. So far I have mentioned $pc, the program counter, which on an x86-64 CPU is called $RIP but which you can reference in gdb as $pc. The $rip register stores the address in memory of the next instruction to be executed. Now I am discussing the $rax register, where functions store RETURN values right before they exit.
Here are some of the registers and the values stored there, in hex and base 10.
Let's change foo so that it returns a TYPE called int. If we view the disassembly in gdb, notice what happens right before we return from the function.
int foo() { int y = 4; return 255; }
Now look at what happens in main when we return from foo. The return value was stored in $eax, so it is copied from $eax to the memory location specified by the symbol r. r happens to be located at the address held in the $rbp register minus 4 bytes.
int main(void)
{
    int r = foo();
    return 0;
}
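The size of the return type decides how much of the register is used. The sketch below assumes x86-64, where a 4-byte int comes back in $eax and an 8-byte long long needs all of $rax. The function names are hypothetical.

```c
/* Returns a 4-byte int. On x86-64 the value is left in $eax. */
int small_value(void)
{
    return 255;
}

/* Returns an 8-byte long long. The value does not fit in 32 bits,
 * so on x86-64 it is left in the full 64-bit $rax register. */
long long big_value(void)
{
    return 4294967296LL; /* 2 to the power 32 */
}

/* The caller just copies the register contents into its own variables. */
long long add_both(void)
{
    int r = small_value();
    long long b = big_value();
    return r + b;
}
```

Viewing the disassembly of each function in gdb should show the value being moved into $eax or $rax just before the function returns.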
The next part of the definition is the name of the function. In this case it is foo, which is a symbol for the starting address in memory where a sequence of CPU instructions begins.
// function definition
void foo(void)
{
}
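Because a function name stands for an address, C lets us store that address in a variable and call through it. This is only a sketch with hypothetical names (answer, call_through_pointer), meant to show that the name and the address are interchangeable.

```c
/* A function whose starting address we will store in a variable. */
int answer(void)
{
    return 42;
}

/* Stores the address of answer in a function pointer, then calls through it. */
int call_through_pointer(void)
{
    int (*fp)(void) = answer; /* the name answer evaluates to its address */
    return fp();              /* jump to that address, same as answer() */
}
```

In gdb, print answer and print fp (after the assignment) should report the same address.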
The part of the function definition in between the parentheses lists the things (values, bit patterns) that we want to send to the function in order for it to do its work. In this case, because we are using the word void, which means nothing, we will send nothing to the function.
void foo(void)
{
}
Prior to a function being called, the arguments are copied to some registers by the calling function. The called function then copies the values from the registers to its local stack frame. We will talk about the stack frame soon.
Many computer architectures define a specific order and set of registers for passing arguments to functions. On x86-64 Linux (the System V calling convention), the first six integer arguments go in $rdi, $rsi, $rdx, $rcx, $r8, and $r9, in that order.
Let's see how the arguments are passed under the hood for this code.
int foo(int a, int b, int c, int d)
{
    int y = a + b + c + d;
    return y;
}

int main(void)
{
    int p = foo(1, 2, 3, 4);
    return 0;
}
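When there are more arguments than argument registers, the rest are passed on the stack. The sketch below assumes the System V x86-64 calling convention used on Linux, where the first six integer arguments travel in registers and the seventh goes on the stack. sum7 and call_sum7 are hypothetical functions.

```c
/* Assumes the System V x86-64 calling convention (Linux):
 *   a -> $edi, b -> $esi, c -> $edx, d -> $ecx, e -> $r8d, f -> $r9d
 *   g -> placed on the stack by the caller
 */
int sum7(int a, int b, int c, int d, int e, int f, int g)
{
    return a + b + c + d + e + f + g;
}

/* The caller loads the registers (and the stack slot) before the call. */
int call_sum7(void)
{
    return sum7(1, 2, 3, 4, 5, 6, 7);
}
```

Other platforms use different registers, so the comment at the top describes only this one convention.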
The return type, the function name, and the parameter list combined are called the FUNCTION SIGNATURE or the FUNCTION PROTOTYPE.
The signature or prototype defines the circumstances of the function: how it is called, what it takes, and what it returns.
void foo(void) //Function prototype or signature
{
}
After the function signature/prototype, we have the function body. The part in between the curly braces is the function body. When the function is called, the code in the body is what will be executed on the CPU.
void foo(void) //Function prototype or signature
{
//Function Body in between the curly braces
}
A function definition has a signature (aka prototype) and a body, as shown in the previous slide. If you just have the signature/prototype by itself and put a semicolon at the end, it is called a function declaration.
void foo(void); //Function declaration.
C/C++ allow you to complete the compilation step of your code as long as a function declaration is present prior to any calls that you make to that function. This is called a FORWARD DECLARATION.
HOWEVER, if you never provide a full function definition, the linker step will fail when it tries to resolve the symbol that the function call refers to.
When you include a standard header file like math.h, you are including forward declarations for any function from that library that you want to use. The linker step will look for the complete function definition when it tries to resolve the function call. If it can't find it, you will get a linker error.
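The same mechanism is visible with any standard header. This sketch uses string.h rather than math.h, since on Linux the math functions usually also need -lm at link time, while strlen lives in the C library that gcc links by default. name_length is a hypothetical function.

```c
#include <stddef.h>
#include <string.h> /* brings in the declaration: size_t strlen(const char *s); */

/* Compiles because string.h declared strlen.
 * Links because the C library supplies the definition of strlen. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```

The header never contains the body of strlen; it only tells the compiler what a call to it looks like.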
// function declaration
void foo(void);
// function definition
void foo(void) //Function prototype or signature
{
//Function Body in between the curly braces
}
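Here is a sketch of a forward declaration in use. The names half and average are hypothetical; the point is that average calls half before the compiler has seen the body of half.

```c
/* Forward declaration: tells the compiler the signature of half. */
int half(int n);

/* This call compiles because of the declaration above. */
int average(int a, int b)
{
    return half(a + b);
}

/* The definition the linker needs. Delete it and compilation still
 * succeeds, but linking fails with an undefined reference to half. */
int half(int n)
{
    return n / 2;
}
```

Moving the definition of half into a different src file would work just as well, as long as both object files are handed to the linker.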
All statements in C end in a semicolon, and whitespace between tokens is disregarded. The semicolon tells the compiler this is the end of the statement. It is completely OK to write a statement like the one below. It is not advisable, but it will compile. Whitespace only matters inside a token such as a symbol name, where it would split the name in two.
int x,
y
,
p =
3;