Superimposing meaning over bit patterns

Review

  • Bit Patterns/switches are meaningless
  • Plain Text
  • Files
  • Text Editors
  • ASCII
  • Compilation as translation to object code
  • The preprocessor

Compiling the code

We described preprocessing as a text substitution process. The next step is called compiling the code, and it is actually a two-step process. The compiler first turns the preprocessed text into assembly instructions appropriate for the target CPU. x86 has a different instruction set than ARM, and the target may be a 32-bit or 64-bit system, so the compiler has to take all of this into consideration. You can see the output of this stage by passing the -S flag to gcc, which writes a .s file.

Compiling part 2

After the text file is turned into assembly, the assembly needs to be turned into object code. Object code is the actual machine code, the bit pattern that the CPU can process. If you pass the -c flag to gcc you will get a .o file. You can view the contents of this file using objdump -d name_of_file.o
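A small file to try these stages on might look like the sketch below. The file name square.c and both function names are made up for illustration; the gcc and objdump flags in the trailing comment are the ones discussed above, plus -E, which stops after preprocessing.

```c
/* square.c (hypothetical file name): no main is needed to try -S and -c */

#define TIMES_TWO(n) ((n) * 2)   /* replaced textually by the preprocessor */

int square(int n)
{
    return n * n;
}

int double_square(int n)
{
    return TIMES_TWO(square(n));
}

/*
 * Stop after each stage and look at what came out:
 *
 *   gcc -E square.c        preprocessed text, printed to the terminal
 *   gcc -S square.c        writes square.s (assembly for the target CPU)
 *   gcc -c square.c        writes square.o (object code)
 *   objdump -d square.o    disassembles the object code
 */
```

The assembly you see will differ between x86 and ARM machines and between compiler versions, which is the point of the exercise.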

LINKING

The final stage is linking the files. The output of the compilation phase is a set of object files, that is, files with a .o extension. In large programs it is common to have many source files that reference functions in other source files. Each source file is considered a "translation unit". Each translation unit gets compiled into an object file.

Every function lives at a certain memory address, and every call to it needs that address. Combining all the individual object files into a single shippable executable program, and resolving the addresses for function calls whose targets are in different .o files, is called linking.

gcc and g++

The compiler we are going to use is called gcc. gcc takes care of all of the compilation steps with a single one-line command. I have linked a summary of how to use gcc on the main class page. You can always type "man gcc" or "info gcc" to get the full manual.

gcc commands and options

The general form of the gcc command is

gcc -g -Wall -Werror name_of_source_files.c file_2.c file_3.c
               -o name_of_executable -llibraries_you_need_to_link

You can make a file, e.g. flags.txt, with all your flags and source files and use the @ symbol to pass it to the command. Note that each source file passed to gcc is considered its own translation unit, meaning it gets compiled into an object file separately from the other translation units.

			gcc @flags.txt -o output-name
	  
gcc flags article

Our goals

  • What assembly code our C/C++ code compiles to
  • What is actually happening when we call a function
  • What a return value is and where it is stored
  • How arguments are passed to functions
  • How to use GDB to inspect our running programs
  • What the different areas of memory are and how to access them
  • To complete every exercise in the K&R book

Our goals continued

  • To understand memory and pointers
  • To understand the compilation and linking process
  • To be able to read and understand other people's code
  • To learn valgrind

info pages and man pages

I am providing a PDF of our class textbook and a PDF of the gdb info pages. However, you will be working in a shell environment, and I advise you to install the info pages for gdb and gcc in the shell. I also advise you to read about how to navigate the info pages from within the info pages themselves, by typing info info at the command prompt.

The gdb info manual

Your responsibilities

  • Attend the lectures and take notes. In each lecture I will introduce material that you should take notes on.
  • Read the relevant chapters from the book that I assign. These will be related to my lectures.
  • DO THE HOMEWORK PROJECTS. DO NOT CHEAT by using ChatGPT to write them for you. This is where you solidify your knowledge.
  • Spend lots of time programming and inspecting memory.

What is a debugger

A debugger is a program, a tool that allows us to inspect the memory of our program while it is running. We can look at registers, variables, instructions, and data that are in the memory of our running programs. We can read the contents of any memory location within a program's ADDRESS SPACE. We can step through and read each instruction as it happens, and see which memory locations and registers are affected and how their bit patterns change. We can view the disassembled code to understand what our program is doing.

The gdb info manual

The entry point to a program

Modern computers use operating system software to start and stop user programs. When you ask the operating system to run your program, it has to know the address of the first instruction of your program. In a running program, a CPU just reads instructions one after another, starting at a certain address. How does the OS know where to start your code? As far as we programmers are concerned, it starts at the address associated with the symbol main.

In this example I try to compile a file called entry_point.c that has no main function. gcc gives me a linker error, because a main function is required. Linker errors usually come about when the linker cannot find code that is required by your program.
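A file that reproduces this situation could be as small as the sketch below. The function five is made up for illustration, and the exact wording of the linker message depends on your toolchain; the one quoted in the comment is what GNU ld typically prints.

```c
/* entry_point.c: deliberately has no main function */

int five(void)
{
    return 5;
}

/*
 *   gcc -c entry_point.c    succeeds: compiling one translation unit
 *                           does not require main
 *   gcc entry_point.c       fails at the link step, typically with a
 *                           message such as: undefined reference to `main'
 */
```

Notice that the compile step is happy; it is only when the linker tries to build a complete executable that the missing main matters.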

The address of main

In the following images we run the debugger gdb on our example program a.out with the command below. This program does nothing but place the bit pattern that represents 5 at the address that the symbol x represents. In this demo we are just trying to show that main has an address.

		$ gdb ./a.out
	  

gdb - start; disassemble; info

programmer to Linker to OS

The linker sets up an executable file in the format a specific operating system expects, so that the operating system knows how to load the program's data and code into memory and how to find the address of the first instruction. We as programmers communicate where our code should start by creating the function called main. The linker actually sets the entry point to a symbol called _start, which is where the CPU starts running, and that startup code makes the function call to main.

Reusable containers

In programming we issue instructions to a cpu. We usually put a sequence of instructions together one after the other to change bit patterns in memory in a way that is useful to us.

Sometimes we would like to execute some sequence of instructions many times from different places in a program. It would be helpful to put that sequence of instructions together in a reusable container that can be initiated from anywhere in the program.

functions

A function, in programming, is a container that holds a sequence of instructions that can be initiated from anywhere in the program. A function is a reusable container.

Program Counter and functions

In actuality a function name is a symbol that represents a memory address where a sequence of instructions resides. When a function is called there is a jump to that memory address. It is a start address in memory with a certain number of instructions following it. Let's create a simple function.

Let's demonstrate with this simple program.

the program counter

In GDB we use the print command to print the value held in the program counter register, $pc, as an address.

The program counter steps through the instructions one at a time. What happens when it gets to the callq instruction?

Anatomy of a function

This is a function definition. Let's examine it to understand its anatomy.


		// function definition
		void foo(void)
		{
		}
	  

return value

The first word below, void in this case, tells the compiler whether there will be a return value and, if there is one, what TYPE it is (and therefore how big). The word void specifically means this function does not return a value.


		void foo()
		{
		}
	  

register $RAX

In actuality a return value is a value that is stored in a CPU register. On the Intel x86-64 architecture, integer and pointer return values are stored in $rax. $eax is the name for the lower 32 bits of $rax, which is what gets used when the returned type is 4 bytes wide, such as int.

Wait what are registers

Registers are the fastest storage in the machine. They are located physically on the CPU, in each core. On a 64-bit x86 CPU the general-purpose registers are 8 bytes wide, that is 64 bits. So far I have mentioned the program counter, which on an x86-64 CPU is called $rip, but which gdb also lets you reference as $pc. The $rip register stores the address in memory of the next instruction to be executed. Now I am discussing the $rax register, where functions store their RETURN values right before they exit.

Here are some of the registers and the values stored in them, in hex and base 10.

Let's change foo so that it returns a TYPE called int. If we view the disassembly in gdb, notice what happens right before we return from the function.

	  int foo()
	  {
	      int y = 4;
	      return 255;
	  }
	  

Now look at what happens in main when we return from foo. The return value was stored in $eax, so it is copied from $eax to the memory location specified by the symbol r. r happens to be located 4 bytes below the address held in the $rbp register, that is at $rbp-4.


	  int main(void)
	  {
	      int r = foo();
	      return 0;
	  }
	  

function anatomy: The Name

The next part of the definition is the name of the function. In this case it is foo, which is a symbol for the starting address in memory where a sequence of CPU instructions begins.


		// function definition
		void foo(void)
		{
		}
	  

Arguments to the function

The part of the function definition in between the parentheses lists the things (values, bit patterns) that we want to send to the function in order for it to do its work. In this case we are using the word void, which means nothing, so we will send nothing to the function.


		void foo(void)
		{
		}
	  

How are arguments passed?

Before a function is called, the arguments are copied into registers by the calling function. In unoptimized builds the called function then copies the values from the registers into its local stack frame. We will talk about the stack frame soon.

On x86-64 Linux (the System V calling convention), there is a specific order and set of registers used for passing integer and pointer arguments to functions.

  • First argument: RDI
  • Second argument: RSI
  • Third argument: RDX
  • Fourth argument: RCX
  • Fifth argument: R8
  • Sixth argument: R9

Let's see how the arguments are passed under the hood for this code.


	  int foo(int a, int b, int c, int d)
	  {
	      int y = a + b + c + d;
	      return y;
	  }

	  int main(void)
	  {
	      int p = foo(1, 2, 3, 4);
	      return 0;
	  }
	  

Function signature

  • return type
  • name
  • the arguments

Combined, these are called the FUNCTION SIGNATURE or the FUNCTION PROTOTYPE.

The signature or prototype defines the circumstances of the function: how it is called, what it takes, and what it returns.


		void foo(void) //Function prototype or signature
		{
		}
	  

function body

After the function signature/prototype, we have the function body. The part in between the curly braces is the function body. When the function is called, the code in the body is what will be executed on the CPU.


		void foo(void) //Function prototype or signature
		{
			//Function Body in between the curly braces
		}
	  

declarations and definitions

A function definition has a signature (aka prototype) and a body, as shown in the previous slide. If you have just the signature/prototype by itself and put a semicolon at the end, it is called a function declaration.


		void foo(void); //Function declaration.
	  

Forward Declarations and Definitions

C/C++ allows the compilation step to complete as long as a declaration of a function appears before any call to that function. Declaring a function ahead of its definition like this is called a FORWARD DECLARATION.

HOWEVER, if you never provide a full function definition, the linker step will fail when it tries to resolve the symbol it is looking for in the function call.

When you include a standard header file like math.h, you are including declarations for the functions from that library that you want to use. The linker step will look for the complete function definition when it tries to resolve the function call. If it can't find it, you will get a linker error.


		// function declaration
		void foo(void);

		// function definition
		void foo(void) // Function prototype or signature
		{
			// Function body in between the curly braces
		}
	  

statements and semi

Statements in C end in a semicolon, and white space between tokens is disregarded. The semicolon tells the compiler this is the end of the statement. It is completely OK to write a statement like the one below; it is not advisable, but it will compile. White space only matters inside a token such as a symbol name, which cannot be split.


		int x,
			y
		,
		p =

                                             3;
	  

next week

  • stack frame routine.(with streets)
  • keywords that represent types
  • argc, argv: main is just a function
  • forward declaration/definition