Bit Twiddling

Bitwise Operators

Sometimes we want to manipulate individual bits directly. In C/C++ we have the following bitwise operators.

Bitwise operations let you perform fast, concise mathematical and comparison operations on numbers.

"<<" left shift
">>" right shift
"&" bitwise and
"|" bitwise or
"~" bitwise not
"^" bitwise xor

shift operation syntax


		int x = 1;

		// shift x to the left 1 bit and assign the result to x

			x = x << 1; 

		// you can also use this shorthand syntax.

			x <<= 1; // x is equal to x shift left one.

The left shift operator "<<" shifts bits to the left by the number of bits you specify.

Here, we set x to 1, and left shift x several times by 1 bit. How is the numeric value changing when we shift left by 1?

The value of x is doubling every time we shift to the left by 1 bit. In other words x = x * 2. But we can also think of the number 2 as 2^1, two raised to the first power.

So when we shift left we are multiplying the value being shifted, x in this case, by 2^n, where n is the number of bits we are shifting.

We can shift to the left by more than one bit.

When we shift left by n bits we are multiplying the current value by 2^n
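As a minimal sketch of this rule (the helper name times_pow2 is hypothetical, not from the slides), shifting an unsigned value left by n bits gives the same result as multiplying it by 2^n, as long as no set bits are pushed off the top:

```c
/* Hypothetical helper: multiply x by 2^n using a left shift.
   Assumes n is smaller than the bit width of unsigned int and that
   the result still fits in an unsigned int. */
unsigned int times_pow2(unsigned int x, unsigned int n)
{
    return x << n;
}
```

For example, times_pow2(5, 2) shifts 0b101 to 0b10100, which is 5 * 4.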

Wait! What happened during the last shift ?

In C, a left shift is a logical shift left. It slides each bit to the left by the number of positions you asked for.

It fills in the right-hand side with 0's, so if we are dealing with a single byte, after eight 1-bit left shifts the value will be 0.
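The following is a small sketch of that behavior on a single byte, assuming an 8-bit unsigned char. The function name shift_byte_left is made up for illustration:

```c
/* Shift a single byte left one bit at a time, `count` times.
   Storing the result back into an unsigned char discards any bits
   shifted past bit 7, and 0's come in from the right. */
unsigned char shift_byte_left(unsigned char x, int count)
{
    for (int i = 0; i < count; i++) {
        x = (unsigned char)(x << 1);
    }
    return x;
}
```

Starting from 0x01, seven shifts leave only the top bit set (0x80) and the eighth shift pushes it out, leaving 0.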

The hardware itself is capable of doing different kinds of shifts, but C/C++ has no operators for them. To use them directly you would need to write assembly.

The additional hardware Left shifts are

Rotate Left, which shifts left but wraps the MSB around to the LSB
Rotate through Carry Left, which includes the carry flag: the MSB is moved to the carry flag and the carry flag is moved to the LSB
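A rotate left can still be emulated in C with two shifts and an OR. This is only a sketch for an 8-bit value, and rotate_left8 is a hypothetical name. Whether the compiler turns it into a single rotate instruction depends on the compiler and target:

```c
#include <stdint.h>

/* Hypothetical helper: rotate an 8-bit value left by n positions.
   Bits shifted out past the MSB re-enter at the LSB. */
uint8_t rotate_left8(uint8_t x, unsigned int n)
{
    n &= 7; /* rotating by 8 is the same as rotating by 0 */
    return (uint8_t)((x << n) | (x >> ((8 - n) & 7)));
}
```

With x = 0b10000001 and n = 1, the top bit wraps around and the result is 0b00000011.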

We often use Hex notation as a shorthand for binary because each hex digit aligns cleanly with 4 binary bits, making translation very clean and simple.

Two hex digits align with 8 bits (1 BYTE)


		// 0b tells the compiler this is a binary literal
		// (standard in C23 and C++14, a compiler extension before that)
		// 0x tells the compiler this is a hex literal
		// the compiler assumes numbers without a prefix are decimal literals
		char x = 0x0A;
		char y = 0b00001010;
		char z = 10;

x, y, z are set to the same value

In the next slides I set int x = 0xf, that is a 4 byte block with the value 15.

In this demo, I use gdb commands that display examined memory in big-endian format, to make things a little easier to read.

What I am showing here is that a single hex digit takes up a nibble (4 bits). As we shift left by 4 we are shifting a whole nibble over.

This shows the same thing as the previous slide, but this time we are displaying the results in binary.

What is happening here? We had 1111 in the most significant nibble. After we shift left 2 bits, why does the most significant nibble hold 0xc?

We can also right shift using the operator ">>". I set a single byte, unsigned char x = 0x80, which is 0b10000000, then shift right.

When we shift right by n bits we divide the value we are shifting by 2^n. If we shift by 1 we are dividing by 2^1, which is divide by 2.
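A minimal sketch of the same idea in code, using a hypothetical helper div_pow2. Note that bits shifted off the right end are simply dropped, so the division discards any remainder:

```c
/* Hypothetical helper: divide an unsigned value by 2^n with a right shift.
   Assumes n is smaller than the bit width of unsigned int. */
unsigned int div_pow2(unsigned int x, unsigned int n)
{
    return x >> n;
}
```

For example, div_pow2(0x80, 1) gives 0x40, and div_pow2(7, 1) gives 3 because the low bit falls off.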

Signed values

When you execute a right shift, the bit that fills in the MSB (most significant bit) will be a 0 if the type you are operating on is unsigned.

If the type is signed, the right shift will propagate whatever bit (0 or 1) is currently at the MSB. Strictly speaking, C leaves right shifts of negative values implementation-defined, but this is what mainstream compilers do.
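Here is a small sketch contrasting the two cases on single bytes. The function names are made up, and the signed version assumes the compiler performs an arithmetic shift on negative values, which is the documented behavior of GCC and Clang but is not promised by the C standard:

```c
/* Right shift an unsigned byte: the vacated MSB is filled with 0. */
unsigned char shr_unsigned(unsigned char x, int n)
{
    return (unsigned char)(x >> n);
}

/* Right shift a signed byte. Assumption: the compiler uses an arithmetic
   shift for negative values, so the sign bit is copied into the MSB. */
signed char shr_signed(signed char x, int n)
{
    return (signed char)(x >> n);
}
```

Under that assumption, the bit pattern 0b10000000 shifted right by 1 becomes 0b01000000 when unsigned and 0b11000000 when signed.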

To read a signed value, add up the place values of all the positions with a 1 in them except the MSB, then subtract the place value of the MSB if it is set.

Given 1 signed byte, 0x9A

	    1   0   0   1    1    0    1   0
	   2^7 2^6 2^5 2^4  2^3  2^2 2^1 2^0

	   Add together wherever there is a 1 except the msb.
	   Then subtract the msb.

	   (2^1 + 2^3 + 2^4) - 2^7 = 26 - 128 = -102
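The reading rule can be written as a short function. This is only a sketch for one 8-bit two's-complement byte, and read_signed_byte is a hypothetical name:

```c
/* Hypothetical helper: interpret the 8 bits of `bits` as a signed
   two's-complement value, following the add-then-subtract rule. */
int read_signed_byte(unsigned char bits)
{
    int value = 0;
    for (int i = 0; i < 7; i++) {
        if (bits & (1 << i)) {
            value += 1 << i; /* add the place value of each set bit below the MSB */
        }
    }
    if (bits & (1 << 7)) {
        value -= 1 << 7;     /* subtract the place value of the MSB */
    }
    return value;
}
```

For example, 0xFF has every lower bit set (127) and the MSB set (128), so it reads as 127 - 128 = -1, while 0x80 reads as -128.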

In the next slide I demonstrate right shift of the signed and unsigned variables declared below.

In the comments I introduce a new syntax, (1 << n). This evaluates to a value with only bit n set, by shifting the value 1 to the left n times. It is a shorthand for creating a constant or a mask.

These values are exactly the same as the previous

The bitwise OR operator "|"

A single pipe "|" in C acts as a bitwise OR operator. Bitwise OR looks at each pair of corresponding bits in the two operands, and if either bit is a 1 the result bit is 1.

OR is a way of keeping whatever bits are already set and also setting any new bits that are set in the second operand. It is a way of combining things together.

	      
			char x = 0b00000000 (same as saying x = 0)
				x = x | 1         = 0b00000001
				x = x | (1 << 2)  = 0b00000101
				x = x | (1 << 4)  = 0b00010101
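The same steps can be wrapped in a small function. This is a sketch with a hypothetical helper name, set_bit:

```c
/* Hypothetical helper: return x with bit n also set.
   Bits that were already set in x stay set. Assumes n is 0..7. */
unsigned char set_bit(unsigned char x, unsigned int n)
{
    return (unsigned char)(x | (1u << n));
}
```

Setting bits 0, 2 and 4 in turn on a zero byte produces 0b00010101, and setting a bit that is already set changes nothing.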

bitwise "&" AND

The bitwise AND "&" requires a bit to be set in both operands for the result bit to be set.

AND is often used to do masking. That is, you have a bit pattern in a mask and apply it to an operand. The result will have bits set only where they were set in both the operand and the mask.

It is a way of asking: are the bits I set in the mask also set in the operand?

You can see in the last slide we set a bunch of bits using the | operator, but the only one in common with the mask was (1 << 4), and that is the only one that remained after we applied the mask using the "&" operator.
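A minimal sketch of masking in code. Both helper names are hypothetical:

```c
/* Keep only the bits of x that are also set in mask. */
unsigned char apply_mask(unsigned char x, unsigned char mask)
{
    return (unsigned char)(x & mask);
}

/* Returns 1 if every bit set in mask is also set in x, otherwise 0. */
int has_bits(unsigned char x, unsigned char mask)
{
    return (x & mask) == mask;
}
```

With x = 0b00010101 and mask = (1 << 4), apply_mask returns 0b00010000 and has_bits returns 1. With mask = (1 << 1), has_bits returns 0 because that bit is not set in x.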

The bitwise NOT operator "~"

It takes all the 1's and turns them into 0's, and takes all the 0's and turns them into 1's. It inverts the value of each bit.
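A short sketch of NOT on a single byte, plus one common use of it: combining ~ with & to clear a bit. The helper names are made up for illustration:

```c
/* Invert every bit of a byte. The cast keeps only the low 8 bits,
   because ~ operates on the promoted int value. */
unsigned char invert_byte(unsigned char x)
{
    return (unsigned char)~x;
}

/* Clear bit n of x: ~(1u << n) has every bit set except bit n. */
unsigned char clear_bit(unsigned char x, unsigned int n)
{
    return (unsigned char)(x & ~(1u << n));
}
```

For example, invert_byte(0x0F) gives 0xF0, and clear_bit(0b00010101, 2) gives 0b00010001.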

The last bitwise operator we are going to look at is the exclusive OR "^"

The XOR operator sets the output bit to 1 if the inputs are (1 and 0) or (0 and 1).

That is why it is an exclusive OR: 2 set bits (1 and 1) return a 0.

One of the interesting things about XOR is that it is its own inverse. If you XOR A with B and put the result in A, then XOR A with B again, you get the original A back. This property is commonly used in cryptography.
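The round trip can be sketched as below. xor_bytes is a hypothetical helper, and a single-byte key like this only illustrates the property; it is not a secure cipher on its own:

```c
#include <stddef.h>

/* Hypothetical helper: XOR every byte of buf with key, in place.
   Calling it a second time with the same key restores the original bytes. */
void xor_bytes(unsigned char *buf, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= key;
    }
}
```

Calling xor_bytes once scrambles the buffer, and calling it again with the same key gives back the original contents.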

Logical Operators

Boolean values are values that can be thought of as in one of two states

True or False
On or Off
Set or Unset

The way C makes the distinction between the two states is:

If something is 0 it's false, otherwise it's true

	      
		int x = 0; // FALSE, NOT SET, NOT TRUE
		int y = 1; // TRUE, SET, ON
		int z = 21304234; // TRUE, SET, ON
		int w = -234; // TRUE, SET, ON

In C/C++ things are either 0 or they are not. That is how C/C++ thinks of booleanness.

The logical operators && and || resolve to 0 or 1, the two states used to represent false and true.

This is logical OR "||" resolving to 0 or 1

Comparison operators (< > == !=) also resolve to one of two states: 0, which is false, or 1, which is true.
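As a quick sketch (the wrapper names are hypothetical), each of these operators collapses whatever values it is given down to the int 0 or 1:

```c
/* Logical operators and comparisons always produce the int 0 or 1,
   no matter how large or negative the operands are. */
int logical_or(int a, int b)  { return a || b; }
int logical_and(int a, int b) { return a && b; }
int logical_not(int a)        { return !a; }
int is_less(int a, int b)     { return a < b; }
```

So logical_or(21304234, 0) is 1, not 21304234, which is different from the bitwise | operator that combines the individual bits.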

Control Flow

Control flow is a way of preventing something from running unless something is true. C considers 0 false and (NOT 0) true. The value within the parens of an if statement, the predicate, dictates true or false. It can be any scalar type!!

int
char
pointer
float
double
short
"string literal"


		if(1) {}    // will run
		if(0) {}    // won't run
		if(NULL) {} // won't run
		if("ok") {} // will run
		if(-10)  {} // will run

		int q;
		if(q) {}    // MAYBE: q is uninitialized, so its value is indeterminate

		int r = 0;
		if(r) {}    // won't run

		int p;
		if(&p) {}   // will run

		int *s = NULL;
		if(s) {}    // won't run

		if(.323432) {} // Will run

		char *x = "Hello world";
		if(x) {}  // will run;
		if(*x) {} // will run
		if(&x) {} // will run

The expression within the (), the predicate of an if statement, can be very complex. It can be anything from a literal value, to a variable, to a complex mathematical expression, etc.

It can also include one or more Logical Operators, && or || or !(NOT)

The code within the if statement block {code} will not execute if the expression evaluates to 0 and it will if it evaluates to (NOT 0).

The if statement just wants a value. It doesn't matter how you arrive at that value. The evaluation can happen outside the () with the result assigned to a variable that you then put in the parens, or the expression can be evaluated right inside the parens.

You can specify an else block of {code} to execute if the predicate to the if statement was false. These can be nested as seen in the next slide.


		if(1) // any value that is NOT 0
		{
		    // DO THIS
		}
		else
		{
		    // DO THIS WHEN THE PREDICATE WAS 0
		}

Loop statements establish blocks of code that will be repeatedly executed as long as the predicate is true. Loops evaluate predicates the same way if statements do.

In the loop below, the instructions that are in the {code block } will repeat forever.


		while (1)
		{
		   //code block
		}

You can stop the execution of the loop code and jump out of the code block by using the break statement


		while (1)
		{
		   //code block
		   break;
		}

There are 3 different versions of the syntax for loops, but loops are basically all the same. You can do the exact same thing with all of them. Which one to use is a matter of personal preference and readability.

The variations are essentially syntactic sugar.

They all repeatedly execute the code in the block until the predicate turns false.

In the do while loop, the predicate is checked at the end, so the code in the block will be executed at least once.


		do
		{
		   // code block
		} while(expression);
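To make the difference concrete, here is a sketch with two hypothetical functions that count how many times the loop body runs. Each body clears the predicate so the loop stops after one pass:

```c
/* while: the predicate is tested before the body runs. */
int while_runs(int predicate)
{
    int runs = 0;
    while (predicate) {
        runs++;
        predicate = 0; /* stop after one pass */
    }
    return runs;
}

/* do while: the body runs first, then the predicate is tested. */
int do_while_runs(int predicate)
{
    int runs = 0;
    do {
        runs++;
        predicate = 0; /* stop after one pass */
    } while (predicate);
    return runs;
}
```

When called with 0, while_runs returns 0 but do_while_runs returns 1, because the do while body has already executed before the predicate is looked at.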

The for loop allows you to write multiple statements about a variable in a single line. It packs everything into one line.

The declaration
Predicate Test
An operation


		for(x = 0xA;    // this happens once at the beginning
		    x != 0;     // this is the predicate, checked at the beginning of every pass
		    x = x << 4) // this happens at the end every time thru
		{
		    // code block
		}


		 int x = 4;
		 while(x)
		 {
		     x -= 1;
		 }

This produces the same exact disassembly as the previous while loop

They are exactly the same.


		   for(int x = 4; x; x -= 1);
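The equivalence can also be checked from C itself. This sketch (function names are hypothetical) runs the same countdown both ways and counts the passes. It assumes start is not negative, since a negative start would never reach 0 by counting down:

```c
/* Countdown written as a while loop. */
int count_with_while(int start)
{
    int iterations = 0;
    int x = start;
    while (x) {
        x -= 1;
        iterations++;
    }
    return iterations;
}

/* The same countdown written as a for loop. */
int count_with_for(int start)
{
    int iterations = 0;
    for (int x = start; x; x -= 1) {
        iterations++;
    }
    return iterations;
}
```

Both functions return 4 when start is 4, and both return 0 when start is 0.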

Switch statements allow you to compare a variable with a bunch of different constants, instead of using many nested if else statements as shown below.


		int x = 2;
		if(x == 0)
		{
		}
		else if(x == 1)
		{
		}
		else if(x == 2)
		{
		}
		else if(x == 3)
		{
		}
		else if(x == 4)
		{
		}
		else
		{
		}

The switch jumps to the case that matches and continues execution from there. It starts at the matching case but doesn't stop at the next one, so we need to add break statements to break out of the switch.


		int x = 2;
		switch(x)
		{
		case 1:
		  printf("case 1");
		  break;
		case 2:
		  printf("case 2");
		  break;
		case 3:
		  printf("case 3");
		  break;
		case 4:
		  printf("case 4");
		  break;
		case 5:
		  printf("case 5");
		  break;
		default: // this is like the final else
		  break;
		}
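To see what the break statements are protecting against, here is a sketch of a switch with the breaks deliberately left out. The function name is hypothetical, and it counts how many case bodies execute:

```c
/* Hypothetical helper: counts how many case bodies run when the
   cases have no break statements. */
int cases_run_without_break(int x)
{
    int count = 0;
    switch (x) {
    case 1:
        count++; /* falls through */
    case 2:
        count++; /* falls through */
    case 3:
        count++; /* falls through */
    default:
        count++;
    }
    return count;
}
```

With x = 2, execution starts at case 2 and then runs case 3 and default as well, so the function returns 3 rather than 1.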

code blocks { }

The boundaries of blocks of code are defined by curly braces { }.

The execution of some blocks depends on what immediately precedes them, such as if statements, loops, and function declarations.

Variables declared within a block are only available within that block and within the blocks nested inside of it.
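A small sketch of block scope, using a made-up function scope_demo. The inner block declares its own y and its own x, and neither survives past the closing brace:

```c
int scope_demo(void)
{
    int x = 1;
    int seen_inside;
    {
        int y = 10; /* y exists only inside this block */
        int x = 2;  /* a new x that hides the outer x inside this block */
        seen_inside = x + y; /* uses the inner x: 2 + 10 */
    }
    /* y is not visible here, and x refers to the outer variable again */
    return seen_inside * 100 + x;
}
```

The function returns 1201: 12 from inside the block, and 1 from the outer x, which was never changed.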

Code blocks have implications for the stack. The compiler has to make sure there is room on the stack for the variables that will be defined within blocks that are in a given stack frame.

This is a bad idea!!! ASKING FOR DISASTER!!

Character arrays in c

There is no built-in data type for strings in C.

In C, strings are represented by an array of characters with the last character being '\0'. Below, name is on the stack.


		char name[4];
		name[0] = 's';
		name[1] = 'a';
		name[2] = 'm';
		name[3] = '\0'; // null character
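The terminating '\0' is what tells code where the string ends. As a sketch, here is a hand-written version of what the standard strlen does, under the hypothetical name length_until_null:

```c
#include <stddef.h>

/* Hypothetical helper: count characters until the '\0' terminator.
   The array must contain a '\0' somewhere, or this reads past the end. */
size_t length_until_null(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

For a 4-byte array holding 's', 'a', 'm', '\0' this returns 3. If the '\0' were written at index 1 instead, it would return 1 even though the array is still 4 bytes long.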

String Literals

A string literal is a sequence of characters enclosed in double quotes, embedded in the source code. It is a constant that is typically stored in read-only memory and must not be altered at runtime. String literals are immutable.

You can embed a long string literal in your code by using the '\' character.


		"this is a string literal. I want to make it \
		 very long so I will use the \\  character to continue \
		 it here in the  src code. The backslash character \
		 does not insert the newline character into the \
		literal it just escapes the new line in the src code";

C treats string literals as character arrays, but immutable ones in read-only memory

A sequence of characters with the last being the '\0' null character

To access a string literal you use a pointer of type char *

You can access any byte of the literal, but you cannot change them

In the following declaration NameLiteral holds the address of "Sam", which is an address in readonly memory

	      
		char *NameLiteral = "Sam";

In the code below, the first declaration creates space on the stack for NameArray and copies the characters in the quotes into the bytes of the array. This is not considered a string literal, just some syntactic sugar to quickly initialize an array of char.

The second declaration points NameLiteral to the string literal "sam" in read-only memory.

Third, the string literal itself can be treated as an address that you can index or dereference!!

	      
		    char NameArray[] = "sam";
		    char *NameLiteral = "sam";
		    printf("%c", "sam"[1]);
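The practical difference is that the array copy may be modified while the literal may only be read. A sketch, with hypothetical function names and a caller-supplied buffer of at least 4 bytes:

```c
#include <string.h>

/* Modify an array that was initialized from "sam", then copy it out. */
void capitalized_copy(char *out)
{
    char NameArray[] = "sam"; /* writable array holding 's','a','m','\0' */
    NameArray[0] = 'S';       /* allowed: this changes the array, not the literal */
    memcpy(out, NameArray, sizeof NameArray); /* copies all 4 bytes */
}

/* Read one byte of the literal through a pointer. */
char read_literal_byte(void)
{
    const char *NameLiteral = "sam"; /* address of the literal itself */
    /* NameLiteral[0] = 'S'; would not compile here because of const,
       and without const it would be undefined behavior. */
    return NameLiteral[1]; /* reading is fine */
}
```

Declaring the pointer as const char * is a design choice in this sketch: it lets the compiler reject accidental writes to the literal.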

Arguments to main

The entry point to every C/C++ program is the main function.

main, like any other function, can be passed arguments and can return a value.

On Linux and similar systems, three arguments are automatically passed to main from the operating system:

An integer: the number of command-line arguments
A pointer to an array of strings: the command-line arguments
A pointer to an array of strings: the environment variables


		int main(int, char **, char **)

If I run this program and pass a single argument on the command line like this


		$ ./a.out Some_argument


	      #include <stdio.h>

	      int main(int NumArgs, char **Args, char **Envs)
	      {
		      printf("hello");
		      return 0;
	      }

This is what it looks like in gdb

Args is an array of addresses

The first argument, an int, specifies that there are 2 addresses in the array of addresses called Args.

Each address in Args is the address of a string

As you can see in our disassembly, each of the arguments gets passed in a register and copied to the stack.

An address was passed to our program by the OS via register $rsi

An integer value was also passed via $edi

The address we received was the address in memory of an array, and the value in $edi was the size of that array. That array is an array of addresses.

The size of the array is 2, and it is an array of addresses. When we look at the addresses in the array called Args and then examine those memory locations, we see they are also arrays of characters, aka strings.

The first address in Args is the address of the string that is the path name of the program. All the following entries in the array are arguments passed to the program.
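The array of addresses can be walked without knowing its size, because the C standard guarantees that the entry after the last argument is a NULL pointer. A sketch, with a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper: count the strings in an array of addresses that
   ends with a NULL pointer, which is how Args (argv) is laid out. */
int count_strings(char **list)
{
    int count = 0;
    while (list[count] != NULL) {
        count++;
    }
    return count;
}
```

Called from main as count_strings(Args) for the run shown earlier, this should give the same value as NumArgs, which is 2: the program path plus one argument.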

Environment variables

If you remember, there is also another address passed via $rdx. That is the address of another array of addresses.

The addresses in that array point to strings called environment variables.

Environment variables are a series of key-value pairs specified as strings.
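Each of those strings conventionally has the form KEY=value. As a sketch of pulling one apart (the helper names and the sample string "HOME=/root" are made up for illustration):

```c
#include <string.h>

/* Hypothetical helper: given one environment string such as "HOME=/root",
   return a pointer to the value part, or NULL if there is no '='. */
const char *env_value(const char *entry)
{
    const char *eq = strchr(entry, '=');
    return eq ? eq + 1 : NULL;
}

/* Length of the key part: the number of characters before the first '='. */
size_t env_key_length(const char *entry)
{
    return strcspn(entry, "=");
}
```

For "HOME=/root" the key length is 4 and the value pointer points at "/root" inside the same string, so no copying is needed.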

Notice that the strings referenced by the addresses in the Envs array are themselves stored in consecutive memory. I can print them all by starting at a single address.

See how the array of addresses printed below lines up with the consecutive strings above.

The strings DO NOT need to be stored consecutively, but they are.

And in fact all the Args strings and then the Envs strings are stored consecutively.

You may think that the Args and Envs arrays of addresses are stored consecutively too. Almost, but not exactly.