description cover coverY
06/26/2023
../../.gitbook/assets/GDB_Archer_Fish_by_Andreas_Arnez.svg.png
0

CPU Rabbit Hole

The Central Processing Unit (CPU)

High level -- How does it work:

FETCH -> DECODE -> EXECUTE Cycle

  1. Fetch instruction
  2. Decode instruction
  3. Execute instruction

Confused? Keep reading.

Just the beginning...

Computers operate in binary: 1s and 0s, on and off.

To work with binary values in hardware, the CPU uses a component called a transistor.

A transistor allows current to flow from the source to the drain only when a voltage is applied to the gate. This forms a "binary switch" that connects or cuts off the wire depending on a second input signal.

Modern CPUs use billions of transistors to perform calculations, but to understand them you only need to know a handful of simple components built from transistors, known as gates.

{% embed url="https://www.howtogeek.com/wp-content/uploads/2018/10/cpu_2.png?trim=1,1&bg-color=000&pad=1,1" %}

Logic Gates

As soon as you arrange a few transistors together properly, you form a logic gate.

Logic gates may appear complex, but they are rather simple: they take two binary inputs, perform an operation on them, and return an output.

OR

Returns true if either of the inputs is true.

AND

Returns true only if both inputs are true.

XOR (Exclusive or)

Returns true if exactly one of the inputs is true.

N-Variants (NOR, NAND, and XNOR)

Inverted versions of their base gates.

The Bus and Memory

Although our computer is able to perform math at this point, it can't do anything beyond that.

It can't remember anything, and it does nothing with the results it computes.

Doing those things requires a memory cell. Below is a depiction of one:

{% embed url="https://www.howtogeek.com/wp-content/uploads/2018/10/cpu_7.png?trim=1,1&bg-color=000&pad=1,1" %} Basic memory cell utilizing lots of NAND gates for writing and AND gates for reading {% endembed %}

How does this work?

You give some inputs, turn on the write bit and it will store the inputs within the memory cell.

However, do you see an issue with just having a memory cell? We still don't have a way to read the information from it.

This is done with an output enabler, which is tied to another input known as the read bit.

The enabler uses AND gates to achieve this. The write and read bits are known as "set" and "enable", respectively.

What is this called as a system? A register.

Registers

{% embed url="https://www.howtogeek.com/wp-content/uploads/2018/10/cpu_8.png?trim=1,1&bg-color=000&pad=1,1" %}

Notice that each register still has a read/write bit associated with it. They are rather simple.

Example: You want to copy R1's data into R2. What would you do?

You would turn on the read bit for R1, which pushes the contents of R1 onto the bus.

Then, keeping the read bit on, you would turn on the write bit for R2, which in turn copies the contents of the bus into R2.

The Clock, Stepper, and Decoder

Registers are in charge of moving data around and storing information within the CPU. However, what tells them to move things around?

The clock is a core component of the CPU. It turns on and off at a fixed, predetermined interval.

Its speed is measured in hertz, or cycles per second.

  • e.g. A 5 GHz CPU runs through five billion clock cycles per second
  • This makes clock speed a useful, though rough, indicator of how fast a CPU can be

{% embed url="https://www.howtogeek.com/wp-content/uploads/2018/10/cpu_10.png?trim=1,1&bg-color=000&pad=1,1" %} The clock has three different clock states {% endembed %}

The clock has three different states: the base clock, the enable clock, and the set clock.

Base Clock

Will be on for half a cycle and off for the remaining half.

Enable Clock

Used to turn on registers. As the model above shows, it must stay on longer than the base clock to ensure that the data is enabled.

Set Clock

Always needs to be on at the same time as the enable clock or else incorrect data could be written.

Stepper

The clock is connected to the stepper.

The stepper counts from one to the max step and then resets itself back to one once finished.

The clock is connected to AND gates for each register that the CPU can write to.

Program instructions are stored in RAM. On modern systems, the CPU also keeps copies of recently used instructions in its L1 cache, which is embedded directly on the CPU, so most fetches do not need to travel across the bus to RAM (which would slow down execution).

How does it all come together?

The main bus is the primary component: everything else connects to it.

The Arithmetic Logic Unit (ALU), Instruction Register, Registers, Control Section, and RAM all come together through it.

Let's perform a single calculation

  1. Load program data into the control section
  2. The control section then reads two numbers from RAM, loads the first one into the ALU's instruction register, and loads the second one onto the bus
  3. It then sends the ALU an instruction code telling it what to compute
  4. The ALU performs the calculation and stores the result in a different register
  5. The CPU can read this register and then continue the process

Registers continued

Processors have their own set of special variables called registers.

Most of the instructions use these registers to read or write data.

  • It is imperative to understand registers to understand instructions

x64 CPU Registers

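The x64 CPU has sixteen general-purpose registers (rax through r15), plus special-purpose registers such as the instruction pointer (rip) and the flags register. These are like internal variables for the processor.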

Please go over the reference sheet when looking through the debugger output for a better understanding of what is going on.

{% embed url="https://web.stanford.edu/class/cs107/resources/x86-64-reference.pdf" %}

What is GDB? -- Debuggers

Debuggers are used by programmers to step through compiled programs, examine program memory, and view processor registers.

Why are they so important and so powerful?

Similar to a microscope, a debugger allows a hacker to observe the microscopic world of machine code. However, it is far more powerful than that: a debugger lets the hacker view execution from all angles, pause it, and change anything along the way.

How can we see these variables visually to best learn about them?

  • Let's use gdb
gdb -q ./helloWorld.exe
Reading symbols from .\helloWorld.exe...
(gdb) run
Starting program: C:\Users\Xyconix\Desktop\projects\C\helloWorld.exe
[New Thread 7456.0xdf8]
Hello world!

Hello world!

Hello world!

Hello world!

Hello world!

Hello world!

Hello world!

Hello world!

Hello world!

Hello world!

[Thread 7456.0xdf8 exited with code 0]
[Inferior 1 (process 7456) exited normally]
(gdb)

Breakpoints

A breakpoint on main stops execution at the start of main(), before any of the code in its body runs.

This pauses execution so that we can display all of the registers and their current states.

Let's view the state of the processor registers right before the body of main() runs:

(gdb) break main
Breakpoint 1 at 0x7ff74097145d: file C:\Users\Xyconix\Desktop\projects\C\helloWorld.c, line 7.
(gdb) run
Starting program: C:\Users\Xyconix\Desktop\projects\C\helloWorld.exe
[New Thread 8688.0x514]

Thread 1 hit Breakpoint 1, main () at C:\Users\Xyconix\Desktop\projects\C\helloWorld.c:7
7           for (i=0; i < 10; i++)          // This will loop 10 times
(gdb)
  • Notice that since there is a loop in main(), you will need to use n (next) to step to the next line of source code
  • Since the loop runs 10 times, you need to step through it 10 times
  • We can also use info registers to view the state of the processor registers at that point of execution
gdb -q .\helloWorld.exe
Reading symbols from .\helloWorld.exe...
(gdb) break main
Breakpoint 1 at 0x14000145d: file C:\Users\Xyconix\Desktop\projects\C\helloWorld.c, line 7.
(gdb) run
Starting program: C:\Users\Xyconix\Desktop\projects\C\helloWorld.exe
[New Thread 9552.0x608]

Thread 1 hit Breakpoint 1, main () at C:\Users\Xyconix\Desktop\projects\C\helloWorld.c:7
7           for (i=0; i < 10; i++)          // This will loop 10 times
(gdb) info registers
rax            0x1                 1
rbx            0x8                 8
rcx            0x1                 1
rdx            0xb415f0            11802096
rsi            0x33                51
rdi            0xb415f8            11802104
rbp            0x5ffea0            0x5ffea0
rsp            0x5ffe70            0x5ffe70
r8             0xb41c90            11803792
r9             0x7ffb865f6850      140718267918416
r10            0x0                 0
r11            0x5ffd08            6290696
r12            0xb41630            11802160
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
rip            0x7ff74097145d      0x7ff74097145d <main+13>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x2b                43
es             0x2b                43
fs             0x53                83
gs             0x2b                43
(gdb)

x64 Registers

Inside of gdb, run set disassembly-flavor intel to display the disassembly in Intel syntax.

%rip: Instruction Pointer

%rsp: Stack Pointer

%rax: Return Value

%rbp: Base Pointer

%rax, %rbp, and %rsp are general-purpose registers, while %rip is a special-purpose register. Together, they are central to executing machine instructions. %rip and %rsp are called pointers because they each store a 64-bit address and point to that location in memory. They are VERY important to program execution and memory management.

The Instruction Pointer points to the instruction that the processor will execute next.

The Stack Pointer is used to keep track of the top of the stack in the program's memory. It points to the value most recently pushed onto the stack.

The Base Pointer is used as a reference, or base, in stack-based operations. It is often used to access function parameters, local variables, and saved registers within a function's stack frame. The RBP usually stays the same throughout a function's execution, while the RSP changes as the stack grows and shrinks.

In short, the RSP keeps track of the top of the stack and manages stack operations.

The RBP is used as a fixed reference point within the stack frame for accessing local variables and function parameters.

I was curious about how the return value works.

On line 12 of our helloWorld.c program, I have a return 0, so I placed a breakpoint on this line using break 12. Let's see what this looks like in gdb:

Thread 1 hit Breakpoint 2, main () at C:\Users\Xyconix\Desktop\projects\C\helloWorld.c:12
12          return 0;
(gdb) info registers
rax            0x0                 0
rbx            0x8                 8
rcx            0xffffffff          4294967295
rdx            0x7ffb84f7fa30      140718244362800
rsi            0x33                51
rdi            0x1c15f8            1840632
rbp            0x5ffea0            0x5ffea0
rsp            0x5ffe70            0x5ffe70
r8             0x7ffb84f85940      140718244387136
r9             0x5fe2e0            6284000
r10            0x0                 0
r11            0x246               582
r12            0x1c1630            1840688
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
rip            0x7ff74097147f      0x7ff74097147f <main+47>
eflags         0x202               [ IF ]
cs             0x33                51
ss             0x2b                43
ds             0x2b                43
es             0x2b                43
fs             0x53                83
gs             0x2b                43

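You can see that %rax, the register that holds the return value, has a value of 0, matching the return 0 on line 12! Keep in mind that a breakpoint stops before the line executes, so step once with n to be sure the 0 was placed there by return 0 rather than left over from an earlier call.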

How cool is that?

Let's simplify our code...

helloWorld.c:

#include <stdio.h>

int main()
{
    puts("Hello world!\n");
    
    return 0;
}

Dump ASM for main function:

Reading symbols from .\helloWorld.exe...
(gdb) b 8
Breakpoint 1 at 0x14000145d: file .\helloWorld.c, line 8.

(gdb) run
Starting program: C:\Users\Xyconix\Desktop\projects\C\helloWorld.exe
[New Thread 460.0x2cb8]

(gdb) set disassembly-flavor intel

(gdb) disas main
Dump of assembler code for function main:
   0x00007ff71cd01450 <+0>:     push   rbp
   0x00007ff71cd01451 <+1>:     mov    rbp,rsp
   0x00007ff71cd01454 <+4>:     sub    rsp,0x20
   0x00007ff71cd01458 <+8>:     call   0x7ff71cd01530 <__main>
=> 0x00007ff71cd0145d <+13>:    lea    rax,[rip+0x2b9c]        # 0x7ff71cd04000
   0x00007ff71cd01464 <+20>:    mov    rcx,rax
   0x00007ff71cd01467 <+23>:    call   0x7ff71cd02628 <puts>
   0x00007ff71cd0146c <+28>:    mov    eax,0x0
   0x00007ff71cd01471 <+33>:    add    rsp,0x20
   0x00007ff71cd01475 <+37>:    pop    rbp
   0x00007ff71cd01476 <+38>:    ret
End of assembler dump.

Utilizing x/s to look into data

We can verify that our "Hello world!" string is located at the address 0x7ff71cd04000 by running x/s 0x7ff71cd04000.

What is x/s? x = examine and s = string.

We can even use list to view the source code. This works because of the -g option when compiling! Essentially, -g embeds debugging information in the executable, which lets gdb map the machine code back to the source code.

(gdb) list
3       int main()
4       {
5
6           //int i;
7           //for (i=0; i < 10; i++)          // This will loop 10 times
8           puts("Hello world!\n");     // puts() will write a string to stdout up to but not including the null char
9
10          return 0;
11
12      }
(gdb)

Operations

Intel syntax will generally follow this style:

operation <destination>, <source>

There are operations that are used to control the flow of execution.

The cmp operation is used to compare values, and any operation that starts with j is used to jump to a different part of the code.

Debugging

In the next section, we will be further debugging our simple helloWorld.c.

Let's walk through this a little bit more:

  1. set disassembly-flavor intel sets the disassembly syntax to Intel
  2. list lists the source code
  3. disas main lists the disassembly of main()
  4. b main establishes a breakpoint at the start of main(). When debugging, execution pauses at this breakpoint before any of the code in the body of main() runs.
  5. Then, we use info register rip to display the value of the RIP instruction pointer.

Examining values in memory

Display format letters:

o: display in octal

x: display in hex

u: display in unsigned, base-10 decimal

t: display in binary

Also, info register rip can be shortened to i r rip.

Let's use shorthand to examine our RIP register in a few different ways:

Syntax: x/<display_type> <address>

Also, if we wanted to directly reference registers, we can use them as variables since that's all they are!

x/x $rip

Valid size letters:

b: single byte

h: a halfword, which is two bytes in size

w: a word, which is four bytes in size

g: a giant, which is eight bytes in size

Note: gdb's word is four bytes, but in Intel and Windows terminology a WORD is two bytes and a DWORD is four bytes. Don't get these confused.

String Literals and Regular Strings

String Literals: A sequence of chars enclosed in quotation marks within the source code of a program. Typically stored in read-only memory. Must not be modified at runtime (attempting to do so is undefined behavior).

e.g. "Hello, World!"

Regular Strings: Created at runtime and can be modified during execution.

e.g.

#include <stdio.h>
#include <string.h>

int main()
{
    char str_a[20];    // Too small on purpose: the string below needs 52 bytes
    strcpy(str_a, "Hello world!\nI'm now a regular string in an array!\n");    // Overflows str_a
    printf("%s", str_a);
}

Uh oh! I bet you tried to compile this and ended up getting a warning about an overflow!

Since this string takes up 52 bytes (51 characters plus the terminating null byte), we need to size our buffer at 52 bytes rather than 20, or the string will overflow past the end of the array into the rest of our stack. This is why you need to be careful with strcpy!

Fixed code:

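#include <stdio.h>
#include <string.h>

int main()
{
    char str_a[52];    // 51 characters plus the terminating null byte
    strcpy(str_a, "Hello world!\nI'm now a regular string in an array!\n");
    printf("%s", str_a);    // Print through "%s" so the string is never treated as a format string
    return 0;
}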

What if you did not know the size?

That's where you would dynamically allocate memory using malloc().

Data Type Sizes

#include <stdio.h>

int main()
{

    // sizeof yields a size_t, so it is printed with %zu rather than %d
    printf("The 'int' data type is \t\t %zu bytes\n\n", sizeof(int));                    // int data type
    printf("The 'short int' data type is \t %zu bytes\n\n", sizeof(short int));          // short int data type
    printf("The 'long int' data type is \t %zu bytes \n\n", sizeof(long int));           // long int data type
    printf("The 'long long int' data type is %zu bytes\n\n", sizeof(long long int));     // long long data type
    printf("The 'float' data type is \t %zu bytes\n\n", sizeof(float));                  // float data type
    printf("The 'char' data type is \t %zu bytes\n\n", sizeof(char));                    // char data type

    return 0;
}

So far, it is feeling like string literals are much easier to read in disassembly than regular strings.

Pointers

The RIP register (EIP on 32-bit x86) is a pointer that "points" to the current instruction during a program's execution by containing its memory address.

The idea of pointers is HUGE in C.

Since physical memory cannot actually be moved, the information in it must be copied. Space for the new destination copy must be saved or allocated before the source can be copied.

However, instead of copying a large block of memory (which would be horribly inefficient), it is much simpler to pass around the address of the beginning of that block of memory.

The size of a pointer in C depends on the target: 32 bits (4 bytes) on 32-bit x86 and 64 bits (8 bytes) on x64. A pointer is defined by prepending an asterisk to the variable name.