---
description: 06/26/2023
cover: ../../.gitbook/assets/GDB_Archer_Fish_by_Andreas_Arnez.svg.png
coverY: 0
---
FETCH -> DECODE -> EXECUTE Cycle
- Fetch instruction
- Decode instruction
- Execute instruction
Confused? Keep reading.
Computers operate in binary: 1s and 0s, on and off.
To work with binary, the CPU uses tiny electronic switches called transistors.
A transistor lets current flow from its source to its drain only when a signal is applied to its gate. This makes it a "binary switch" that connects or cuts off the wire depending on a second input signal.
Modern CPUs use billions of transistors to perform calculations, but to understand them you only need to know a handful of simple components built from transistors, known as logic gates.
{% embed url="https://www.howtogeek.com/wp-content/uploads/2018/10/cpu_2.png?trim=1,1&bg-color=000&pad=1,1" %}
Arrange a few transistors together in the right way and you form a logic gate.
Logic gates may look complex, but they are rather simple: they take two binary inputs, perform an operation on them, and return an output.
- OR: returns true if either of the inputs is true.
- AND: returns true only if both inputs are true.
- XOR: returns true if exactly one of the inputs is true.
- NOR, NAND, and XNOR: inverted versions of their base gates (OR, AND, and XOR).
Although our computer can perform math at this point, it can't do anything beyond that.
It can't remember anything, so it can't do anything with the results it computes.
Doing those things requires a memory cell, depicted below:
{% embed url="https://www.howtogeek.com/wp-content/uploads/2018/10/cpu_7.png?trim=1,1&bg-color=000&pad=1,1" %} Basic memory cell built from NAND gates for writing and AND gates for reading {% endembed %}
You provide an input, turn on the write bit, and the memory cell stores that input.
However, do you see an issue with having just a memory cell? We still have no way to read the information back out of it.
This is done with an output enabler, which is tied to another input, a.k.a. the read bit.
The enabler uses AND gates, so the stored value only reaches the output while the read bit is on. The write and read bits are known as "set" and "enable".
What do we call these as a system? Group several memory cells together under one shared set bit and one shared enable bit, and you get a register.
{% embed url="https://www.howtogeek.com/wp-content/uploads/2018/10/cpu_8.png?trim=1,1&bg-color=000&pad=1,1" %}
Notice that each register still has its own read (enable) and write (set) bit, and all of them connect to a shared bus. Moving data between them is simple.
To copy R1 into R2, you turn on the read bit for R1, which places the contents of R1 onto the bus.
Then, while keeping R1's read bit on, you turn on the write bit for R2, which copies the contents of the bus into R2.
Registers store information within the CPU and move data around. However, what tells them when to move things around?
The clock is a core component of the CPU: it turns on and off at a fixed, predetermined interval.
Its speed is measured in hertz (Hz), or cycles per second.
- e.g. a 5 GHz CPU completes five billion clock cycles per second
- This makes clock speed a common indicator of how fast a CPU can be, although the amount of work done per cycle matters too
{% embed url="https://www.howtogeek.com/wp-content/uploads/2018/10/cpu_10.png?trim=1,1&bg-color=000&pad=1,1" %} The clock has three different clock states {% endembed %}
The clock has three different states: the base clock, the enable clock, and the set clock.
- Base clock: on for half a cycle and off for the remaining half.
- Enable clock: used to turn on registers. As the diagram above shows, it stays on longer than the base clock to ensure the data is enabled.
- Set clock: always needs to be on at the same time as the enable clock, or else incorrect data could be written.
The clock is connected to the stepper.
The stepper counts from one up to its maximum step and then resets itself back to one.
The clock is also connected to AND gates for each register that the CPU can write to.
Program instructions are stored in RAM, but modern CPUs keep recently used instructions in the L1 instruction cache. This cache is built directly into the CPU, so most instruction fetches don't have to travel across the slower bus to RAM.
The main bus is the primary component: the Arithmetic Logic Unit (ALU), instruction register, registers, control section, and RAM all connect through it.
- Program data is loaded into the control section
- The control section reads two numbers from RAM, loads the first one into a register that feeds the ALU, and then places the second one on the bus
- It then sends the ALU an instruction code telling it what to compute
- The ALU performs the calculation and stores the result in a different register
- The CPU can read that register and then continue the process
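Processors have their own set of special variables called registers.
Most instructions use these registers to read or write data.
- Understanding registers is imperative to understanding instructions
An x86-64 (x64) CPU has sixteen 64-bit general-purpose registers (rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, and r8 through r15), plus special registers such as the instruction pointer rip and the flags register eflags. These act like internal variables for the processor.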
Refer to the reference sheet below while reading through the debugger output to better understand what is going on.
{% embed url="https://web.stanford.edu/class/cs107/resources/x86-64-reference.pdf" %}
Debuggers are used by programmers to step through compiled programs, examine program memory, and view processor registers.
Why are they so important and so powerful?
Like a microscope, a debugger lets a hacker observe the microscopic world of machine code. But it is far more powerful than that: a debugger lets the hacker view execution from every angle, pause it, and change anything along the way.
How can we see these registers to learn about them? Let's use `gdb`:
```
gdb -q ./helloWorld.exe
Reading symbols from .\helloWorld.exe...
(gdb) run
Starting program: C:\Users\Xyconix\Desktop\projects\C\helloWorld.exe
[New Thread 7456.0xdf8]
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
[Thread 7456.0xdf8 exited with code 0]
[Inferior 1 (process 7456) exited normally]
(gdb)
```
Setting a breakpoint on `main` stops execution at the start of `main()`, before any of its code has run.
This lets us pause execution and display every register and its current state.
Let's view the state of the processor registers right before the program's main code starts:
```
(gdb) break main
Breakpoint 1 at 0x7ff74097145d: file C:\Users\Xyconix\Desktop\projects\C\helloWorld.c, line 7.
(gdb) run
Starting program: C:\Users\Xyconix\Desktop\projects\C\helloWorld.exe
[New Thread 8688.0x514]
Thread 1 hit Breakpoint 1, main () at C:\Users\Xyconix\Desktop\projects\C\helloWorld.c:7
7 for (i=0; i < 10; i++) // This will loop 10 times
(gdb)
```
- Because there is a loop in `main()`, you will need to use `n` (`next`) to execute the next source line; since the loop runs 10 times, you will need to enter it repeatedly to step through every iteration
- We can also use `info registers` to view the state of the processor registers at that point in execution
```
gdb -q .\helloWorld.exe
Reading symbols from .\helloWorld.exe...
(gdb) break main
Breakpoint 1 at 0x14000145d: file C:\Users\Xyconix\Desktop\projects\C\helloWorld.c, line 7.
(gdb) run
Starting program: C:\Users\Xyconix\Desktop\projects\C\helloWorld.exe
[New Thread 9552.0x608]
Thread 1 hit Breakpoint 1, main () at C:\Users\Xyconix\Desktop\projects\C\helloWorld.c:7
7 for (i=0; i < 10; i++) // This will loop 10 times
(gdb) info registers
rax 0x1 1
rbx 0x8 8
rcx 0x1 1
rdx 0xb415f0 11802096
rsi 0x33 51
rdi 0xb415f8 11802104
rbp 0x5ffea0 0x5ffea0
rsp 0x5ffe70 0x5ffe70
r8 0xb41c90 11803792
r9 0x7ffb865f6850 140718267918416
r10 0x0 0
r11 0x5ffd08 6290696
r12 0xb41630 11802160
r13 0x0 0
r14 0x0 0
r15 0x0 0
rip 0x7ff74097145d 0x7ff74097145d <main+13>
eflags 0x202 [ IF ]
cs 0x33 51
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x53 83
gs 0x2b 43
(gdb)
```
Inside of `gdb`, run `set disassembly-flavor intel` to display the disassembly in Intel syntax.
x64 Registers
- `%rip`: Instruction Pointer
- `%rsp`: Stack Pointer
- `%rax`: Return Value
- `%rbp`: Base Pointer
`%rax`, `%rsp`, and `%rbp` are general-purpose registers (although `%rsp` and `%rbp` are conventionally reserved for managing the stack), while `%rip` is a special-purpose register that advances as instructions execute. `%rip`, `%rsp`, and `%rbp` are called pointers because each stores a 64-bit memory address and points to that location in memory. They are VERY important to program execution and memory management.
The Instruction Pointer holds the address of the next instruction the processor will execute.
The Stack Pointer keeps track of the top of the stack in the program's memory. It points to the value most recently pushed onto the stack.
The Base Pointer is used as a reference point in stack-based operations. It is often used to access function parameters, local variables, and saved registers within a function's stack frame. RBP usually stays the same throughout a function's execution, while RSP changes as the stack grows and shrinks.
In short, the RSP keeps track of the top of the stack and manages stack operations.
The RBP is used as a fixed reference point within the stack frame for accessing local variables and function parameters.
I was curious about how the return value works.
Line 12 of our helloWorld.c program is `return 0;`, so I set a breakpoint on that line using `break 12`. Let's see what this looks like in gdb:
```
Thread 1 hit Breakpoint 2, main () at C:\Users\Xyconix\Desktop\projects\C\helloWorld.c:12
12 return 0;
(gdb) info registers
rax 0x0 0
rbx 0x8 8
rcx 0xffffffff 4294967295
rdx 0x7ffb84f7fa30 140718244362800
rsi 0x33 51
rdi 0x1c15f8 1840632
rbp 0x5ffea0 0x5ffea0
rsp 0x5ffe70 0x5ffe70
r8 0x7ffb84f85940 140718244387136
r9 0x5fe2e0 6284000
r10 0x0 0
r11 0x246 582
r12 0x1c1630 1840688
r13 0x0 0
r14 0x0 0
r15 0x0 0
rip 0x7ff74097147f 0x7ff74097147f <main+47>
eflags 0x202 [ IF ]
cs 0x33 51
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x53 83
gs 0x2b 43
```
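You can see that `%rax` has a value of `0`, matching the `return 0` on line 12! On x86-64, a function's return value is passed back in `%rax`. Strictly speaking, a breakpoint on line 12 pauses execution before that line runs, so the `mov eax,0x0` instruction that actually sets the return value has not executed yet; you can step past it with `stepi` and check `info registers rax` to see the value `main` really returns.
How cool is that?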
helloWorld.c:
```c
#include <stdio.h>

int main()
{
    puts("Hello world!\n");
    return 0;
}
```
Dump ASM for main function:
```
Reading symbols from .\helloWorld.exe...
(gdb) b 8
Breakpoint 1 at 0x14000145d: file .\helloWorld.c, line 8.
(gdb) run
Starting program: C:\Users\Xyconix\Desktop\projects\C\helloWorld.exe
[New Thread 460.0x2cb8]
(gdb) set disassembly-flavor intel
(gdb) disas main
Dump of assembler code for function main:
   0x00007ff71cd01450 <+0>:     push   rbp
   0x00007ff71cd01451 <+1>:     mov    rbp,rsp
   0x00007ff71cd01454 <+4>:     sub    rsp,0x20
   0x00007ff71cd01458 <+8>:     call   0x7ff71cd01530 <__main>
=> 0x00007ff71cd0145d <+13>:    lea    rax,[rip+0x2b9c]        # 0x7ff71cd04000
   0x00007ff71cd01464 <+20>:    mov    rcx,rax
   0x00007ff71cd01467 <+23>:    call   0x7ff71cd02628 <puts>
   0x00007ff71cd0146c <+28>:    mov    eax,0x0
   0x00007ff71cd01471 <+33>:    add    rsp,0x20
   0x00007ff71cd01475 <+37>:    pop    rbp
   0x00007ff71cd01476 <+38>:    ret
End of assembler dump.
```
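Using x/s to examine data
We can verify that our "Hello world!" string is located at address 0x7ff71cd04000, the address loaded by `lea rax,[rip+0x2b9c]`, by running `x/s 0x7ff71cd04000`.
What is x/s? `x` stands for examine, and `/s` tells gdb to display the memory as a string.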
We can even use `list` to view the source code. This requires compiling with the `-g` option, which adds debugging information (such as line numbers and symbol names) so that `gdb` can map the program back to its source code.
```
(gdb) list
3 int main()
4 {
5
6 //int i;
7 //for (i=0; i < 10; i++) // This will loop 10 times
8 puts("Hello world!\n"); // puts() will write a string to stdout up to but not including the null char
9
10 return 0;
11
12 }
(gdb)
```
Intel syntax generally follows this style:
`operation <destination>, <source>`
For example, `mov rbp,rsp` in the dump above copies the value of `rsp` into `rbp`.
Some operations control the flow of execution.
The `cmp` operation compares two values and records the result in the flags register, and any operation that starts with `j` (such as `jmp`, `je`, or `jle`) jumps to a different part of the code, often depending on the result of a preceding `cmp`.
In the next section, we will further debug our simple helloWorld.c.
Let's walk through these commands a little bit more:
- `set disassembly-flavor intel` sets the disassembly to use Intel syntax
- `list` lists out the source code
- `disas main` lists out the disassembly of `main()`
- `b main` sets a breakpoint at the start of `main()`. When debugging, you hit this breakpoint before any code within `main()` actually executes, and execution is paused at that point.
- Then, we use `info register rip` to display the value of the RIP instruction pointer.
Format letters for gdb's `x` (examine) command:
- `o`: display in octal
- `x`: display in hex
- `u`: display in unsigned, base-10 decimal
- `t`: display in binary

Commands can also be abbreviated: instead of `info register rip`, we can just use `i r rip`.
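Let's use these format letters to examine what our RIP register points to in a few different ways.
Syntax: `x/<format> <address>`
We can also reference registers directly in gdb expressions, like variables, by prefixing their names with `$`. For example, this displays the memory at the address held in RIP (the bytes of the next instruction) in hex:
`x/x $rip`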
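Valid size letters:
- `b`: a single byte
- `h`: a halfword, which is two bytes in size
- `w`: a word, which is four bytes in size
- `g`: a giant, which is eight bytes in size

Size and format letters can be combined with a count; for example, `x/8xb $rip` displays 8 bytes in hex.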
Note: in x86 assembly terminology, a WORD is two bytes and a DWORD is four bytes, whereas gdb's `w` (word) size is four bytes. Don't confuse the two.
String Literals: a sequence of chars enclosed in quotation marks within the source code of a program. They are typically stored in read-only memory and cannot be modified at runtime; attempting to do so is undefined behavior in C.
e.g. "Hello, World!"
Regular Strings: created at runtime in writable memory, such as a char array, and can be modified during execution.
e.g.
```c
#include <stdio.h>
#include <string.h>

int main()
{
    char str_a[20];
    strcpy(str_a, "Hello world!\nI'm now a regular string in an array!\n");
    printf("%s", str_a);
    return 0;
}
```
Uh oh! I bet you tried to compile this and ended up getting a warning about an overflow!
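The string is 51 characters plus a terminating null byte, 52 bytes in total, so our buffer must hold at least 52 bytes rather than 20. Because `strcpy` never checks the size of the destination, it would otherwise write past the end of the buffer and overflow into the rest of the stack. This is why you need to be careful with `strcpy`!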
Fixed code:
```c
#include <stdio.h>
#include <string.h>

int main()
{
    char str_a[52];  // 51 characters + 1 null terminator
    strcpy(str_a, "Hello world!\nI'm now a regular string in an array!\n");
    printf("%s", str_a);
    return 0;
}
```
When the size of a string isn't known until runtime, that's where you dynamically allocate memory using `malloc()`.
This program prints the size of several C data types on your platform:
```c
#include <stdio.h>

int main()
{
    // sizeof yields a size_t, so %zu is the matching printf conversion
    printf("The 'int' data type is \t\t %zu bytes\n\n", sizeof(int)); // int data type
    printf("The 'short int' data type is \t %zu bytes\n\n", sizeof(short int)); // short int data type
    printf("The 'long int' data type is \t %zu bytes \n\n", sizeof(long int)); // long int data type
    printf("The 'long long int' data type is %zu bytes\n\n", sizeof(long long int)); // long long data type
    printf("The 'float' data type is \t %zu bytes\n\n", sizeof(float)); // float data type
    printf("The 'char' data type is \t %zu bytes\n\n", sizeof(char)); // char data type
    return 0;
}
```
So far, string literals seem much easier to read in disassembly than regular strings.
The EIP register (RIP on x86-64) is a pointer that "points" to the next instruction to execute during a program's execution by containing its memory address.
The idea of pointers is HUGE in C.
Since physical memory cannot actually be moved, the information in it must be copied. Space for the new destination copy must be saved or allocated before the source can be copied.
However, instead of copying a large block of memory (which would be horribly inefficient), it is much simpler to pass around the address of the beginning of that block of memory.
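Pointers in C are as large as a memory address: 32 bits (4 bytes) on 32-bit x86 and 64 bits (8 bytes) on x86-64. They are declared by placing an asterisk before the variable name, e.g. `int *ptr;`.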