Reverse Engineering

Reverse Engineering: Understanding a C program

Reverse engineering is a powerful tool for any software developer. But as with any tool, it’s only as good as the person using it. Understanding reverse engineering and how it can be used is important for new and veteran developers alike.

According to Wikipedia,

Reverse engineering, also called back engineering, is the process by which a man-made object is deconstructed to reveal its designs, architecture, or to extract knowledge from the object; similar to scientific research, the only difference being that scientific research is about a natural phenomenon.

I am trying to keep this article practical, as there are a lot of new things to come. Do read this to know more about reverse engineering, because there can be legal issues if you do it wrong.

There are many reverse engineering tools on the market that can ease your job, but relying on them means you can miss a lot of what is happening underneath. So to start the journey, I am using the GNU Debugger (gdb) to reverse engineer some basic programs written in C, and we will build up an understanding as we move forward.

Before taking on real-world binaries/programs, we’ll build our own basic programs and try to reverse engineer them.

As my first step, I am going to write a basic program in C.

(NOTE: I am using RHEL 7.5 for my development and test machine in this article.)

A basic C program

Let’s understand this example as a stepping stone.

In this example,

  1. We include a header file in our program (for standard input and output, this is most likely stdio.h). It contains the declarations of many functions related to standard input and output.
  2. On the second line we define our main function.
  3. The opening curly brace marks the start of the block associated with the main function.
  4. The return statement passes a status back from this function. Here the status code is 0, which means EXIT_SUCCESS; there are other exit status codes as well. They tell the caller about the exit status of the called function.
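To make the idea of a status code a bit more concrete, here is a minimal sketch of a callee that reports success or failure through its return value and a caller that checks it. The function names parse_digit and digit_or_default are hypothetical and only serve as an illustration; EXIT_SUCCESS and EXIT_FAILURE come from stdlib.h.

```c
#include <stdlib.h>

/* Hypothetical callee: reports success or failure through its return value. */
int parse_digit(char c, int *out)
{
    if (c < '0' || c > '9') {
        return EXIT_FAILURE;   /* not a decimal digit */
    }
    *out = c - '0';
    return EXIT_SUCCESS;
}

/* Hypothetical caller: inspects the status before trusting the result. */
int digit_or_default(char c, int fallback)
{
    int value;
    if (parse_digit(c, &value) != EXIT_SUCCESS) {
        return fallback;
    }
    return value;
}
```

The same pattern applies to main: its return value is handed back to whoever started the program, for example the shell.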

Time to compile it and get an executable file.

Compilation
gcc main.c -o basic.o

In this command,

  • gcc is the compiler I used to compile my code.
  • main.c is the program file name.
  • -o specifies the output file name; here I have given basic.o (by default it is a.out). The .o suffix is only a name here: the file is a fully linked executable, not an object file.

We can easily see what type of file it is on Linux using the file command.

file type

This clearly states that basic.o is an ELF 64-bit executable for the x86-64 architecture and is dynamically linked.

I’ll explain the output of the file command in more detail whenever it is required.
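As a rough idea of where the "ELF 64-bit" part comes from: every ELF file starts with the four magic bytes 0x7f 'E' 'L' 'F', and the fifth byte records the class (1 for 32-bit, 2 for 64-bit). The sketch below checks just those bytes. It is a simplified illustration, not how the file command is actually implemented, and the function and constant names are my own.

```c
#include <stddef.h>

/* Positions and values taken from the ELF identification bytes (e_ident). */
enum { CLASS_BYTE_INDEX = 4, CLASS_32_BIT = 1, CLASS_64_BIT = 2 };

/* Returns 64 or 32 for a recognizable ELF header, 0 for anything else. */
int elf_class_bits(const unsigned char *ident, size_t len)
{
    if (len < 5) {
        return 0;   /* too short to hold magic bytes plus class byte */
    }
    if (ident[0] != 0x7f || ident[1] != 'E' ||
        ident[2] != 'L'  || ident[3] != 'F') {
        return 0;   /* magic bytes do not match */
    }
    if (ident[CLASS_BYTE_INDEX] == CLASS_64_BIT) {
        return 64;
    }
    if (ident[CLASS_BYTE_INDEX] == CLASS_32_BIT) {
        return 32;
    }
    return 0;
}
```

Feeding it the first few bytes read from an executable would be enough to tell a 64-bit ELF file from a 32-bit one.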

For now, let’s focus on how to use gdb (GNU Debugger).

  • To open your executable file in gdb → gdb <filename>
You might get different output than mine.

Every C program has a main function, and that is where our own code starts running (strictly speaking, the ELF entry point is _start, which eventually calls main). So I’ll start by disassembling that first.

In gdb you can do this with → disassemble main, or the shorter form → disas main

Let’s first understand how gdb shows the output and what it means.

0x00000000004004cd <+0>: push %rbp

The value to the left of the colon (:) is the memory address of the instruction, followed by its offset from the start of the function in angle brackets (<+0>), while the text to the right of the colon is the assembly code.

The assembly code is composed of two parts:

  • the instruction, and
  • 0 or more operands

There are plenty of instruction sets and variations of them, and it can take a lot of your time to completely understand them and learn how they work. So instead of learning every bit of assembly, we’ll make some intelligent guesses and focus only on the instructions we need to know. This will save us a lot of time and help us move forward quickly.

There are only a few things you should know before jumping into this hell.

  • Any instruction that works on data takes one or more addresses and/or registers as operands, which it operates on.
  • A register like eax/rax is basically a small piece of temporary/scratch storage inside the CPU, giving fast access to data that is used frequently and locally.
  • There are a few special registers. For the current problem, know that rbp is the stack base pointer (the bottom of the current stack frame), rsp is the stack pointer (the current top of the stack), and rip is the instruction pointer (the address of the instruction that is just about to be executed). We’ll learn more about rbp/rsp when we do a real stack overflow problem.
  • Brackets around an address or register mean that the instruction refers to the value stored at that address as the source or destination of the operation, instead of the address/register itself. Intel syntax writes this with square brackets, as in [rax]; the AT&T syntax that gdb prints by default (the one with % in front of register names, as in push %rbp above) writes it with parentheses, as in (%rax).
  • For most operations involving a source and a destination operand, the order depends on the syntax: in Intel syntax the left operand is the destination, while in gdb’s default AT&T syntax the source comes first and the right operand is the destination. You can switch gdb to Intel syntax with set disassembly-flavor intel.
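The difference between using a register itself and using the value stored at the address it holds maps quite directly onto plain values versus pointer dereferences in C. The sketch below is only an analogy: the function names are made up, and the instructions in the comments are what such operations typically look like, written in both syntaxes, not the output of a particular compiler.

```c
/* Copy the value held in one "register" to another.
 * AT&T:  mov %rax,%rbx      Intel: mov rbx, rax */
long copy_register(long rax)
{
    long rbx = rax;
    return rbx;
}

/* Treat the "register" as an address and load the value stored there.
 * AT&T:  mov (%rax),%rbx    Intel: mov rbx, [rax] */
long load_from_memory(const long *rax)
{
    long rbx = *rax;
    return rbx;
}
```

In the first function the operand is the data; in the second it only says where the data lives.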

You can take help from this guide if you have trouble understanding the concepts of reverse engineering x86 assembly. And always google something if you are confused.

The code above was very basic and does not include any function calls or variable assignments. Below are more examples that can help you get better at the guessing game.

Example 1
Example 2
Example 3

Comparing the above examples, you should get a rough idea of how the assembly relates to the C program.

The first example is self-explanatory: it shows how a block is constructed.

The second and third produce the same output despite the difference in their code. We’ll talk about this later, but first let’s focus on how printf() works.
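The original listings for these examples are not reproduced in this text, so purely as a hypothetical illustration of two differently written pieces of code that print the same thing, assume something like the following pair. The function names are my own, and the real examples may well have differed.

```c
#include <stdio.h>

/* Variant A: pass the string literal straight to printf. */
int print_literal(void)
{
    return printf("hello");
}

/* Variant B: store the string's address in a variable first, then print it. */
int print_from_variable(void)
{
    const char *message = "hello";
    return printf("%s", message);
}
```

Both functions write the same five characters, but the second one needs extra instructions to store the pointer in a local variable and load it again, which is the kind of difference that shows up in a disassembly.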

To call any function we need the callq instruction. It takes a single operand, here 0x400400. This is the address the call jumps to for printf, and the hint <printf@plt> tells us which function that address belongs to: it is printf’s entry in the procedure linkage table (PLT), which is how a dynamically linked program reaches library functions.

There are also some mov instructions before the call to printf() takes place. What are these? They set up the values passed to the function. Arguments are always put in place before the call happens; on x86-64 Linux the first one goes into the rdi register (edi is its lower 32 bits).

According to that, both mov instructions are related to the “hello” string passed to the printf function. And $0x4005d0 is probably my string here.

The good thing is that we can ask gdb to interpret the data at this address as a string and read it, for example with the examine command: x/s 0x4005d0

Boom!! This trick can help me read the values of variables and of parameters passed to any function.

There is a lot more to know about using GDB and understanding C by learning assembly. Later I’ll be sharing some CTF writeups that will help you understand more about reverse engineering.
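What happens here is the same thing C does with a char pointer: an address is treated as the start of a run of bytes that ends at the first zero byte. A minimal sketch of that idea follows, with helper names that are my own; it is only safe when the address really points at a NUL-terminated string inside the current process.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw address as the start of a NUL-terminated string. */
const char *string_at(uintptr_t addr)
{
    return (const char *)addr;
}

/* Count the bytes up to (not including) the terminating zero byte. */
size_t string_length_at(uintptr_t addr)
{
    return strlen(string_at(addr));
}
```

Passing the address of a string literal, cast to uintptr_t, gives back the same characters, much like handing a debugger a hex address and asking for the string stored there.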

