Saturday, March 15, 2014

Hacking Linux executables with LD_PRELOAD code injection

Executable code can be stored on disk in many different formats. In MS-DOS, programs that fit within a single 64 kB memory segment could simply be saved as a COM file, which contained only machine code and no additional metadata. To run such a file, the operating system loaded it into the first free memory segment at offset 0x100, pointed the code segment register (CS), the data segment registers (DS, ES) and the stack segment register (SS) at that segment, and then passed control to the first instruction at the beginning of the file. More complicated executables carried an EXE header with additional information required by the operating system, such as the number of memory blocks the program needs or the initial values of the segment registers.

Nowadays, applications compiled for multitasking operating systems often use shared (dynamically linked) libraries, so the executable file headers also record which libraries are required and which functions are imported from them. In Linux, you can read that data with readelf. For example, the command:
readelf -s /usr/bin/users
gives the following information (I stripped some of the output to increase readability):
Symbol table '.dynsym' contains 58 entries:
  Num: Type Bind   Vis     Name
    1: FUNC GLOBAL DEFAULT utmpxname@GLIBC_2.2.5 (2)
    2: FUNC GLOBAL DEFAULT free@GLIBC_2.2.5 (2)
    3: FUNC GLOBAL DEFAULT abort@GLIBC_2.2.5 (2)
    4: FUNC GLOBAL DEFAULT __errno_location@GLIBC_2.2.5 (2)
    5: FUNC GLOBAL DEFAULT strncpy@GLIBC_2.2.5 (2)
    6: FUNC GLOBAL DEFAULT strncmp@GLIBC_2.2.5 (2)
    7: FUNC GLOBAL DEFAULT _exit@GLIBC_2.2.5 (2)
    8: FUNC GLOBAL DEFAULT __fpending@GLIBC_2.2.5 (2)
    9: FUNC GLOBAL DEFAULT qsort@GLIBC_2.2.5 (2)
   10: FUNC GLOBAL DEFAULT textdomain@GLIBC_2.2.5 (2)
   11: FUNC GLOBAL DEFAULT endutxent@GLIBC_2.2.5 (2)
   12: FUNC GLOBAL DEFAULT fclose@GLIBC_2.2.5 (2)
   13: FUNC GLOBAL DEFAULT bindtextdomain@GLIBC_2.2.5 (2)
   14: FUNC GLOBAL DEFAULT dcgettext@GLIBC_2.2.5 (2)
   15: FUNC GLOBAL DEFAULT __ctype_get_mb_cur_max@GLIBC_2.2.5 (2)
   16: FUNC GLOBAL DEFAULT strlen@GLIBC_2.2.5 (2)
   17: FUNC GLOBAL DEFAULT __stack_chk_fail@GLIBC_2.4 (3)
   18: FUNC GLOBAL DEFAULT getopt_long@GLIBC_2.2.5 (2)
   19: FUNC GLOBAL DEFAULT mbrtowc@GLIBC_2.2.5 (2)
   20: FUNC GLOBAL DEFAULT __overflow@GLIBC_2.2.5 (2)
   21: FUNC GLOBAL DEFAULT strrchr@GLIBC_2.2.5 (2)
   22: FUNC GLOBAL DEFAULT lseek@GLIBC_2.2.5 (2)
   23: FUNC GLOBAL DEFAULT memset@GLIBC_2.2.5 (2)
   24: FUNC GLOBAL DEFAULT __libc_start_main@GLIBC_2.2.5 (2)
   25: FUNC GLOBAL DEFAULT memcmp@GLIBC_2.2.5 (2)
   26: FUNC GLOBAL DEFAULT fputs_unlocked@GLIBC_2.2.5 (2)
   27: FUNC GLOBAL DEFAULT calloc@GLIBC_2.2.5 (2)
   28: FUNC GLOBAL DEFAULT strcmp@GLIBC_2.2.5 (2)
As you can see in the last line, the users executable calls the strcmp function from the GNU C Library (glibc). This function compares the two strings given as its arguments and returns 0 if they are equal, a positive value if the first string is greater than the second one, and a negative value otherwise.
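The C standard only promises the sign of the value strcmp returns, not its magnitude, so code should test it against 0 rather than against -1 or 1. As a small sketch of that contract, the helper below (compare_sign is a hypothetical name, not part of any library) collapses the result to -1, 0 or 1:

```c
#include <string.h>

/* Hypothetical helper for illustration: reduces strcmp's result to
   -1, 0 or 1, since only the sign of the result is guaranteed. */
int compare_sign(const char *s1, const char *s2) {
  int result = strcmp(s1, s2);
  /* (result > 0) and (result < 0) each evaluate to 0 or 1. */
  return (result > 0) - (result < 0);
}
```

With this helper, compare_sign("adam", "eve") gives -1, because 'a' sorts before 'e'.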

Because the program takes this function from an external library, you can modify its behaviour without touching its code. All you have to do is substitute strcmp with your own function: write a dynamic library with a custom strcmp and have it loaded before the code of the users executable is initialized.

First, let's create a very simple library. Save this code as mylib.c:
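/* A deliberately inverted strcmp: equal strings still return 0,
   but the sign for unequal strings is flipped. */
int strcmp(const char* s1, const char* s2) {
  /* Skip over the common prefix. */
  for (; *s1 == *s2; s1++, s2++) {
    if (*s1 == '\0') {
      return 0;  /* reached the end of both strings: equal */
    }
  }
  /* Compare the first differing bytes as unsigned chars;
     the real strcmp would return a positive value when s1 is greater. */
  return ((*(const unsigned char*)s1 > *(const unsigned char*)s2) ? -1 : 1);
}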
and compile it with the following command:
gcc mylib.c -shared -Wl,-soname,mylib -o mylib.so -fPIC
As you probably noticed, it behaves opposite to the original strcmp function: it returns a negative value when the first string is greater than the second one, and a positive value when it is smaller. Equal strings still compare as 0.

Now all you have to do is get the custom library loaded into the program. Fortunately, Linux lets you load your own dynamic libraries ahead of all the others when starting an executable, through the LD_PRELOAD environment variable. We can take advantage of this feature - just compare the results of the two following commands:
[adam@ubuntu:~]$ users
adam eve

[adam@ubuntu:~]$ LD_PRELOAD="./mylib.so" users
eve adam
Because we reversed the behaviour of the strcmp function, and the users executable relies on it, we managed to change the result of the command (the user logins come out in reverse order in the second example) without even touching the executable.
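To see why the order flips, here is a minimal sketch that runs the same inverted comparison through qsort. It assumes that users sorts the login names with qsort and a strcmp-based comparator. The symbol table lists both functions, but the source of users is not shown here, so treat that as an assumption. The names reversed_strcmp, compare_names and sort_names are hypothetical, and the function is renamed so that it does not clash with libc in an ordinary program:

```c
#include <stdlib.h>

/* Same logic as the strcmp in mylib.c, under a hypothetical name. */
static int reversed_strcmp(const char *s1, const char *s2) {
  for (; *s1 == *s2; s1++, s2++) {
    if (*s1 == '\0') {
      return 0;
    }
  }
  return (*(const unsigned char *)s1 > *(const unsigned char *)s2) ? -1 : 1;
}

/* qsort comparator: each array element is a pointer to a string. */
static int compare_names(const void *a, const void *b) {
  return reversed_strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts an array of n strings in place using the inverted comparison. */
void sort_names(const char **names, size_t n) {
  qsort(names, n, sizeof names[0], compare_names);
}
```

Calling sort_names on the array {"adam", "eve"} leaves "eve" first, because the inverted comparison reports "adam" as the greater string.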

The same trick has serious uses too: Google's TCMalloc, a faster replacement for the standard malloc, can be loaded into existing programs with LD_PRELOAD.
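A preloaded library does not have to reimplement the function it replaces. Below is a minimal sketch, assuming glibc and gcc, of a strcmp that counts its calls and forwards each one to the original implementation, which it looks up with dlsym and RTLD_NEXT. The call counter and the report printed at exit are hypothetical additions for illustration only, and the counter is not thread-safe.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes RTLD_NEXT in <dlfcn.h>; must precede includes */
#endif
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (*strcmp_fn)(const char *, const char *);

/* Hypothetical call counter, added only for illustration. */
static unsigned long strcmp_calls = 0;

/* Same name and signature as the libc function, so a preloaded library
   that defines it is found first by the dynamic linker. */
int strcmp(const char *s1, const char *s2) {
  static strcmp_fn real_strcmp = NULL;
  if (real_strcmp == NULL) {
    /* RTLD_NEXT: the next definition after this library, normally libc's. */
    real_strcmp = (strcmp_fn)dlsym(RTLD_NEXT, "strcmp");
    if (real_strcmp == NULL) {
      abort(); /* no underlying implementation to forward to */
    }
  }
  strcmp_calls++;
  return real_strcmp(s1, s2);
}

/* GCC/Clang extension: runs when the library is unloaded at exit. */
__attribute__((destructor)) static void report_calls(void) {
  fprintf(stderr, "strcmp called %lu times\n", strcmp_calls);
}
```

It can be built with the same gcc command as mylib.c. On older glibc versions you may also need to add -ldl so that dlsym resolves.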

1 comment:

Anonymous said...

Interesting slides about it:

http://lca2009.linux.org.au/slides/172.pdf