Thursday, September 20, 2012

MoVP 2.4 Analyzing the Jynx rootkit and LD_PRELOAD

Month of Volatility Plugins

In this post I will analyze the Jynx rootkit using Volatility’s new Linux features.  

If you would like to follow along or recreate the steps taken, please see the LinuxForensicsWiki  for instructions on how to do so.

Obtaining the Samples

In order to have samples to test against, I used the sample provided by SecondLook on their Linux memory images page, and I also loaded the Jynx2 rootkit against a running netcat process in my Debian virtual machine that was running the 2.6.32-5-686 32-bit kernel.   I then acquired a memory capture of my VM using LiME.

Jynx / Jynx2
Jynx is a rootkit for Linux that is different than the previous two we have analyzed, Average Coder and KBeast, in that it only operates in userland. While Volatility’s analysis mostly focuses on kernel-level objects and activity, we still can leverage the existing functionality to deeply examine activity in userland.

LD_PRELOAD is an environment variable on Linux systems that is meant to contain the path to a shared library (.so). When set, the given shared library will load before any others, and any functions that it overrides in other libraries will be called instead of the real function. LD_PRELOAD is meant to provide an easy method to debug or inspect functions in dynamically linked libraries without the need to patch or recompile the library itself. Besides the per-process environment variable, the /etc/  file can be used to preload a library into all processes on the system by simply placing the path of the library to preload into it.
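
As a rough illustration of checking for the per-process variable, the sketch below parses a NUL-separated environment block (the layout of /proc/<pid>/environ on a live Linux system) and lists any libraries named in LD_PRELOAD. The function names and the sample paths are hypothetical and are not part of Volatility or Jynx.

```python
def parse_environ(blob: bytes) -> dict:
    """Split a NUL-separated environment block into a name -> value dict."""
    env = {}
    for entry in blob.split(b"\0"):
        if b"=" not in entry:
            continue  # skip the empty trailing entry and malformed items
        name, _, value = entry.partition(b"=")
        env[name.decode("utf-8", "replace")] = value.decode("utf-8", "replace")
    return env


def preloaded_libraries(blob: bytes) -> list:
    """Return the libraries named in LD_PRELOAD, or an empty list if unset."""
    value = parse_environ(blob).get("LD_PRELOAD", "")
    # The dynamic loader accepts both spaces and colons as separators.
    return [path for path in value.replace(":", " ").split() if path]


if __name__ == "__main__":
    # Hypothetical environment block, for illustration only.
    sample = b"PATH=/usr/bin\0LD_PRELOAD=/tmp/hook.so:/tmp/other.so\0HOME=/root\0"
    print(preloaded_libraries(sample))
```

An empty result only means the variable was not set in that environment block; it says nothing about system-wide preloading.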

Like many other operating system features, this one has long been abused (see here from 2005) by security researchers. Because of the ease with which LD_PRELOAD gives malware control over userland processes, many current rootkits, including Jynx, use it to gain control over the running computer.

Jynx utilizes LD_PRELOAD extensively to perform its hiding of files, processes, network connections and to provide remote access capabilities to the attacker. The functions it hooks to accomplish this are:
  • accept
  • access
  • fxstat/fxstat64
  • lxstat/lxstat64
  • open
  • rmdir
  • unlink/unlinkat
  • xstat/xstat64
  • write
  • fopen/fopen64
  • fdopendir/opendir
  • readdir/readdir64
This wide range of functions covers everything from accepting new network connections to reading directories on the local filesystem.  Jynx utilizes these hooks to hide information based on the group ID owning the file or a custom string in a file’s path.

 To see how a typical LD_PRELOAD rootkit works, we will study the open hook of Jynx:

[1] int open (const char *pathname, int flags, mode_t mode)
{
  struct stat s_fstat;

  if (!libc)
[2]   libc = dlopen (LIBC_PATH, RTLD_LAZY);

  if (old_open == NULL)
[3]   old_open = dlsym (libc, "open");

  if (old_xstat == NULL)
[4]   old_xstat = dlsym (libc, "__xstat");

  drop_suid_shell_if_env_set ();

[5] memset (&s_fstat, 0, sizeof (stat));

  old_xstat (_STAT_VER, pathname, &s_fstat);
  if (s_fstat.st_gid == MAGIC_GID || (strstr (pathname, MAGIC_STRING))
      || (strstr (pathname, CONFIG_FILE))) {
    errno = ENOENT;
    return -1;
  }

[6] return old_open (pathname, flags, mode);
}

At [1] we see a function named open being declared with the same prototype as libc's open. Because the preloaded library is searched first, any call that would have gone to libc’s open is redirected to the LD_PRELOAD’d library’s version. At [2] we see dlopen being called with libc’s path as the first parameter and RTLD_LAZY as the second. dlopen is similar to LoadLibrary on Windows in that it will load the specified library into the address space of the calling process and return a handle to it for further reference. The RTLD_LAZY parameter tells dlopen not to resolve symbol addresses until they are needed.

At [3] and [4] we see dlsym being called with the libc handle returned from dlopen and the symbol names open and __xstat. dlsym is the equivalent of GetProcAddress on Windows and will return the address of the given function. Since RTLD_LAZY was specified, this will also have the side effect of the loader resolving each function's address.

After [4], the rootkit’s injected library has control of an application that has called open and has been redirected to the rootkit’s hijack function. It also knows the address of the real open and __xstat functions. At this point, starting at [5], it uses the stat call to see if the file should be hidden based on the group ID or the pathname of the file. If the file should be hidden, errno is set to ENOENT (No such file or directory) and -1 is returned. This effectively hides the file from userland tools. If the file is not meant to be hidden, the real libc open function is called so that the operating system will behave as normal. This is performed at [6].
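
The decision made by the hook can be modeled in a few lines. This is only a sketch of the logic described above, not Jynx's code: the constant values are hypothetical stand-ins for its compile-time options, and real_open is a placeholder callable representing the saved libc function.

```python
import errno

# Hypothetical values; in Jynx these are compile-time configuration options.
MAGIC_GID = 90
MAGIC_STRING = "XxJynx"
CONFIG_FILE = "hidden.conf"


def hooked_open(pathname, st_gid, real_open):
    """Model of the open hook: return (result, errno_value).

    st_gid stands in for the group ID obtained from the real __xstat call,
    and real_open stands in for the saved pointer to libc's open.
    """
    hidden = (
        st_gid == MAGIC_GID
        or MAGIC_STRING in pathname
        or CONFIG_FILE in pathname
    )
    if hidden:
        # Pretend the file does not exist.
        return -1, errno.ENOENT
    # Not hidden: fall through to the real function.
    return real_open(pathname), 0


if __name__ == "__main__":
    fake_open = lambda path: 3  # placeholder that returns a file descriptor
    print(hooked_open("/XxJynx/file", 0, fake_open))
    print(hooked_open("/etc/passwd", 0, fake_open))
```

The same shape (resolve the original, test a hiding condition, fail or pass through) applies to the other hooked functions.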

Remote Access
Jynx provides a listening, password protected, and SSL encrypted remote backdoor to infected computers.  Jynx implements this by hooking the accept function of libc, which is called every time a new connection is initiated to a listening server.

When a connection is started, Jynx first calls libc’s accept to process the connection. It then calls its own function to determine if the connection is from the attacker who installed the malware. The first check is on the source port of the connection. One of the compile-time configuration options for Jynx is a high and low source port between which it will accept connections. This helps to weed out random connections to a compromised server. The next part of the verification is a password chosen at compile time, the default of which is “DEFAULT_PASSWORD”. Once a connection is verified, a remote root shell is spawned for the attacker.
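
The two-stage verification can be sketched as follows. The port range here is a hypothetical example (the real bounds are set at compile time), and the function name is made up for illustration.

```python
# Hypothetical bounds; Jynx takes the low and high source ports at compile time.
LOW_PORT = 41
HIGH_PORT = 43
PASSWORD = "DEFAULT_PASSWORD"  # the default named in the post


def is_attacker(src_port: int, supplied_password: str) -> bool:
    """Return True only if both the source port and the password match."""
    if not (LOW_PORT <= src_port <= HIGH_PORT):
        return False  # ordinary client: handled as a normal connection
    return supplied_password == PASSWORD


if __name__ == "__main__":
    print(is_attacker(42, "DEFAULT_PASSWORD"))
    print(is_attacker(50000, "DEFAULT_PASSWORD"))
```

Checking the source port first means that ordinary clients never see a password prompt, so the infected service looks unchanged to them.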

Hiding Connections
Jynx hides the backdoor connections by hooking the reading of /proc/net/tcp by userland applications. This file is used by lsof, netstat, and other tools to list the network connections active on a computer. If you read the previously mentioned KBeast analysis article, you know that KBeast, along with many other kernel rootkits, hijacks the sequence operations structure of this file to hide connections from within the kernel.

Jynx accomplishes this by hooking fopen and fopen64 and monitoring for reads of the tcp file. If the file is read, then Jynx opens a temporary file, reads the actual file in itself, and filters out connections on the hidden ports. All non-filtered connections are written to the temporary file and the calling application is returned a handle to the temporary file instead of the real /proc/net/tcp. 
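
The filtering step can be illustrated with a small parser for the /proc/net/tcp text format, where each address is written as hex IP, a colon, and a hex port. This sketch assumes the filter keys on the remote port falling inside the hidden range; the function names and the sample rows are hypothetical.

```python
def parse_port(addr_field: str) -> int:
    """'0100007F:3039' -> 12345 (the port is the hex number after the colon)."""
    return int(addr_field.split(":")[1], 16)


def filter_proc_net_tcp(text: str, low: int, high: int) -> str:
    """Return the table with rows removed whose remote port is in [low, high]."""
    kept = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if index == 0 or len(fields) < 3:
            kept.append(line)  # header or unparseable row: pass through
            continue
        # fields[1] is the local address, fields[2] the remote address.
        if low <= parse_port(fields[2]) <= high:
            continue  # hidden connection: drop the row
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical rows, truncated to the first columns of the real format.
    sample = (
        "  sl  local_address rem_address   st\n"
        "   0: 00000000:3039 00000000:0000 0A\n"
        "   1: 0100007F:3039 0100007F:002A 01\n"
    )
    print(filter_proc_net_tcp(sample, 41, 43), end="")
```

Because the filtering happens inside the hooked process, a tool that reads the file without going through the hooked fopen would still see the hidden row.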

Investigating Jynx with Volatility
Jynx provides a number of opportunities to both detect its presence as well as to uncover the actions taken by the attacker on the compromised computer. 

Process Mappings
The simplest method to detect Jynx is by looking for its library within processes. If we use the linux_proc_maps plugin for all processes across the SecondLook image and then search for the presence of Jynx, we see we have a large number of hits:

# python -f jynx.mem --profile=LinuxUbuntu1204x64 linux_proc_maps > all_proc_maps

# grep -c all_proc_maps

# grep all_proc_maps | head -6
0x7fb809b61000-0x7fb809b67000 r-x          0  8: 1       655368 /XxJynx/
0x7fb809b67000-0x7fb809d66000 ---      24576  8: 1       655368 /XxJynx/
0x7fb809d66000-0x7fb809d67000 r--      20480  8: 1       655368 /XxJynx/
0x7fb809d67000-0x7fb809d68000 rw-      24576  8: 1       655368 /XxJynx/
0x7f9e75ac5000-0x7f9e75acb000 r-x          0  8: 1       655368 /XxJynx/

This shows 364 instances of Jynx-related mappings (code, data, guard pages, etc.) across all active processes. We also see the rootkit's hidden directory, /XxJynx, as well as the full path to the shared library.
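
When the library name is not known in advance, one option is to count file-backed mappings that live outside the usual system directories. The sketch below does this for lines shaped like the output above; the list of trusted prefixes is an assumption that would need tuning per distribution, and the function name is hypothetical.

```python
from collections import Counter

# Assumed set of "normal" locations; adjust for the system under analysis.
TRUSTED_PREFIXES = (
    "/lib/", "/lib64/", "/usr/lib/", "/usr/lib64/",
    "/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/",
)


def untrusted_mappings(lines, trusted=TRUSTED_PREFIXES):
    """Count file-backed mappings whose path is outside the trusted prefixes."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[0].startswith("0x"):
            continue  # banner or otherwise unrelated line
        path = fields[-1]
        if not path.startswith("/"):
            continue  # anonymous mapping, [heap], [stack], and so on
        if not path.startswith(trusted):
            counts[path] += 1
    return counts


if __name__ == "__main__":
    with open("all_proc_maps") as handle:
        for path, count in untrusted_mappings(handle).most_common():
            print(count, path)
```

Hits are leads rather than proof, since software installed under other prefixes will also show up.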

Network Connections
For testing on my own VM, I only infected a single netcat process with Jynx. To accomplish this, I simply ran netcat as:

# LD_PRELOAD=/root/Jynx2/jynx2/ nc -l -p 12345

This hooked all processing for netcat, including the accept function. I then connected to the backdoor as instructed by the README:

# ncat <VM IP> 12345 -p 42 --ssl

This connects to the backdoor using SSL and on one of the magic source ports (42).  I then had to enter the password (“DEFAULT_PASSWORD”) and was given a bash shell. I entered some commands for us to recover and then acquired a memory sample.

Using the linux_pstree plugin reveals the spawning of bash by netcat and the close association between the bash and nc processes.

# python -f jynx.lime --profile=Linuxthisx86 linux_pstree
.nc                  3047            0
..bash               3048            0

This shows that bash with a PID of 3048 is the child process of nc (netcat) with a PID of 3047.  If we run linux_netstat we will notice very strange output related to these processes:

# python -f jynx.lime --profile=Linuxthisx86 linux_netstat -p 3047,3048
Volatile Systems Volatility Framework 2.2_rc1
TCP LISTEN  nc/3047
TCP    ESTABLISHED     nc/3047
TCP LISTEN  bash/3048
TCP    ESTABLISHED     bash/3048

First, this output shows that two separate processes are listening on the same TCP port (12345) across all interfaces. Second, we also see that two separate processes are connected on the same IP/port pair. Neither of these situations occurs with properly written network applications on a properly configured computer.
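
This check can be automated once the netstat output has been parsed into records. The sketch below assumes each record is a tuple of (protocol, local address, remote address, state, process); that layout and the sample addresses are hypothetical and are not the plugin's native output format.

```python
from collections import defaultdict


def shared_sockets(records):
    """Return sockets that appear under more than one process."""
    owners = defaultdict(set)
    for proto, local, remote, state, proc in records:
        owners[(proto, local, remote, state)].add(proc)
    return {
        sock: sorted(procs)
        for sock, procs in owners.items()
        if len(procs) > 1
    }


if __name__ == "__main__":
    # Hypothetical records modeled on the nc/bash pair above.
    records = [
        ("TCP", "0.0.0.0:12345", "0.0.0.0:0", "LISTEN", "nc/3047"),
        ("TCP", "0.0.0.0:12345", "0.0.0.0:0", "LISTEN", "bash/3048"),
        ("TCP", "0.0.0.0:22", "0.0.0.0:0", "LISTEN", "sshd/900"),
    ]
    for sock, procs in shared_sockets(records).items():
        print(sock, procs)
```

A child process can inherit descriptors from its parent, so each hit should be reviewed in context; a shell sharing a network socket with a listener, as here, is a strong indicator.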

Investigating Userland
At this point in the investigation, we have shown that Jynx has provably infected our sample image and that an active connection was established during the memory capture.  We now want to investigate the userland aspects of Jynx with as much help from Volatility as possible.

Analyzing the Heap
Since we know that the backdoor was connected to and commands run on it, we want to recover the commands from memory in order to determine steps taken by the attacker. To accomplish this, we will recover the heap from the backdoor process. The heap is the memory region where all dynamically allocated memory (malloc, realloc, calloc) of a process is placed. 

To find the heap we use the linux_proc_maps plugin and search for the heap mapping:

# python -f jynx.lime --profile=LinuxDebianx86 linux_proc_maps -p 3047 | grep heap
Volatile Systems Volatility Framework 2.2_rc1
0x95b7000-0x9672000 rw-          0  0: 0            0 [heap]

We then dump the heap using linux_dump_map:

# python -f jynx.lime --profile=LinuxDebianx86 linux_dump_map -p 3047 -s 0x95b7000 -O nc-heap
Volatile Systems Volatility Framework 2.2_rc1
Writing to file: nc-heap
Wrote 708608 bytes

We now have netcat’s heap in the nc-heap file. From here we will use primitive techniques such as strings and grep to find data of interest. Note: While we could run strings and grep across the whole memory capture for the commands of interest, by focusing on just the heap of the process we care about, we have reduced our search space from 512MB to 708KB. The first command gathers ANSI strings and the next extracts Unicode (16-bit little-endian) strings.

# strings nc-heap >> nc-strings
# strings -e l nc-heap >> nc-strings
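
If the strings utility is not available on the analysis machine, the two passes can be approximated in Python. This is a simplified sketch: it treats only bytes 0x20 through 0x7e as printable and uses a minimum length of four, which will not match GNU strings exactly in every case. The function names are hypothetical.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Runs of printable single-byte characters, like `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def utf16le_strings(data: bytes, min_len: int = 4):
    """Runs of printable characters stored as 16-bit little-endian, like `strings -e l`."""
    pattern = rb"(?:[\x20-\x7e]\x00){%d,}" % min_len
    return [m.decode("utf-16-le") for m in re.findall(pattern, data)]


if __name__ == "__main__":
    with open("nc-heap", "rb") as handle:
        heap = handle.read()
    for text in ascii_strings(heap) + utf16le_strings(heap):
        print(text)
```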

If we investigate the strings output, we can recover the output of all the commands entered by the user as well as the entered password:

uid=0(root) gid=0(root) groups=0(root)
root     pts/0        2012-09-12 12:20 (
root     pts/1        2012-09-12 12:20 (
 14:25:13 up  6:35,  2 users,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    12:20   33.00s  0.40s  0.40s -bash
root     pts/1    12:20   11.00s  1.65s  1.56s -bash
root     pts/0        2012-09-12 12:20 (
root     pts/1        2012-09-12 12:20 (
Linux debian 2.6.32-5-686 #1 SMP Mon Oct 3 04:15:24 UTC 2011 i686 GNU/Linux

If we analyzed the heap from the associated bash process (PID 3048), we would see that we can recover the command invocations (uname -a, id, ls, w, who) used to generate the output in the netcat process.

Analyzing the heap of userland processes is useful in a number of situations where there is no plugin to perform structured analysis of an application or where you wish to find de-allocated structures of a process. Other common examples include analyzing the heap of browsers for URLs visited and form values entered, instant message clients for conversations, and numerous apps for passwords and cryptographic keys.

Extracting and Analyzing the Backdoor
Since there are compile-time options for the backdoor and many LD_PRELOAD-based rootkits are not open source, we will now extract and analyze the malicious library in a way that is applicable to all similar rootkits. We will perform the extraction with linux_find_file. The full path of the shared library was determined using linux_proc_maps on the infected process.

# python -f jynx.lime --profile=LinuxDebianx86 linux_find_file -F /root/Jynx2/jynx2/
Volatile Systems Volatility Framework 2.2_rc1
Inode Number          Inode
----------------          ----------
          908201        0xdedc249c

# python -f jynx.lime --profile=LinuxDebianx86 linux_find_file -O -i 0xdedc249c

If we analyze the global variables of the shared library we get three interesting variables:

# readelf -s  | grep OBJECT | grep -v GLIBC
    93: 000065b4     4 OBJECT  GLOBAL DEFAULT   23 libc
   101: 000065ac     4 OBJECT  GLOBAL DEFAULT   23 ssl
   117: 000065b0     4 OBJECT  GLOBAL DEFAULT   23 ctx

libc is the pointer returned from dlopen, while ssl and ctx are the SSL variables used to store the keys for the network communications of the backdoor. One may think that these could be used to decrypt a packet capture of the backdoor’s activity…

If we analyze the other global variables we see the names of the functions that get hooked by the application:

    39: 00006560     4 OBJECT  LOCAL  DEFAULT   23 old_accept
    40: 00006564     4 OBJECT  LOCAL  DEFAULT   23 old_access
    41: 00006568     4 OBJECT  LOCAL  DEFAULT   23 old_fxstat
    42: 0000656c     4 OBJECT  LOCAL  DEFAULT   23 old_fxstat64
    43: 00006570     4 OBJECT  LOCAL  DEFAULT   23 old_lxstat
    44: 00006574     4 OBJECT  LOCAL  DEFAULT   23 old_lxstat64
    45: 00006578     4 OBJECT  LOCAL  DEFAULT   23 old_open
    46: 0000657c     4 OBJECT  LOCAL  DEFAULT   23 old_rmdir
    47: 00006580     4 OBJECT  LOCAL  DEFAULT   23 old_unlink
    48: 00006584     4 OBJECT  LOCAL  DEFAULT   23 old_unlinkat
    49: 00006588     4 OBJECT  LOCAL  DEFAULT   23 old_xstat
    50: 0000658c     4 OBJECT  LOCAL  DEFAULT   23 old_xstat64
    51: 00006590     4 OBJECT  LOCAL  DEFAULT   23 old_write
    52: 00006594     4 OBJECT  LOCAL  DEFAULT   23 old_fopen
    53: 00006598     4 OBJECT  LOCAL  DEFAULT   23 old_fopen64
    54: 0000659c     4 OBJECT  LOCAL  DEFAULT   23 old_fdopendir
    55: 000065a0     4 OBJECT  LOCAL  DEFAULT   23 old_opendir
    56: 000065a4     4 OBJECT  LOCAL  DEFAULT   23 old_readdir
    57: 000065a8     4 OBJECT  LOCAL  DEFAULT   23 old_readdir64

Each variable prefixed with old_ stores the address of the original version of a hooked function so that the hook can still call it. A similar approach is used by nearly every LD_PRELOAD-based rootkit.
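
Turning the symbol table into a list of hooked functions is easy to script. The sketch below parses lines in the readelf -s layout shown above (Num, Value, Size, Type, Bind, Vis, Ndx, Name); the old_ prefix is specific to Jynx, so for another rootkit the prefix argument is an assumption to adjust. The function name is hypothetical.

```python
def hooked_from_symbols(readelf_lines, prefix="old_"):
    """Return the function names implied by OBJECT symbols with the given prefix."""
    hooked = []
    for line in readelf_lines:
        fields = line.split()
        # Expected columns: Num Value Size Type Bind Vis Ndx Name
        if len(fields) != 8 or fields[3] != "OBJECT":
            continue
        name = fields[7]
        if name.startswith(prefix):
            hooked.append(name[len(prefix):])
    return hooked


if __name__ == "__main__":
    sample = [
        "    39: 00006560     4 OBJECT  LOCAL  DEFAULT   23 old_accept",
        "    45: 00006578     4 OBJECT  LOCAL  DEFAULT   23 old_open",
        "    93: 000065b4     4 OBJECT  GLOBAL DEFAULT   23 libc",
    ]
    print(hooked_from_symbols(sample))
```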

Now let us imagine we did not have source to the binary and the binary was stripped (no symbols). We could then determine which functions the library hooks by debugging a hooked application and filtering on calls to dlsym. To do this we use gdb:

# gdb -q `which nc`
Reading symbols from /bin/nc...(no debugging symbols found)...done.
(gdb) b dlsym
Function "dlsym" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dlsym) pending.
(gdb) set environment LD_PRELOAD=/root/Jynx2/jynx2/

So far we have loaded netcat into gdb, set a breakpoint on dlsym (remember this is the function called to locate the address of hooked functions), and set LD_PRELOAD in the debugging environment.  When we run the program, netcat will be hooked and any call to dlsym will return control to us in the debugger:

(gdb) r -l -p 12345
Starting program: /bin/nc -l -p 12345

Breakpoint 1, 0xb7e89cf6 in dlsym () from /lib/i686/cmov/

Immediately after running the program, the breakpoint is hit. We can now examine what triggered the breakpoint:

(gdb) bt
#0  0xb7e89cf6 in dlsym () from /lib/i686/cmov/
#1  0xb7fdcce2 in fopen () from /root/Jynx2/jynx2/
#2  0xb7b84e40 in ?? () from /lib/i686/cmov/
#3  0xb7b853ad in _nss_files_getservbyport_r () from /lib/i686/cmov/
#4  0xb7f7306c in getservbyport_r () from /lib/i686/cmov/
#5  0xb7f72ec6 in getservbyport () from /lib/i686/cmov/
#6  0x080494d7 in ?? ()
#7  0x0804b67b in ?? ()
#8  0xb7ea4c76 in __libc_start_main () from /lib/i686/cmov/
#9  0x08048f71 in ?? ()

By examining the backtrace, we see that fopen was called inside of Jynx. We then see the call to dlsym. We can verify that fopen is the function being hooked by looking at the symbol name passed as the second parameter to dlsym:

(gdb) x/s *(unsigned int)($ebp+12)
0xb7fde77f:      "fopen"

We now know for certain that fopen is one of the hooked functions. This approach to enumerating all the hooked functions, while valid, is very manual and cumbersome. Instead, we want to automate the process using gdb breakpoint commands, like so:

(gdb) info breakpoints [1]
Num     Type           Disp Enb Address    What
1       breakpoint     keep y   0xb7e89cf6 <dlsym+6>
        breakpoint already hit 1 time
(gdb) commands 1 [2]
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>x/s *(unsigned int)($ebp+12)
>cont
>end

This sequence first lists our breakpoints at [1]; in this case only one is set. At [2] we instruct gdb to run a set of commands every time this breakpoint is hit. These commands print the name of the function being hooked and then use cont to continue program execution without our interaction. Now as the program runs we will automatically be given a printout of every function that is hooked. This can be seen in the sample output below, which was generated by just running the program and then connecting to the backdoor:

Starting program: /bin/nc -l -p 12345
0xb7fde77f:      "fopen"
0xb7fde7e5:      "__xstat"
0xb7fde7d7:      "accept"
0xb7fde84c:      "write"
0xb7fde843:      "__fxstat"

As we interact further with the backdoor, such as by listing directories, we will see more function names printed out.
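
For a long session it can help to save the gdb output and reduce it to a unique list of names. The sketch below assumes the log contains lines in the x/s format shown above; the function name is hypothetical.

```python
import re

# Matches x/s output such as: 0xb7fde77f:      "fopen"
XS_LINE = re.compile(r'^0x[0-9a-fA-F]+:\s+"([^"]+)"\s*$')


def hooked_from_gdb_log(lines):
    """Return hooked function names in first-seen order, without duplicates."""
    seen = []
    for line in lines:
        match = XS_LINE.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    log = [
        "Starting program: /bin/nc -l -p 12345",
        '0xb7fde77f:      "fopen"',
        '0xb7fde7e5:      "__xstat"',
        '0xb7fde77f:      "fopen"',
    ]
    print(hooked_from_gdb_log(log))
```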

We have analyzed Jynx using Volatility and gdb. Along the way we have discussed a reusable methodology for analyzing LD_PRELOAD based malware.

In tomorrow’s posts we will discuss analyzing network information with Volatility.  If you have any questions or comments please use the comment section of the blog or you can find me on Twitter (@attrc).
