Friday, September 14, 2012

MoVP 1.5 KBeast Rootkit, Detecting Hidden Modules, and sysfs

Month of Volatility Plugins

In this post I will analyze the KBeast rootkit using Volatility’s new Linux features.  This will include finding hidden modules, network connections, opened files, and hooked system calls.

If you would like to follow along or recreate the steps taken, please see the LinuxForensicsWiki for instructions on how to do so.

Obtaining the Samples

To have a sample to test against I installed the KBeast rootkit in my Debian virtual machine that was running the 2.6.26-2-686 32-bit kernel.


KBeast is a kernel-mode rootkit that loads as a kernel module. It also has a userland component that provides remote access to the computer. This userland backdoor is hidden from other userland applications by the kernel module. KBeast also hides files, directories, and processes that start with a user-defined prefix. Keylogging is optionally provided as well.

KBeast gains its control over a computer by hooking the system call table and by hooking the operations structures used to implement the netstat interface to userland.

We will now go through each piece of functionality the rootkit offers, how it accomplishes it, and how we can detect it with Volatility.

Hiding the Kernel Module

Effect on Forensics
Rootkits hide themselves from the module list because any unknown module would be very noticeable to IT security staff, as well as to integrity verifiers that operate in userland. The inability to locate hidden modules can give investigators a false sense of security and lead them to trust the output of tools on a live machine when they should not.

How KBeast Accomplishes it
To hide its kernel module component, KBeast uses the same technique as many other rootkits: it unlinks itself from the linked list of loaded kernel modules. This list is exported through /proc/modules, and the lsmod binary reads this file to list the loaded modules of a system. As a result, the module is still active in memory but is not detectable with lsmod or with kernel tools that simply walk the linked list.

How Volatility Detects This
Volatility leverages sysfs to find modules that have been removed from the module list but are still active. sysfs is a kernel-to-userland interface, similar to /proc, that exports a wide range of kernel information and statistics. Among the exported data are the loaded modules and their associated information, such as parameters, sections, and reference counts. On a running system, this information is exported through the /sys/module directory.

Inside this directory there is one subdirectory per kernel module, named exactly as the module appears in lsmod. Each per-module subdirectory contains further subdirectories that hold the parameters, sections, and other module data. The following reads back the parameters passed to LiME to obtain the memory capture for this blog post (the original command was insmod lime.ko "path=kbeast.this format=lime"):

# cat /sys/module/lime/parameters/path

# cat /sys/module/lime/parameters/format
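
The same parameter files can also be read programmatically on a live system. The following is a minimal Python sketch, not part of Volatility; the helper name read_module_parameters and its sysfs_root argument are my own assumptions, added so the function can be pointed at a test directory instead of the real /sys/module.

```python
import os


def read_module_parameters(module, sysfs_root="/sys/module"):
    """Return {parameter_name: value} for one module, or {} if none are exposed."""
    param_dir = os.path.join(sysfs_root, module, "parameters")
    params = {}
    if not os.path.isdir(param_dir):
        return params
    for name in sorted(os.listdir(param_dir)):
        path = os.path.join(param_dir, name)
        try:
            with open(path) as handle:
                params[name] = handle.read().strip()
        except OSError:
            # Some parameter files are not readable by the current user; skip them.
            continue
    return params


if __name__ == "__main__":
    # On a machine where LiME is loaded this prints its path and format values.
    print(read_module_parameters("lime"))
```

Keep in mind that this runs on the live (possibly compromised) system, so its output is only as trustworthy as the kernel it runs on.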

The linux_check_modules plugin finds hidden modules by walking the linked list of modules as well as enumerating all the directories under /sys/module. These two lists are then compared and any entries that are only found in sysfs are reported as hidden kernel modules. We have yet to find a rootkit that hides from sysfs at all, so this method has worked well across a number of malware samples.  The following shows this plugin against KBeast:

# python vol.py -f kbeast.this --profile=LinuxDebianx86 linux_check_modules
Volatile Systems Volatility Framework 2.2_rc1
Module Name

As can be seen, the KBeast module is detected as hidden. 

The sysfs enumeration code works by finding the module_kset variable, of type kset, which holds all the information for /sys/module. The plugin then walks the kset's entry list, each member of which is a kobject. Each of these kobject structures represents a module and its subdirectory immediately under /sys/module. The names of these directories are then gathered and compared with the names in the module list.
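
The comparison step itself is a simple set difference. The sketch below shows only that step; it assumes the two name lists have already been extracted from memory, and both the function name and the sample module names are hypothetical (example_rootkit stands in for whatever a hidden module is called).

```python
def find_hidden_modules(list_names, sysfs_names):
    """Return names found under /sys/module but missing from the module list."""
    return sorted(set(sysfs_names) - set(list_names))


# Hypothetical inputs: names from the linked list and names from sysfs.
linked_list_names = ["lime", "ext3", "jbd"]
sysfs_names = ["lime", "ext3", "jbd", "example_rootkit"]

print(find_hidden_modules(linked_list_names, sysfs_names))
# prints ['example_rootkit']
```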

Hooking System Call Table

Effect on Forensics
System calls are the main mechanism for userland code to trigger event handling by the kernel. Reading and writing files, sending network data, spawning and exiting processes, etc., are all done through system calls. The system call table is an array of function pointers, in which each pointer corresponds to a system call handler (e.g. sys_read handles the read system call).

Rootkits often target this table due to the power it gives them over the control flow of the running kernel.  KBeast hooks a number of entries in this table in order to hide files, processes, and more.

How KBeast Accomplishes it
During the initialization of its kernel module, KBeast hooks the unlink, rmdir, unlinkat, rename, open, kill, read, write, getdents, and delete_module system calls with its own handlers. These handlers ensure that files and processes that start with the user-supplied prefix are hidden and that they cannot be tampered with.

The overwritten kill system call handler also acts as the mechanism that the rootkit provides in order for userland processes to elevate privileges. All a userland process has to do is send a signal with the backdoor signal value and the process will be elevated. If you read our post yesterday, you know that the Average Coder rootkit used a mechanism that allowed us to detect elevated processes. Unfortunately, KBeast does not use this mechanism and instead uses the proper interfaces provided by the kernel, namely prepare_creds and commit_creds. This mechanism does not produce any inconsistencies, so we cannot immediately find processes elevated by KBeast.

How Volatility Detects This
Volatility detects all of these hooks by enumerating and verifying each entry in the system call table. This is implemented in the linux_check_syscall plugin, which, for every entry in the system call table, prints either the symbol name or, if the entry is hooked, HOOKED along with the hook address. Since there are anywhere from 300 to more than 400 system calls on a normal Linux system, it is advisable to redirect the plugin output to a file and then grep for bad entries, as shown here:

# python vol.py -f kbeast.lime --profile=LinuxDebianx86 linux_check_syscall > ksyscall

# head -10 ksyscall
Table Name      Index Address    Symbol
---------- ---------- ---------- ------------------------------
32bit             0x0 0xc103ba61 sys_restart_syscall
32bit             0x1 0xc103396b sys_exit
32bit             0x2 0xc100333c ptregs_fork
32bit             0x3 0xe0fb46b9 HOOKED
32bit             0x4 0xe0fb4c56 HOOKED
32bit             0x5 0xe0fb4fad HOOKED
32bit             0x6 0xc10b1b16 sys_close
32bit             0x7 0xc10331c0 sys_waitpid

# grep HOOKED ksyscall
32bit             0x3 0xe0fb46b9 HOOKED
32bit             0x4 0xe0fb4c56 HOOKED
32bit             0x5 0xe0fb4fad HOOKED
32bit             0xa 0xe0fb4d30 HOOKED
32bit            0x25 0xe0fb4412 HOOKED
32bit            0x26 0xe0fb4ebd HOOKED
32bit            0x28 0xe0fb4db1 HOOKED
32bit            0x81 0xe0fb5044 HOOKED
32bit            0xdc 0xe0fb4b9e HOOKED
32bit           0x12d 0xe0fb4e32 HOOKED

We can see in the first output what some clean entries look like and that the system call table index is reported along with the symbol name and address. For hooked entries, we instead see HOOKED in place of a symbol name because the entry points to an unknown address (in this case inside the rootkit’s module).
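
To make the idea concrete, here is a rough Python sketch of the verification step only. It is not the plugin's code: it assumes the table entries and a map of known kernel symbol addresses have already been read from the memory image and the profile, and the function name is my own. The sample addresses are taken from the output above.

```python
def check_syscall_table(table, symbols_by_address):
    """Yield (index, address, label); label is 'HOOKED' when no symbol matches."""
    for index, address in enumerate(table):
        label = symbols_by_address.get(address, "HOOKED")
        yield index, address, label


# Known handler addresses (from the clean entries shown above).
symbols = {
    0xc103ba61: "sys_restart_syscall",
    0xc103396b: "sys_exit",
    0xc100333c: "ptregs_fork",
}
# First four table entries; index 0x3 points into the rootkit's module.
table = [0xc103ba61, 0xc103396b, 0xc100333c, 0xe0fb46b9]

for index, address, label in check_syscall_table(table, symbols):
    print("%#6x %#010x %s" % (index, address, label))
```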

The plugin only prints the index of the system call entries and not a name because the system call table varies widely across distributions and kernel versions, and determining the name of each one requires the debug build of the kernel (vmlinux).  This may be incorporated into future versions of the plugins, but will require additions to the current code base, and in many cases the debug build is not made available by the distribution package maintainers.
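
For a 32-bit x86 kernel like the one analyzed here, the hooked indexes can be translated by hand using the standard i386 system call numbers. The following sketch does this for the ten hooked entries; the helper name is hypothetical, and the mapping is only valid for 32-bit x86. Note that index 0xdc is getdents64 in that numbering.

```python
# Standard i386 system call numbers for the indexes reported as hooked.
I386_SYSCALL_NAMES = {
    0x3: "read", 0x4: "write", 0x5: "open", 0xa: "unlink",
    0x25: "kill", 0x26: "rename", 0x28: "rmdir",
    0x81: "delete_module", 0xdc: "getdents64", 0x12d: "unlinkat",
}


def hooked_syscalls(lines):
    """Parse linux_check_syscall output lines and name the hooked entries."""
    hooked = []
    for line in lines:
        fields = line.split()
        # Expected columns: table name, index, address, symbol (or HOOKED).
        if len(fields) == 4 and fields[3] == "HOOKED":
            index = int(fields[1], 16)
            hooked.append((index, I386_SYSCALL_NAMES.get(index, "unknown")))
    return hooked


if __name__ == "__main__":
    # Assumes the plugin output was saved to the ksyscall file as shown earlier.
    with open("ksyscall") as handle:
        for index, name in hooked_syscalls(handle):
            print("%#x %s" % (index, name))
```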

Hiding Network Connections

Effect on Forensics
The ability to hide network connections from userland frustrates not only host investigators, but also network forensics teams who wish to tie traffic back to a specific computer. The ease with which kernel modules can hide information from userland makes a strong case for basing all incident response on offline memory captures and not on the output of tools running on the live system.

How KBeast Accomplishes it
To hide network connections from netstat and the userland interfaces it uses, KBeast hooks the show member of the tcp4_seq_afinfo sequence operation structure. This structure is of type tcp_seq_afinfo and has members of type file_operations and of type seq_operations. Please refer to yesterday’s blog post to learn about file_operations structures. Sequence operation structures provide a generic mechanism to display information inside the /proc filesystem. Such a structure has the members start, show, next, and stop; the wrapping code handles partial seeks, buffered reads, and other complicated logic, so that this logic only has to be implemented once for the entire kernel.

Sequence operations structures are often targeted by malware because they directly affect what is populated in /proc. By overwriting the show member of such a structure, a rootkit can easily filter out entries it does not want to appear in userland. KBeast effectively hides its backdoored network connection by hooking the show member of the TCP4 structure and filtering its output. This technique is also used by many other rootkits.

How Volatility Detects This    
To detect KBeast’s overwriting of network sequence operation structures, the linux_check_afinfo plugin walks the file_operations and seq_operations structures of all UDP and TCP protocol structures, including tcp6_seq_afinfo, tcp4_seq_afinfo, udplite6_seq_afinfo, udp6_seq_afinfo, udplite4_seq_afinfo, and udp4_seq_afinfo, and verifies each member. This effectively detects any tampering with the interesting members of these structures. The following output shows this plugin against the VM infected with KBeast:

# python vol.py -f kbeast.lime --profile=LinuxDebianx86 linux_check_afinfo
Volatile Systems Volatility Framework 2.2_rc1
Symbol Name        Member          Address
-----------        ------          ----------
tcp4_seq_afinfo    show            0xe0fb9965

This plugin reports and verifies that the show member is indeed hooked and that the system is compromised.
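
One plausible way to "verify each member" is to check that every function pointer falls inside the core kernel's text. The sketch below illustrates that idea only and is not the plugin's implementation: the function name, the text bounds, and the start address are hypothetical, and only the show address comes from the output above. A real check must also accept pointers into legitimately loaded modules.

```python
def check_members(structures, text_start, text_end):
    """Return (structure, member, address) for pointers outside kernel text."""
    suspicious = []
    for struct_name, members in structures.items():
        for member, address in members.items():
            if not (text_start <= address < text_end):
                suspicious.append((struct_name, member, address))
    return suspicious


# Hypothetical kernel text bounds for illustration.
KERNEL_TEXT_START = 0xc1000000
KERNEL_TEXT_END = 0xc1400000

structures = {
    "tcp4_seq_afinfo": {
        "start": 0xc1200000,  # hypothetical clean pointer
        "show": 0xe0fb9965,   # address reported by linux_check_afinfo
    },
}

for struct_name, member, address in check_members(
        structures, KERNEL_TEXT_START, KERNEL_TEXT_END):
    print("%-18s %-8s %#010x" % (struct_name, member, address))
```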

Analyzing the Userland Backdoor

Effect on Forensics
The kernel module provides cover for the attacker by hiding any processes or files that start with a user-defined prefix and any network connection on a specified port. By default, the prefix is set to “_h4x_”, but the rootkit’s README recommends changing it to something that is not so simple to grep for. For this demo, I just left it as the default. The port number to hide is also a compile-time configuration option chosen by the user.

The userland backdoor consists of a simple application that listens on the hidden network port, requires a password, and then spawns a bash shell with the privileges of root if the password is correct.

How KBeast Accomplishes it
As stated in the Hooking System Call Table section, these userland activities are hidden by hooking the system call table and the sequence operations structure of TCP. Once connected to the backdoor, the attacker can perform a wide range of attacks and post-compromise activity. We will now focus on recovering this activity.

How Volatility Detects This
Fortunately for Volatility’s users, particularly those with a baseline of the system they are analyzing or a copy of ps output from the infected system, finding the hidden process is trivial. The output of linux_pslist can simply be compared with that of the baseline or ps. Since KBeast hides processes by hooking the system call table, the process list is untouched and the hidden process will be in Volatility’s output but not the others. In the case of my infected image the _h4x_bd process has a PID of 2777:

# python vol.py --profile=LinuxDebianx86 -f kbeast.lime linux_pslist -p 2777
Volatile Systems Volatility Framework 2.2_rc1
Offset     Name      Pid    Uid Start Time
---------- ----- -------    --- ----------
0xdf4cd5a0 _h4x_bd   2777   0   Wed, 12 Sep 2012 20:49:25 +0000

Since we know the PID is 2777, we can then investigate the rest of the application’s activities using Volatility. First, we want to determine if any processes have the backdoor as a parent process. We can use the linux_pstree plugin to determine this and it will show us what programs were executed by the backdoor:

# python vol.py --profile=LinuxDebianx86 -f kbeast.lime linux_pstree
Volatile Systems Volatility Framework 2.2_rc1
Name                 Pid             Uid
._h4x_bd             2777            0
..bash               3053                0
...sleep             3077                0

This plugin lists the parent/child relationship between processes by adding a ‘.’ for each depth in the hierarchy. The displayed portions of the output show us that the backdoor is active with a spawned bash shell and that this shell ran the sleep command.  We can then use the linux_psaux plugin to display the command line arguments of each of these processes and their start time:

# python vol.py --profile=LinuxDebianx86 -f kbeast.lime linux_psaux -p 2777,3053,3077
Volatile Systems Volatility Framework 2.2_rc1
Pid    Uid    Arguments
2777   2      ./_h4x_bd           Wed, 12 Sep 2012 20:49:25 +0000
3053   2      bash -i             Thu, 13 Sep 2012 01:00:31 +0000
3077   2      sleep 100           Thu, 13 Sep 2012 01:02:22 +0000

In this output we can see that bash was run in interactive mode and that sleep was passed a parameter of 100. In a real incident response situation, this can reveal which parameters were passed to a wide range of tools used during post-compromise activity.

Now that we know a connection was active to the backdoor at the time of the compromise, we want to recover the network connections associated with it. We can use the linux_netstat plugin with the backdoor’s PID to accomplish this:

# python vol.py --profile=LinuxDebianx86 -f kbeast.lime linux_netstat -p 2777
Volatile Systems Volatility Framework 2.2_rc1
TCP CLOSE_WAIT           _h4x_bd/2777
TCP             LISTEN                       _h4x_bd/2777
TCP ESTABLISHED           _h4x_bd/2777

This shows us that the backdoor is listening on port 13377 and that there is an active connection from  on port 41745. We also see a previous connection in the CLOSE_WAIT  state on port 41744.   As we will see in a future blog post on recovering network data, we could attempt to recover the packets associated with these connections by using the linux_sk_buff_cache and linux_pkt_queues plugins.  Having the IP address and port pairs also allows us to focus network forensics investigations on only the streams associated with the communication channels of the malware. 

At this point, we have found the processes and network activity associated with the backdoor, all of which would be hidden from us on a live system, and are able to dig deep into the workings of the process. Now our goal is to discover the hidden directory that the backdoor is placed in, as the keylogging file is stored in the same directory. We can use linux_proc_maps for this:

# python vol.py --profile=LinuxDebianx86 -f kbeast.lime linux_proc_maps -p 2777
Volatile Systems Volatility Framework 2.2_rc1
0x8048000-0x8049000 r-x          0  8: 1       301353 /usr/_h4x_/_h4x_bd
0x8049000-0x804a000 rw-       4096  8: 1       301353 /usr/_h4x_/_h4x_bd
0xb75d7000-0xb75d8000 rw-          0  0: 0            0
0xb75d8000-0xb772d000 r-x          0  8: 1       513087 /lib/i686/cmov/
0xb772d000-0xb772e000 r--    1396736  8: 1       513087 /lib/i686/cmov/
0xb772e000-0xb7730000 rw-    1400832  8: 1       513087 /lib/i686/cmov/
0xb7730000-0xb7733000 rw-          0  0: 0            0
0xb7739000-0xb773b000 rw-          0  0: 0            0
0xb773b000-0xb773c000 r-x          0  0: 0            0
0xb773c000-0xb7756000 r-x          0  8: 1       505267 /lib/
0xb7756000-0xb7758000 rw-     106496  8: 1       505267 /lib/
0xbf81b000-0xbf831000 rw-          0  0: 0            0 [stack]

By looking at the mapping starting at 0x8048000, we see that our backdoor binary is loaded at that address and that its full path is /usr/_h4x_/_h4x_bd. Since the directory name has the hidden prefix, this directory would not show up on a live machine, and we would have to analyze a disk image to find it. Timelining would be a good method to narrow down the results quickly.

We can partially recover the backdoor binary by using the linux_dump_map command:

# python vol.py --profile=LinuxDebianx86 -f kbeast.lime linux_dump_map -p 2777 -s 0x8048000 -O h4xbd

This invocation targets PID 2777 (the network backdoor), selects the mapping that starts at 0x8048000 with -s, and tells the plugin to write that mapping to the h4xbd file. This will only partially recover the file, though, because the binary is not loaded directly from disk into the process’s memory; instead, its sections are spread throughout the address space. We can verify this with the file and readelf commands:

# file h4xbd
h4xbd: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked (uses shared libs), stripped
# readelf -s h4xbd
readelf: Error: Unable to read in 0x28 bytes of section headers
readelf: Error: Unable to read in 0x5a0 bytes of section headers
readelf: Error: Unable to read in 0xd0 bytes of dynamic section

Note that the file command sees it as an ELF file, but readelf is unable to process the file. To recover the file intact, we need to acquire it from the page cache using the linux_find_file plugin. This works because the page cache holds all the physical pages backing a file in memory without any modifications. We first locate the file's inode:

# python vol.py --profile=LinuxDebianx86 -f kbeast.lime linux_find_file -F "/usr/_h4x_/_h4x_bd"
Volatile Systems Volatility Framework 2.2_rc1
Inode Number          Inode
---------------- ----------
          301353 0xd606ea70

We then recover the file with another invocation of linux_find_file:

# python vol.py --profile=LinuxDebianx86 -f kbeast.lime linux_find_file -i 0xd606ea70 -O h4xbd

Now when we run readelf, we get much better results:

# readelf -s h4xbd | head -15
Symbol table '.dynsym' contains 25 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000   220 FUNC    GLOBAL DEFAULT  UND signal@GLIBC_2.0 (2)
     2: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 00000000   112 FUNC    GLOBAL DEFAULT  UND write@GLIBC_2.0 (2)
     4: 00000000    55 FUNC    GLOBAL DEFAULT  UND listen@GLIBC_2.0 (2)
     5: 00000000    44 FUNC    GLOBAL DEFAULT  UND setsid@GLIBC_2.0 (2)
     6: 00000000   441 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (2)
     7: 00000000    14 FUNC    GLOBAL DEFAULT  UND htons@GLIBC_2.0 (2)
     8: 00000000   112 FUNC    GLOBAL DEFAULT  UND read@GLIBC_2.0 (2)
     9: 00000000   210 FUNC    GLOBAL DEFAULT  UND perror@GLIBC_2.0 (2)
    10: 00000000   108 FUNC    GLOBAL DEFAULT  UND accept@GLIBC_2.0 (2)
    11: 00000000    55 FUNC    GLOBAL DEFAULT  UND socket@GLIBC_2.0 (2)

# readelf -s h4xbd | wc -l

This shows us that the symbol table is intact and that 131 symbols were present (thanks to the malware author for not stripping his bins ;). In fact, if we hash the recovered binary from memory and the backdoor binary on the infected VM, the hashes will match exactly.
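
The hash comparison can be done with any hashing tool. As a minimal sketch, the following Python uses hashlib with SHA-256, which is my choice for illustration rather than anything the analysis depends on; the function names and the second file path are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)


if __name__ == "__main__":
    # h4xbd is the file recovered from memory; the second path is a
    # hypothetical copy of the backdoor taken from the infected VM's disk.
    print(same_content("h4xbd", "backdoor_from_disk"))
```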

As a final step, we will quickly perform binary analysis of the binary recovered from memory. Since the password, hidden port, secret signal number, etc., are all compile-time options, they will differ per instance of the sample, but they can be recovered with simple reverse engineering. To start this process, we find symbols in the binary that may be interesting by using nm and filtering for functions (code).

# nm h4xbd | grep -wi "t"
08048b70 t __do_global_ctors_aux
08048770 t __do_global_dtors_aux
08048b6a T __i686.get_pc_thunk.bx
08048b00 T __libc_csu_fini
08048b10 T __libc_csu_init
08048b9c T _fini
08048584 T _init
08048740 T _start
08048906 T bindshell
0804881d T enterpass
080487f4 T error_ret
080487d0 t frame_dummy
08048ace T main

From this output, the functions bindshell and enterpass look interesting. If we load the binary into gdb and disassemble enterpass, we notice a few things:

# gdb -q h4xbd
Reading symbols from /root/h4xbd...done.
(gdb) set disassembly-flavor intel
(gdb) disassemble enterpass
Dump of assembler code for function enterpass:
0x0804881d <enterpass+0>:       push   ebp
0x0804881e <enterpass+1>:       mov    ebp,esp
0x08048820 <enterpass+3>:       sub    esp,0x68
0x08048823 <enterpass+6>:       mov    DWORD PTR [ebp-0x8],0x8048ea8 <--- banner string
0x0804882a <enterpass+13>:      mov    DWORD PTR [ebp-0x4],0x8048ec9 <--- another banner string
0x08048892 <enterpass+117>:     mov    DWORD PTR [esp+0x8],0x5
0x0804889a <enterpass+125>:     mov    DWORD PTR [esp+0x4],0x8048ee6 <---- hardcoded address of password
0x080488a2 <enterpass+133>:     lea    eax,[ebp-0x48]
0x080488a5 <enterpass+136>:     mov    DWORD PTR [esp],eax
0x080488a8 <enterpass+139>:     call   0x8048714 <strncmp@plt> <---- strncmp call

What becomes immediately apparent is that we have a call to strncmp at 0x080488a8, which is likely where the password check is performed, and that we see other hardcoded strings in the address range of 0x8048eXX. At address 0x0804889a, we can see one of these strings being placed on the stack as a parameter to the strncmp call. If we investigate these addresses, we see that the password (“h4x3d”) is stored in cleartext and that the other strings in the same memory region contain the backdoor’s login banner, debug information, the hidden directory /usr/_h4x_, and other interesting information.

(gdb) x/s 0x8048ee6
0x8048ee6:       "h4x3d"

(gdb) x/30s 0x8048e00
0x8048e7d:       ""
0x8048e7e:       ""
0x8048e7f:       ""
0x8048e80:       "ERROR! Error occured on your system!"
0x8048ea5:       ""
0x8048ea6:       ""
0x8048ea7:       ""
0x8048ea8:       "Password [displayed to screen]: "
0x8048ec9:       "<< Welcome To The Server >>\n"
0x8048ee6:       "h4x3d"
0x8048eec:       "Wrong!\n"
0x8048ef4:       "socket"
0x8048efb:       "bind"
0x8048f00:       "listen"
0x8048f07:       "Daemon running with PID = %i\n"
0x8048f25:       "/usr/_h4x_"
0x8048f30:       "/bin/bash"

If we analyze the bindshell function, we find more configuration information about the particular KBeast instance:

(gdb) disassemble bindshell
Dump of assembler code for function bindshell:
0x08048906 <bindshell+0>:       push   ebp
0x08048907 <bindshell+1>:       mov    ebp,esp
0x08048909 <bindshell+3>:       sub    esp,0x58
0x0804890c <bindshell+6>:       mov    WORD PTR [ebp-0x24],0x2
0x08048912 <bindshell+12>:      mov    DWORD PTR [esp],0x3441 <-- the backdoor port
0x08048919 <bindshell+19>:      call   0x8048624 <htons@plt>
0x080489d9 <bindshell+211>:     mov    DWORD PTR [esp],0x8048f25 <- the hidden directory
0x080489e0 <bindshell+218>:     call   0x80486b4 <chdir@plt>
(gdb) x/s 0x8048f25
0x8048f25:       "/usr/_h4x_"
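
The immediate value passed to htons is the backdoor port in host byte order, so converting it to decimal gives the port that linux_netstat reported as listening. A small Python sketch of that conversion follows; the helper names are hypothetical.

```python
import struct


def port_from_immediate(hex_text):
    """Convert the hex immediate from the disassembly to a decimal port."""
    return int(hex_text, 16)


def network_order_bytes(port):
    """Return the two bytes the port occupies in network (big-endian) order."""
    return struct.pack("!H", port)


port = port_from_immediate("0x3441")
print(port)  # prints 13377
print(network_order_bytes(port).hex())  # prints 3441
```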

At this point we have done a fairly thorough job of analyzing the rootkit and can perform very effective analysis against it.  If needed, we could even write Volatility plugins to automatically recover the configuration parameters directly from memory.


We have thoroughly investigated the KBeast rootkit, including its internals, artifacts left on a system, and interactions with the attackers who place it on a system.  This includes hooking the system call table, overwriting network operation structures, and allowing “stealth” access to the compromised computer over the network.

In next week’s Linux posts, we will analyze another rootkit, Jynx, which requires more plugins to analyze, and we will have a blog post on analyzing network information with Volatility.  If you have any questions or comments please use the comment section of the blog or you can find me on Twitter (@attrc).
