Thursday, July 16, 2015

The 2015 Volatility Plugin contest is now live!

This is a quick update to announce that the 2015 Volatility Plugin contest is now live and accepting submissions until October 1st. Winners of this year's contest will be receiving over $2,000 in cash prizes as well as plenty of Volatility swag (t-shirts, stickers, etc.).

The purpose of the contest is to encourage open memory forensics research and development. It is a great opportunity for students to become familiar with memory forensics, develop a master's thesis or PhD project, as well as gain experience that will be very desirable by future employers. For those already in the field, submitting to the contest is a great way to gain experience and visibility in the memory forensics community. After the contest is over we promote the work in our conference presentations, blogs, and social media.

If you are looking for inspiration or to see the past winners, please check out the pages from 2013 and 2014. You will find projects that allow for inspection of virtual machines guests from the view of the host, recovery of in-memory browser artifacts, methods to detect stealthy rootkits, and much more.

If you have any questions please feel free to reach out to us. 

We are looking forward to another year of innovative open source research!

Monday, July 13, 2015

Volatility at Black Hat USA & DFRWS 2015!

Due to another year of open research and giving back to the open source community, Volatility will have a strong presence at both Black Hat USA and DFRWS 2015. This includes presentations, a book signing, and even a party!

At Black Hat, the core Volatility Developers (@4tphi, @attrc, @gleeda, and @iMHLv2) will be partaking in a number of events including:
  • Demoing Volatility at Black Hat Arsenal. This will include new plugins targeted at the PlugX malware, showing how to write simple, but effective Volatility plugins, and more!
  • Book signing for The Art of Memory Forensics starting at 11:10AM on Wednesday in the Black Hat book store. All four authors will be present, so be sure to bring your book along or purchase a copy on-site in the bookstore.
  • Volatility Happy Hour: This will be an open bar party where you can meet our team, bring books to be signed, and get Volatility swag all while enjoying tasty beverages. You must register (free) if you wish to attend! Note that the party will be at MGM Grand.
Friends of Volatility will also be leading a number of events at Black Hat including Briefing presentations from Jonathan Brossardjduck, and Alex Ionescu as well as Arsenal Demos from Brian Baskin, Marc Ochsenmeier, David Cowen, and Takahiro Haruyama.

At DFRWS, Dr. Golden Richard will be presenting a paper that he and I wrote: Advancing Mac OS X Rootkit Detection. In this paper, we present several new methods to detect rootkits on OS X systems through memory forensic analysis. All the of the plugins described in the paper will be incorporated into Volatility after the conference. 

Also, at DFRWS, Joe Sylve and Vico Marziale will be leading a workshop on creating forensics tools in Go. If you have never seen Go before, or want to gain some hands-on experience, then we recommend checking it out.

And finally, be sure to check out the "Finding your naughty BITS" presentation by Matthew Geiger, who has been a long time friend of the project.

We hope to see everyone at these events, and we are looking forward to an exciting August! 

Wednesday, June 3, 2015

Volshell Quickie: The Case of the Missing Unicode Characters

The other day someone reached out to me because they had a case that involved files with Arabic names.  Unfortunately the filenames were only question marks when using filescan or handles, so I set out to figure out why.

In order to figure out why, I created a few files with Hebrew names (which I can read and write, so I can verify if it is correct) and Arabic names (which was just me tapping on the keyboard, so they don't say anything). After creating them, I interacted with them to make sure they'd show up in filescan. Below you can see the filescan results:
[snip] $ python vol.py -f Win7x86.vmem --profile=Win7SP1x86 filescan 0x000000003d7008d0 16 0 RW-rw- \Device\HarddiskVolume2\Users\user\Desktop\????.txt 0x000000003ddfef20 18 1 RW-r-- \Device\HarddiskVolume2\Windows\Tasks\SCHEDLGU.TXT 0x000000003def9340 16 0 RW-r-- \Device\HarddiskVolume2\Users\user\Desktop\????????????????????.txt [snip]
In order to understand how the filename is output, you can look at the code in filescan:
for file in data: header = file.get_object_header() self.table_row(outfd, file.obj_offset, header.PointerCount, header.HandleCount, file.access_string(), str(file.file_name_with_device() or ''))
The function file_name_with_device() is defined in volatility/plugins/overlays/windows/windows.py and the relevant part is highlighted in red:
class _FILE_OBJECT(obj.CType, ExecutiveObjectMixin): """Class for file objects""" def file_name_with_device(self): """Return the name of the file, prefixed with the name of the device object to which the file belongs""" name = "" if self.DeviceObject: object_hdr = obj.Object("_OBJECT_HEADER", self.DeviceObject - self.obj_vm.profile.get_obj_offset("_OBJECT_HEADER", "Body"), self.obj_native_vm) if object_hdr: name = "\\Device\\{0}".format(str(object_hdr.NameInfo.Name or '')) if self.FileName: name += str(self.FileName) return name
So we can take a look at this in volshell:
1 $ python vol.py -f Win7x86.vmem --profile=Win7SP1x86 volshell 2 [snip] 3 >>> file = obj.Object("_FILE_OBJECT", offset = 0x000000003d7008d0, vm = addrspace().base, native_vm = addrspace()) 4 >>> print file.FileName 5 \Users\user\Desktop\????.txt 6 7 >>> file2 = obj.Object("_FILE_OBJECT", offset = 0x000000003def9340, vm = addrspace().base, native_vm = addrspace()) 8 >>> print file2.FileName 9 \Users\user\Desktop\????????????????????.txt
On line 3 we create a _FILE_OBJECT object. We know the offset where this object resides (0x000000003d7008d0) from filescan. Since this object was obtained from the physical address space, we specify this by setting vm to addrspace().base, or the physical layer, since this is a raw memory sample. Since the _FILE_OBJECT's native address space is virtual, we specify this as well: native_vm = addrspace(). At this point we have instantiated a _FILE_OBJECT in a variable called "file". We then print out the FileName member on line 5 and see its output on line 6. We follow the same process for the second file, except we save the object as "file2". As you can see, the output is not very helpful. So now we need to know what type of member, FileName is. In order to accomplish this, we need to look at the vtypes in the volatility/plugins/overlays/windows/win7_sp1_x86_vtypes.py file:
'_FILE_OBJECT' : [ 0x80, { 'Type' : [ 0x0, ['short']], [snip] 'FileName' : [ 0x30, ['_UNICODE_STRING']], [snip]
We have some functionality added to this type in volatility/plugins/overlays/windows/windows.py:
class _UNICODE_STRING(obj.CType): [snip] def v(self): """ If the claimed length of the string is acceptable, return a unicode string. Otherwise, return a NoneObject. """ data = self.dereference() if data: return unicode(data) return data def dereference(self): length = self.Length.v() if length > 0 and length <= 1024: data = self.Buffer.dereference_as('String', encoding = 'utf16', length = length) return data else: return obj.NoneObject("Buffer length {0} for _UNICODE_STRING not within bounds".format(length)) [snip] def __format__(self, formatspec): return format(self.v(), formatspec) def __str__(self): return str(self.dereference()) [snip]
We know that the file_name_with_device() function uses str() in order to transform the _UNICODE_STRING into something readable and if we look at the overridden __str__() operator in the above code, we see that it uses the dereference() function. The dereference() function never casts the data as unicode, however, so the data is printed incorrectly. If we look at the above v() function, we see that there is a call to dereference() and that the resulting data is case as unicode, so let's see if we get valid data back by calling that function instead:
>>> print file.FileName.v() \Users\user\Desktop\שלום.txt >>> print file2.FileName.v() \Users\user\Desktop\تهحححتهحححتهحححتهححح.txt
Success! So let's modify the __str__() operator to use v() instead and see if that fixes filescan:
[snip] def __str__(self): return str(self.v()) [snip]
Now let's examine the filescan data:
0x000000003d7008d0 16 0 RW-rw- \Device\HarddiskVolume2\Users\user\Desktop\שלום.txt 0x000000003ddfef20 18 1 RW-r-- \Device\HarddiskVolume2\Windows\Tasks\SCHEDLGU.TXT 0x000000003def9340 16 0 RW-r-- \Device\HarddiskVolume2\Users\user\Desktop\تهحححتهحححتهحححتهححح.txt
Success! Just to make dually sure, I then created a user with a Hebrew name: גלידה, and created some files with Hebrew characters as well. If we look back to the file_name_with_device() function, you'll see that the complete file path is populated by using the NameInfo optional header (name = "\\Device\\{0}".format(str(object_hdr.NameInfo.Name or ''))). If you look in the volatility/plugins/overlays/windows/win7.py file, you'll see the following definition:
class _OBJECT_HEADER(windows._OBJECT_HEADER): [snip] optional_header_mask = (('CreatorInfo', '_OBJECT_HEADER_CREATOR_INFO', 0x01), ('NameInfo', '_OBJECT_HEADER_NAME_INFO', 0x02), ('HandleInfo', '_OBJECT_HEADER_HANDLE_INFO', 0x04), ('QuotaInfo', '_OBJECT_HEADER_QUOTA_INFO', 0x08), ('ProcessInfo', '_OBJECT_HEADER_PROCESS_INFO', 0x10))
Now we know the object to find in the types file in order to figure out what type Name is. Look in volatility/plugins/overlays/windows/win7_sp1_x86_vtypes.py:
'_OBJECT_HEADER_NAME_INFO' : [ 0x10, { 'Directory' : [ 0x0, ['pointer', ['_OBJECT_DIRECTORY']]], 'Name' : [ 0x4, ['_UNICODE_STRING']], 'ReferenceCount' : [ 0xc, ['long']], } ],
Since Name is of the same type (_UNICODE_STRING), we should be covered. In the words of Al Bundy, "Let's rock":
$ python vol.py -f Win7x86.vmem --profile=Win7SP1x86 filescan [snip] 0x000000003e8fb1c0 2 0 RW-rw- \Device\HarddiskVolume1\Users\גלידה\Desktop\עצם.txt 0x000000003f83c038 2 0 RW-rw- \Device\HarddiskVolume1\Users\גלידה\AppData\Roaming\Microsoft\Windows\Recent\עצם.lnk [snip]
Success! A lot of other objects use _UNICODE_STRINGs, including mutants, registry paths and symbolic links. So this was an important fix.

Changes have already been reflected in the master branch of Volatility. We hope that you have enjoyed this not so short, quickie ;-)

Friday, May 15, 2015

Using mprotect(.., .., PROT_NONE) on Linux

After deciding to revisit some old code of mine (ok, very old), I realized that there was something different about how Linux was allocating pages of data I wanted to hide.   At first, I was glad that I couldn't see the data using yarascan, but then I realized that I was unable to access the memory regions at all in linux_volshell to verify that they were, in fact, obfuscated. So I decided to take a look at using a smaller, stripped down program. Below is one such example, comments are included to explain what is happening:

int main( int argc, char *argv[]){ // pid: the process ID of this process // so we can print it out int pid; pid = getpid(); //size: an integer to hold the current page size int size; size = getpagesize(); //[1] create two pointers in order to allocate //memory regions char *buffer; char *buffer2; //unprotected buffer: //allocate memory using mmap() buffer2 = (caddr_t) mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0,0); //[2] put some characters in the allocated memory //we're setting these characters one at a time in order //to avoid our strings being detected from the binary itself buffer2[0] = 'n'; buffer2[1] = 'o'; buffer2[2] = 't'; buffer2[3] = ' '; buffer2[4] = 'h'; buffer2[5] = 'e'; buffer2[6] = 'r'; buffer2[7] = 'e'; //protected buffer: //allocate memory with mmap() like before buffer = (caddr_t) mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, 0,0); //[2] put some characters in the allocated memory //we're setting these characters one at a time in order //to avoid our strings being detected from the binary itself buffer[0] = 'f'; buffer[1] = 'i'; buffer[2] = 'n'; buffer[3] = 'd'; buffer[4] = ' '; buffer[5] = 'm'; buffer[6] = 'e'; //[3] protect the page with PROT_NONE: mprotect(buffer, size, PROT_NONE); //[4] print PID and buffer addresses: printf("PID %d\n", pid); printf("buffer at %p\n", buffer); printf("buffer2 at %p\n", buffer2); //spin until killed so that we know it's in memory: while(1); return 0; }
Now we'll just compile the above program and run it. In short, the above program [1] creates two buffers, [2] places characters in these buffers, [3] then calls mprotect() on one of them with PROT_NONE; [4] the program then prints out its process ID and the virtual addresses of the aforementioned buffers:

$ ./victim PID 29620 buffer at 0x7f2bec9a1000 buffer2 at 0x7f2bec9a2000
Let's set up a config file so that we don't have to type as much:

$ cat linux.config [DEFAULT] LOCATION=file:///Path/to/Virtual%20Machine/Linux%20Mint%20Cinnamon/Mint%2064-bit-d583834d.vmem PROFILE=LinuxLinuxMintCinnamonx64x64 PID="29620"
If we try to search for the strings we placed in the buffers we get:

$ python vol.py --conf-file=linux.config linux_yarascan -Y "not here" Volatility Foundation Volatility Framework 2.4 Task: victim pid 29620 rule r1 addr 0x7f2bec9a2000 0x7f2bec9a2000 6e 6f 74 20 68 65 72 65 00 00 00 00 00 00 00 00 not.here........ 0x7f2bec9a2010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x7f2bec9a2020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [snip] $ python vol.py --conf-file=linux.config linux_yarascan -Y "find me" Volatility Foundation Volatility Framework 2.4

Well that was disappointing, we couldn't find the "find me" string. What you would expect, is to be able to access the memory contents using Volatility. Let's use the linux_volshell plugin to explore the victim process' memory and see if we can access the memory addresses (given to us from the program at run time) directly. First we'll examine the contents at 0x7f2bec9a2000:

$ python vol.py --conf-file=linux.config linux_volshell Volatility Foundation Volatility Framework 2.4 >>> Current context: process victim, pid=29620 DTB=0x3cfb1000 >>> db(0x7f2bec9a2000) 0x7f2bec9a2000 6e 6f 74 20 68 65 72 65 00 00 00 00 00 00 00 00 not.here........ 0x7f2bec9a2010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x7f2bec9a2020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x7f2bec9a2030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x7f2bec9a2040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x7f2bec9a2050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x7f2bec9a2060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x7f2bec9a2070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
As we see, we have the contents that we'd expect. Now to try to access the other memory location:

>>> db(0x7f2bec9a1000) Memory unreadable at 7f2bec9a1000
We see that we've been rejected. After a bit of investigation, we see that the point of failure is in the entry_present() function in the amd64.py address space. In order to find the value of the entry that failed, we'll print it out. First we'll add a print statement to entry_present():

def entry_present(self, entry): if entry: if (entry & 1): return True arch = self.profile.metadata.get('os', 'Unknown').lower() # The page is in transition and not a prototype. # Thus, we will treat it as present. if arch == "windows" and ((entry & (1 << 11)) and not (entry & (1 << 10))): return True # we want a valid entry that hasn't been found valid: print hex(entry) return False

Now let's see what the entry is:
$ python vol.py --conf-file=linux.config linux_yarascan -Y "find me" Volatility Foundation Volatility Framework 2.4 0x2681d160
So let's look a little closer at the page table entry:

>>> print "{0:b}".format(0x2681d160) 100110100000011101000101100000
The way we read this is from right to left.  The entry_present() function checks the 0th bit to see if the entry is present. In this case the bit is not set (0); therefore this function returns False.

At this point, I decided to take a look at the Linux source code to see how mprotect() is actually implemented. As it turns out, the _PAGE_PRESENT bit is cleared when mprotect(...PROT_NONE) is called on a page and the _PAGE_PROTNONE bit is set [3]. Looking at how _PAGE_PROTNONE is defined [4][5][6] we'll see that it's actually equivalent to the global bit (8th bit) [1][2]. So let's look at our page table entry again, we'll notice that the 8th bit is indeed 1:

>>> print "{0:b}".format(0x2681d160) 100110100000011101000101100000


So let's patch the entry_present() function with our findings:

def entry_present(self, entry): if entry: if (entry & 1): return True arch = self.profile.metadata.get('os', 'Unknown').lower() # The page is in transition and not a prototype. # Thus, we will treat it as present. if arch == "windows" and ((entry & (1 << 11)) and not (entry & (1 << 10))): return True # Linux pages that have had mprotect(...PROT_NONE) called on them # have the present bit cleared and global bit set if arch == "linux" and ((entry & (1 << 8))): return True return False


Let's see if we get anything back:
$ python vol.py --conf-file=linux.config linux_yarascan -Y "find me" Volatility Foundation Volatility Framework 2.4 Task: victim pid 29620 rule r1 addr 0x7f2bec9a1000 0x7f2bec9a1000 66 69 6e 64 20 6d 65 00 00 00 00 00 00 00 00 00 find.me......... 0x7f2bec9a1010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x7f2bec9a1020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [snip]

Success! So now we're able to see the "forbidden" string that we couldn't access before.  This goes to show that sometimes you have to dig a little into "why" something is not working as expected. In this case, we lucked out and had source code to examine, but sometimes things are not as easy as all that. Both intel.py and amd64.py have been patched in order to accommodate memory sections that have had mprotect() called on them with PROT_NONE.

References

[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming http://developer.amd.com/wordpress/media/2012/10/24593_APM_v21.pdf

Monday, March 16, 2015

Windows Malware and Memory Forensics Training in the UK

Windows Malware and Memory Forensics Training by The Volatility Project is the only memory forensics course officially designed, sponsored, and taught by the Volatility developers. One of the main reasons we made Volatility open-source is to encourage and facilitate a deeper understanding of how memory analysis works, where the evidence originates, and how to interpret the data collected by the framework's extensive set of plugins. Now you can learn about these benefits first hand from the developers of the most powerful, flexible, and innovative memory forensics tool.

Our last public class in the UK sold out, so we are back again! This training is organized and hosted by CCL Group, the UK's largest provider of digital forensics.

Details

Location: Stratford-Upon-Avon, United Kingdom
Cost: £ 2595 + VAT (lunch provided)
​Dates: Monday June 1st - Friday June 5th 2015
Times: 9 AM - 5 PM daily
Instructors: Michael Ligh, Jamie Levy, Andrew Case 
View this course's full page

Booking

For bookings, please contact training@cclgroupltd.com or call 01789 261200 (UK).

Other Courses

The following courses are also open for registration. 

Monday, February 2, 2015

Advice from Det. Michael Chaves on Memory Forensics, KnTDD, and POS Malware

The following story was shared by Detective Michael Chaves. It describes how he's used Volatility, KnTDD, and memory forensics over the past year to investigate POS breaches at local businesses. Kudos to Michael for applying his skills in an effective and meaningful way, then taking the time to share experiences with others. Without a doubt, detectives in every police department have or will encounter situations like Michael describes.
It's been about year since I've taken the Volatility Windows Malware and Memory Forensics Training in NYC.  I wanted to take this time to share some of my experiences to hopefully help examiners/investigators early on in their exposure to Volatility and to help identify unknown malware.  Over the past 10 months I have responded to about a dozen POS breaches at local businesses; mainly liquor stores and restaurants. These breaches are identified rather quickly from local banks that call me with the details and I usually respond to a location within 2 days. It should come as no surprise that I have yet to respond to a location that was anywhere near being PCI compliant.

The large majority of POS terminals were running Windows XP some with SP2, most with SP3.  Two machines were even running Windows 2K and all had direct connections to the Internet.  Antivirus IF present was either out of date or turned off.  I still have yet to see a firewall present or any security policy in place.  My RAM capture tool of choice is Kntdd and I’ll use FTK Imager Lite to obtain all registry files, App Data directory, $log, $MFT and prefetch directory.  I carry with me several portable drives to make the acquisition from each POS location in the shortest amount of time possible as the store still needs to process customer purchases. 

For most of these breaches I have been able to identify the malware pretty easily.  I usually begin by running, pslist, psscan, psxview and connections (if supported).  In the majority of the breaches, the processes were not hidden and had an active process listed, usually called by ‘explorer’.  If I was not able to easily identify the malware process, I’d run dlllist to locate any programs running from odd locations followed by malfind  and yarascan.  Once I have identified a suspect process, I’d dump that process usually by procdump or dlldump.

During the early part of the investigation, I am not too concerned how the malware works or what the Initial Infection Vector was.   I want to know where the credit cards are going and how are they getting out.  I’d run strings on the exported out suspected malware file and I would generally find, the URL used to send out the cards via POST, an e-mail address associated with the malware and/or IP addresses.  The majority of my cases the POST command was used to send out the cards, in others it was via SMTP.  In 5 of my 12 breaches, the malware family was JACKPOS or Alina variants.  I will search the Internet on file names, URL’s and artifacts that usually result with great write ups that show what I may be investigating, as well researching with Virustotal.   It should also be noted that there have been a few times that I sought assistance from the Volatility community and other students from the NYC class.  They have been extremely receptive to my questions and information provided to was invaluable!

I realize there are many readers out there that have a far greater understanding of malware and memory forensics.  I’m slowly getting there, but I hope this helps out the people just beginning, like I am/was by describing my workflow and perhaps give confidence to some that may otherwise doubt their ability.

Tuesday, January 27, 2015

Incorporating Disk Forensics with Memory Forensics - Bulk Extractor

In this post we will take our first look at a tool that is primarily used for disk forensics and show how it can be useful during memory forensics analysis as well. In the coming weeks we will have several follow on posts highlighting other tools and techniques.

Background

As you are likely aware, much of the information that is recoverable from disk and the network is also recoverable from memory - assuming you get a valid sample within a proper time frame.  If these conditions are met, then many forensics tools not usually associated with memory forensics can be incorporated into the memory analysis process. The artifacts recovered by these tools are often ones that memory tools do not focus on and are ill-suited to handle. Through incorporation of such tools and techniques, investigators can fully leverage all information available in volatile memory.

In many real-world investigation scenarios, you may only ever get a memory sample - see Jared's OMFW 2014 presentation for an excellent example of this [6] - and even less likely will you get full network flow (PCAP) during the time frame of an attack. This is unfortunate as the network often has information vital to an investigation, such as which servers were used to attack the network, which systems attackers laterally moved to, which domain(s) malware was downloaded from, and what commands were sent by a C&C server to local nodes. Even without a PCAP, all hope is not lost though, as network data must traverse main memory in order to be sent and received by applications.** This leaves the opportunity for historical network information to be left in memory long after it was active.

This idea was explored in great detail by Simpson Garfinkel and his co-authors in their 2011 DFRWS paper 'Forensic carving of network packets and associated data structures' [5]. In the paper they discuss not only historical network data in volatile memory, but also how that information can be stored on disk through system swapping and hibernation. This approach has the advantage of potentially getting network data from even previous reboots of the system.

To demonstrate the validity of their approach and evaluate the research, new modules were added to the open source bulk_extractor [1, 2, 3] tool in order to automate extraction and processing.

Since the publication of this paper and the popularization of bulk_extractor, extracting network data from memory captures has become a technique used by many analysts and it has even appeared in leading college digital forensics courses [4].

** With the exception of hardware rootkits within NIC firmware. If you believe this type of malware is active on a system that you need to investigate then you should check out the KntDD acquisition tool [14]. It can safely acquire memory from select NICs as well as acquire memory from other hardware devices.

Bulk Extractor

bulk_extractor is a highly-optimized open source tool that can scan inputs (disk, memory captures, etc.) and automatically find a wide range of information useful to investigators, such as email addresses, URLs, domains, credit cards numbers, and more. The project's wiki documents all of its scanning features [7]. Due to its multi-threaded, C++ design, bulk_extractor can often process inputs and extract all features at the speed at which the input can be read, making it very useful for efficient analysis.

When bulk_extractor is finished processing a target it then produces a number of output files. The exact files produced depends on which scanners were activated. For most scanners, in particular the network-related ones, not only is there a file of the raw data recovered, but there is also a histogram produced that orders entries based on the number of times found. This can be interesting in a number of ways, such as looking for the most common email addresses to determine who a user being investigated communicated with frequently.

Carving Network Packets and Streams from Memory

One of bulk_extractor's most useful features to memory forensics, and the one this blog will focus on, is the ability to profile network data in a sample as well as produce a PCAP of all packets found.

The profiled data includes recovered IP addresses, which can be mapped to known-bad lists or researched through threat intelligence tools, and Ethernet frames, which can be used to determine which local systems were contacted. Both of these profiled data sets include a histogram file.

To get this set of information you can run bulk_extractor as:

bulk_extractor -E net -o <output directory> <memory image path>

This will only run the net scanner and quickly produce the results described above in the directory specified with -o. Depending on your time constraints and desired data recovered, you can also run with other options, such as:

bulk_extractor -e net -e email -o <output directory> <memory image path>

This will run both the net and email scanners, which adds several more output files of interesting forensics data, including information about all domains found as well as email addresses and URLs. This can obviously give deep insight into what network activity occurred on the system being analyzed - potential phishing email accounts, malware URLs and domains, attacker C&C infrastructure, and so on.

Analyzing the PCAP File

In many of our own investigations we have found the PCAP file produced by bulk_extractor to be invaluable. In the connection traces we have uncovered evidence of data exfiltration, including (partial) file contents, commands sent by attackers to malware on client systems, input and output of tools on remotely controlled cmd.exe sessions, re-directed HTTP traffic from web servers, and much more.

The number of packets in a created PCAP file is limited to what is available in memory and not overwritten. For systems with large amounts of RAM, this can be many megabytes of data. For systems with less RAM, the in-memory caches will obviously be smaller, which often leads to less remnant information.  As we will show with the following forensics challenges though, even systems with small amounts of RAM can still retain crucial, historical network data long after it was processed.

Forensics Challenges

To show the usefulness of extracting network data from memory samples, we ran bulk_extractor against several memory captures included with popular forensics challenges. We choose these challenges for several reasons. First, they include both Linux and Windows samples. This shows the usefulness of the technique across several operating systems. Second, these challenges include small memory captures while still containing very useful data. Finally, by choosing public memory samples, anyone can perform their own research and follow the steps shown in the blog.

DFRWS 2005 Challenge

The first challenge we will analyze is the DFRWS 2005 challenge [8]. This is one the most well-known challenges since it directly inspired what we now consider modern memory forensics.

In this challenge, investigators were given a full network capture as well as a memory capture both before and after the suspect's laptop crashed. After running bulk_extractor, with -E net chosen, against the first sample we obtain a PCAP file (packets.pcap in BE's output folder) that has 130 packets. This is in comparison to 6304 packets in the full network capture, meaning roughly 2% of the related packets were recovered from a 126MB memory capture (no, the size is not a typo).

While this number may seem incredibly small, analysis shows that the recovered network data has highly useful information. As seen in the following screenshot from the PCAP loaded into Wireshark, proof of the "Back Orifice" (BO2k) malware being used on the system is present. This also includes partial file names and commands used on the system:


As discussed in the challenge's solutions, these backdoor interactions were a key part of solving what the attacker did to the system. Assuming a PCAP wasn't given along with the challenge, which we will see is not always the case, then volatile memory would be all investigators had to rely on for network clues.

DFRWS 2008 Challenge

The second challenge we will discuss is the one from DFRWS 2008 [9]. This included a Linux memory memory sample along with an accompanying PCAP file. As with the previous challenge, we will compare the network data obtained from memory to the provided PCAP along with analyzing the data from memory. This memory capture was 284MB and 122 packets were recovered. This is in comparison to 10243 contained in the full network capture (~1%).

Of these 122 packets, there was quite a bit of interesting data. As documented in the winning solution to the challenge [10], an encrypted ZIP file was exfiltrated from the network. Portions of this exfiltration appear in the capture. Also, the challenge details a Perl script used to extfiltrate the data using HTTP cookies on particular domains. Both fragments of the Perl script as well as several cookie instances are contained within the memory capture. The following shows one of the malicious requests encoding data within a msn.com cookie.


Note that the real msn.com is not contacted and a server under attacker control in Malaysia (219.93.175.67) is contacted instead. For more information on this Perl's scripts chunking and encoding of data, please read the winning submission.

Honeynet 2010 Challenge

Honeynet's 2010 Challenge [11] focused on a end-user infected with a banking trojan. In the challenge you were given only a memory sample of the victim system. The given memory sample was 512MB and bulk_extractor recovered 256 packets. From our analysis, we believe this is a fairly significant amount of packets that were sent or received during creation of the challenge, even though it is a small number.

As shown in the following Wireshark screenshot, one of the packets recovered includes the request that downloaded the ZBot malware:


This is the focal point of the investigation, and if you read the challenge's official solution [12], you will see that once the malicious domain and connection are determined, the rest of the investigation falls into place.  Through the use of bulk_extractor we can immediately find the request in-tact, whereas the solution required an intricate mixing of strings and Volatility to map the pieces together. It is also interesting to note that when trying to definitively piece together the network traffic the solution states:

"It’s not possible to state it for sure since no network dump is available."

As we have just learned, we do not need to be given a network capture in order to get verifiable and useful network data from memory.

Honeynet 2011 Challenge

The 2011 Honeynet Challenge was an investigation against a memory sample and disk image of a compromised server. No network capture was provided. The memory capture is 256MB and 347 packets were recovered. As discussed in the solutions' for this challenge, the initial foothold was gained on the server through a heap-based vulnerability in the Exim mail server. The following from the PCAP of recovered packets shows this exploit in action:


Besides the large buffer of AAAAA's used to fill the heap, you can also see the IP address (192.168.56.101) of the attacking system as well as its destination port (25) against the local server. This is an immediate warning sign of an attack against the mail server.

Mapping Connections to Processes

As useful information is recovered from memory, investigators often want to determine which process or kernel component was responsible for sending or receiving the particular packets. Volatility provides two ways to do this. The first is through the use of the strings plugin. The strings plugin can map a physical offset in a memory capture (the offset in the capture file) to its owning process or kernel module. Unfortunately, the strings plugins is not very straightforward with a PCAP file as the offset in physical memory of the packet is not encoded within the PCAP's data**.

To get around this limitation, you can use the yarascan plugin to search for unique data within the packet(s) that you find interesting. To do this, pass the data as the -Y option to yarascan. Note that you can pass arbitrary strings, bytes, and hex values to yarascan and you are not just limited to searching readable text. As yarascan searches for your signature it will list any processes or kernel components that contain the data. This can then immediately point you to the area of code responsible for generating or receiving the packet, which can greatly focus your analysis.

For those who closely follow Volatility, you know that there are also the ethscan [15] and pktscan [16] plugins that can perform the mapping as well. Unfortunately, pktscan is no longer supported and ethscan can take substantially longer to produce results versus bulk_extractor. For these reasons and others we choose to incorporate bulk_extractor in our analysis instead of using only Volatility components. 

** Someone please let me know if I am wrong about this. I have not seen this documented anywhere for bulk_extractor and I am not sure its possible using the old PCAP format that BE writes the capture in. The newer PCAP-ng format allows for arbitrary comments though and a nice addition to BE would be writing to the new format with a comment of the physical offset of where the particular packet was found.

Conclusion

This post was written to highlight the power of carving network data from memory. When full packet capture is not available, or when no network data is available, the ability to extract network data from memory may be your only chance of recovering it. As shown, bulk_extractor is a highly capable tool for this task. Not only can it recover packets to a PCAP, but it can also be used to recover a wide range of other information as well as arbitrary strings or regular expressions that you configure it for.

We strongly encourage our readers who are currently not utilizing this capability to start to incorporate in into your memory forensics workflows. We believe you will see immediate usefulness of the data recovered and bulk_extractor makes recovery trivial.

References

[1] https://github.com/simsong/bulk_extractor
[2] http://simson.net/ref/2012/2012-02-02%20USMA%20bulk_extractor.pdf
[3] http://simson.net/ref/2013/2013-12-05_tcpflow-and-BE-update.pdf
[4] https://samsclass.info/121/proj/p3-Bulk.htm
[5] http://simson.net/clips/academic/2011.DFRWS.ipcarving.pdf
[6] http://www.slideshare.net/jared703/vol-ir-jgss114
[7] https://github.com/simsong/bulk_extractor/wiki
[8] http://www.dfrws.org/2005/challenge/
[9] http://www.dfrws.org/2008/challenge/
[10] http://sandbox.dfrws.org/2008/Cohen_Collet_Walters/Digital_Forensics_Research_Workshop_2.pdf
[11] http://www.honeynet.org/challenges/2010_3_banking_troubles
[12] http://www.honeynet.org/files/Forensic_Challenge_3_-_Banking_Troubles_Solution.pdf
[13] http://www.honeynet.org/challenges/2011_7_compromised_server
[14] http://www.gmgsystemsinc.com/knttools/
[15] http://jamaaldev.blogspot.com/2013/07/ethscan-volatility-memory-forensics.html
[16]  https://code.google.com/p/volatility/issues/detail?id=233