Friday, October 5, 2012

MoVP 4.4 Cache Rules Everything Around Me(mory)


Month of Volatility Plugins

After an exciting month of new Volatility plugins and another amazing OMFW, we are in the final home stretch. It's only fitting that we take a moment to fill in some gaps and dispel some myths and misconceptions. In particular, this blog post will focus on the Windows Cache Manager and discuss the dumpfiles plugin that will be released in Volatility 2.3.

Caching

An important concept that every computer scientist, especially those who have spent time doing operating system research, is intimately familiar with is that of caching. For those who weren't lucky enough to spend their college years stuck in a lab rewriting Xinu internals (Shoutz to Professor Comer!), a cache generally involves a smaller, relatively faster storage component that temporarily stores a subset of the data held in a larger but slower storage component. By leveraging locality of reference, caching techniques attempt to keep frequently or recently accessed data within the faster storage to reduce expensive operations.

Caching is an important performance principle often used throughout system design (networking, file system, operating system). As a result, it is also an important concept that every forensics analyst should be intimately familiar with, especially those who work in the area of memory analysis. The medium of analysis (RAM), in and of itself, acts as a cache for secondary storage. Researchers have also shown the importance of being able to analyze cached artifacts in memory (SELinux access vector cache, registry keys cached by the configuration manager) and how they can be manipulated to hide attack artifacts.  As an added advantage, caching also provides investigators triage hints as to what pieces of data or objects are temporally or spatially relevant at the time of the suspected incident. In the rest of this post, we will focus on finding artifacts associated with file data that are cached in memory. As I'm sure we all agree, most memory analysis techniques are just talk until they are implemented in Volatility!

File Artifacts

File artifacts often play an important role during investigations. When performing memory analysis, we generally focus on two major categories of files: executables and data files. We may want to determine which files were being accessed, which processes were accessing a particular file, the provenance of a chunk of data found in memory, what data was in a particular file, and whether that data had been surreptitiously modified. We may also want to extract that file so it can be analyzed by a third-party tool. Historically, most of the Volatility plugins that can help address these questions have focused on enumerating file information (i.e., handles, filescan, dlllist, modscan, modules, vadinfo) or extracting executables from a particular process's virtual memory (dlldump, moddump, procexedump, procmemdump, vaddump).

Occasionally, there are also people who still attempt to reconstruct a file from a memory sample using traditional carving. In most instances, they will run their linear carving tool against a memory sample. These tools will linearly scan the data looking for specific signatures associated with the file format. Unfortunately, most of these tools assume the file data will be contiguous, whereas the data stored in physical memory is inherently fragmented and only parts of the file may actually be loaded into memory. As a result, except for files that are smaller than a page of memory, the analyst is probably not going to extract the data they expect.
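To make the fragmentation problem concrete, here is a minimal Python sketch. It is a toy model, not Volatility code: the frame numbers, the helper names (place_pages, linear_carve, rebuild_from_frames), and the fake PDF content are all made up for illustration. It scatters a four-page file across physical frames and shows that a contiguous carve starting at the signature only recovers the first page, while a rebuild that knows the page-to-frame mapping recovers the whole file.

```python
PAGE_SIZE = 4096  # x86 small-page size


def place_pages(file_data, frames, total_frames):
    """Copy file page i into physical frame frames[i] (hypothetical layout)."""
    pages = [file_data[i:i + PAGE_SIZE] for i in range(0, len(file_data), PAGE_SIZE)]
    memory = bytearray(total_frames * PAGE_SIZE)
    for page, frame in zip(pages, frames):
        start = frame * PAGE_SIZE
        memory[start:start + len(page)] = page
    return bytes(memory)


def linear_carve(memory, signature, length):
    """What a linear carver does: find the header, then assume contiguity."""
    start = memory.find(signature)
    if start < 0:
        return None
    return memory[start:start + length]


def rebuild_from_frames(memory, frames, length):
    """Reassemble the file using the page-to-frame mapping."""
    data = b"".join(memory[f * PAGE_SIZE:(f + 1) * PAGE_SIZE] for f in frames)
    return data[:length]


# A fake 4-page "PDF" whose pages land in non-adjacent frames.
file_data = (b"%PDF-1.4" + b"A" * (PAGE_SIZE - 8)
             + b"B" * PAGE_SIZE + b"C" * PAGE_SIZE + b"D" * 100)
frames = [9, 2, 14, 5]
memory = place_pages(file_data, frames, total_frames=16)

carved = linear_carve(memory, b"%PDF-1.4", len(file_data))
rebuilt = rebuild_from_frames(memory, frames, len(file_data))
print(carved == file_data)   # False: only the first page is correct
print(rebuilt == file_data)  # True
```

The mapping in frames is exactly the context that the memory manager and cache manager structures provide, which is why walking those structures beats signature scanning.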

Those investigators with a stronger understanding of virtual memory management have also attempted to circumvent the fragmentation problem by scanning the virtual address space of a particular process. Thus, they would use the memdump plugin to dump the virtual address space associated with a particular process and scan it with their linear carving tool. While this clearly demonstrates a better understanding of virtual memory management and will work a lot better than scanning the physical address space, it can still suffer from the fact that only a subset of the data may actually be mapped into virtual memory at a particular time. It also requires that the investigators clearly understand what data is generated by the memdump plugin. The memdump plugin extracts all the memory resident pages (including both user and kernel) within a particular process's address space and dumps them to an individual file. I guess that concept can be confusing to some people. One group of researchers went so far as to try to compare the output of memdump with a tool that was extracting a memory resident executable. Then they couldn't understand why a dump of addressable virtual memory was dramatically bigger than that process's executable. They also don't seem to understand the value of having a deeper understanding of the differences in the so-called "global" kernel address space. I guess the source code isn't enough for some people.

Alternatively, instead of following the traditional forensics trend of trying to pretend that structured data is unstructured, we can leverage the context provided by Volatility to do something more intelligent. In particular, we can use Volatility to analyze the file mapping data structures associated with the Windows Memory Manager and Cache Manager to reconstruct the files.

Windows Cache Manager

Within the Windows operating system, the cache manager is the subsystem that provides data caching support for file system drivers. The cache manager is responsible for helping make sure that frequently accessed data is found in physical memory to improve I/O performance. It accomplishes this by mapping views of files into the virtual address space using the memory manager's support for memory-mapped files. Thus, the memory manager controls which parts of the file data are actually memory resident. The cache manager, on the other hand, tracks cached data with virtual address control blocks (VACBs). Each VACB corresponds to a 256-KB view of data that is mapped in the system cache address space. In the remainder of the post, we will describe how the internal data structures associated with the memory manager and cache manager can be used to reconstruct file artifacts.
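The 256-KB view size drives all of the offset arithmetic later in the post, so here is a small Python sketch of it. The function names vacb_slot and views_needed are ours and purely illustrative; the constant simply encodes the 256-KB figure given above.

```python
VACB_MAPPING_GRANULARITY = 256 * 1024  # the 256-KB view size described above


def vacb_slot(file_offset):
    """Return (index of the 256-KB view, byte offset inside that view)."""
    return divmod(file_offset, VACB_MAPPING_GRANULARITY)


def views_needed(file_size):
    """Number of 256-KB views required to cover a file (ceiling division)."""
    return -(-file_size // VACB_MAPPING_GRANULARITY)


print(vacb_slot(300 * 1024))           # (1, 45056): second view, 44 KB in
print(views_needed(1024 * 1024))       # 4
print(views_needed(32 * 1024 * 1024))  # 128
```

The last two results line up with sizes that matter for the VACB index arrays: a 1-MB file fits in 4 views and a 32-MB file fits in 128.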

Data Structures

Luckily, all of the structures we need to extract cached and memory mapped file artifacts are provided within Microsoft's supporting debugging information and, as a result, they can be readily found within Volatility VTypes. The main structure we are going to start with is the _FILE_OBJECT. These executive objects can be found with a number of Volatility plugins including filescan, handles, and vadinfo. The _FILE_OBJECT is a Windows kernel object used to track each instance of an open file. Once we find a _FILE_OBJECT, we can use its SectionObjectPointer member to find the associated _SECTION_OBJECT_POINTERS. The memory manager and the cache manager use this structure to store file mapping and cache information for a particular file stream. Based on the members of the _SECTION_OBJECT_POINTERS, we can determine if the file was mapped as data (DataSectionObject) and/or as an executable image (ImageSectionObject), and if caching is being provided for this file.

Both the ImageSectionObject and DataSectionObject members of _SECTION_OBJECT_POINTERS are opaque pointers to control areas, _CONTROL_AREA. Once we have found the offset of the associated _CONTROL_AREA, we can find the subsection structures, _SUBSECTION, that are used by the memory manager to track regions of file streams that are mapped. The initial subsection structure is stored immediately after the _CONTROL_AREA in memory and subsequent subsections are found by traversing a singly linked list pointed to by the NextSubsection member. If the file was mapped as data, there will most likely only be one subsection. On the other hand, if the file was mapped as an executable image, there will be one subsection for each section of the portable executable (PE). By leveraging the SubsectionBase member of _SUBSECTION, we can find a pointer to an array of _MMPTEs. By traversing the array of _MMPTEs we can determine which pages are memory resident and where they are stored in physical memory. It is important to note that the size of an _MMPTE changes not only between hardware architectures but also when a PAE enabled kernel is being used. Using this information, we can reconstruct those files that may be memory mapped as either data or image section objects.
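The traversal just described can be sketched in plain Python. This is a deliberately simplified model and not the dumpfiles implementation: the Subsection and ControlArea classes are hypothetical stand-ins for _SUBSECTION and _CONTROL_AREA, each PTE is reduced to either a page frame number or None (real _MMPTEs have several states and architecture-dependent sizes), and subsections are assumed to sit back to back in the file.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SIZE = 4096


@dataclass
class Subsection:
    # Stand-in for the _MMPTE array reached through SubsectionBase:
    # an int is a resident page frame number, None means "not resident".
    ptes: List[Optional[int]]
    next_subsection: Optional["Subsection"] = None  # models NextSubsection


@dataclass
class ControlArea:
    # In real memory the first _SUBSECTION sits right after the _CONTROL_AREA.
    first_subsection: Optional[Subsection]


def iter_subsections(control_area):
    """Follow the singly linked subsection list, guarding against loops."""
    seen = set()
    sub = control_area.first_subsection
    while sub is not None and id(sub) not in seen:
        seen.add(id(sub))
        yield sub
        sub = sub.next_subsection


def rebuild_mapped_file(control_area, physical_memory):
    """Return (file bytes, summary); missing pages are zero padded."""
    out = bytearray()
    summary = []
    for sub in iter_subsections(control_area):
        for pfn in sub.ptes:
            file_offset = len(out)
            if pfn is None:
                out += b"\x00" * PAGE_SIZE
                summary.append((file_offset, "padded"))
            else:
                out += physical_memory[pfn * PAGE_SIZE:(pfn + 1) * PAGE_SIZE]
                summary.append((file_offset, "present"))
    return bytes(out), summary


# Toy physical memory: frame i is filled with byte value i.
physical_memory = b"".join(bytes([i]) * PAGE_SIZE for i in range(8))
second = Subsection(ptes=[5, None])
first = Subsection(ptes=[3], next_subsection=second)
data, summary = rebuild_mapped_file(ControlArea(first), physical_memory)
print(summary)  # [(0, 'present'), (4096, 'present'), (8192, 'padded')]
```

Keeping a per-page summary alongside the data matters because a zero-padded page and a page that genuinely contains zeros look identical in the output file.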


In the instances where caching is being provided, the SharedCacheMap member of the _SECTION_OBJECT_POINTERS structure is an opaque pointer to the _SHARED_CACHE_MAP structure. The _SHARED_CACHE_MAP structure is used by the cache manager to track the state of cached regions, including the previously described 256-KB VACBs. The cache manager uses VACB index arrays to store pointers to the VACBs. As a performance optimization, the _SHARED_CACHE_MAP contains a VACB index array, InitialVacbs, of 4 pointers, that is used for files 1 MB or less in size. If the file is larger than 1 MB, the Vacbs member of _SHARED_CACHE_MAP is used to store a pointer to a dynamically allocated VACB index array. If the file is larger than 32 MB, a sparse multilevel index array is created where each index array can hold up to 128 entries. Since we are trying to find all the cached regions that may be memory resident, we recursively walk the sparse multilevel array looking for file data. The _VACB contains the virtual address where the data is stored in the system cache, BaseAddress, and the offset where the data is found within the file, FileOffset. Using this information, we can reconstruct the file based on the cached regions found in memory.
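The recursive walk can be illustrated with another small Python sketch. Again, this is a toy model under stated assumptions rather than the plugin's code: the index array is modeled as nested Python lists with None for empty slots, the Vacb class only carries the two members named above, and read_virtual is a hypothetical callback standing in for reading the system cache address space.

```python
from dataclasses import dataclass

VIEW_SIZE = 256 * 1024   # one VACB view
ENTRIES_PER_LEVEL = 128  # entries per index array, per the text above


@dataclass
class Vacb:
    base_address: int  # models _VACB.BaseAddress
    file_offset: int   # models _VACB.FileOffset


def index_kind(file_size):
    """Which index layout the text above describes for a given file size."""
    if file_size <= 4 * VIEW_SIZE:                  # 1 MB or less
        return "InitialVacbs"
    if file_size <= ENTRIES_PER_LEVEL * VIEW_SIZE:  # up to 32 MB
        return "single-level Vacbs array"
    return "multilevel Vacbs array"


def walk_vacb_index(node):
    """Yield every Vacb reachable from a (possibly nested, sparse) index."""
    if node is None:
        return
    if isinstance(node, Vacb):
        yield node
        return
    for entry in node:
        yield from walk_vacb_index(entry)


def rebuild_from_vacbs(index, read_virtual, file_size):
    """Place each cached view at its FileOffset; gaps stay zero filled."""
    out = bytearray(file_size)
    for vacb in walk_vacb_index(index):
        if vacb.file_offset >= file_size:
            continue
        data = read_virtual(vacb.base_address, VIEW_SIZE)
        if data is None:  # view not readable in this sample
            continue
        chunk = data[:file_size - vacb.file_offset]
        out[vacb.file_offset:vacb.file_offset + len(chunk)] = chunk
    return bytes(out)


# Two cached views of a three-view file; the middle view is not cached.
cache = {0x1000: b"a" * VIEW_SIZE, 0x2000: b"c" * VIEW_SIZE}
index = [[Vacb(0x1000, 0), None], None, [None, Vacb(0x2000, 2 * VIEW_SIZE)]]
rebuilt = rebuild_from_vacbs(index, lambda addr, size: cache.get(addr),
                             3 * VIEW_SIZE - 10)
print(len(rebuilt), index_kind(40 * 1024 * 1024))
```

Because each VACB records its own FileOffset, the walk does not need to track its position in the index tree; it only needs to visit every non-empty slot.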


Prior Research

While some recent research made the assertion that the Windows Cache Manager had been largely ignored in memory forensics, this statement was a bit disingenuous. In fact, there have been a number of research papers and projects that have demonstrated the importance of extracting files (memory mapped/cached) from physical memory. The first to explore this issue was R.B. van Baar in 2008, when he published a paper, "Forensic memory analysis: Files mapped in memory", in which he showed that 25% of data in memory dumps was attributed to memory mapped files. He also discussed a number of techniques for extracting file artifacts using both allocated file mappings and unallocated pages. While the paper did not detail the algorithms used in the analysis, we remembered that Ruud had submitted the code for inclusion in Volatility back in 2008. Looking through the code, we noticed that he focused on a subset of PTEs found when traversing the DataSectionObject.

Later that year, a team led by Seyed Mahmood Hejazi also wrote a paper titled "Automated Windows Memory File Extraction for Cyber Forensics Investigations", which also discussed the benefits of extracting memory mapped and cached files. Unfortunately, some discrepancies in their pseudo code and images made it difficult to replicate their research. Their analysis of cached files also seemed to be limited to files that were small enough to be found in the InitialVacbs (1 MB).

In 2010, Carl Pulley created a Volatility plugin, exportfile, for extracting both memory mapped and cached file artifacts that he used to solve the Honeynet Project Challenge 3. In particular, he demonstrated how PDFs and Firefox artifacts could be extracted from physical memory. This was a major step forward as he was the first to actually release code that others could use and evaluate. Thus, we wanted to see if we could augment Carl's work to support PAE kernels and x64 (Shoutz to Carl!).

Finally in 2011, another research team re-implemented Carl's work for a Black Hat presentation titled, "Physical Memory Forensics for Files and Cache". The work presented lacked sufficient detail or code to evaluate the effectiveness of their approach. They also demonstrated a clear lack of understanding in a number of their claims.

The DumpFiles Plugin

The DumpFiles plugin works by collecting _FILE_OBJECTs from both the handle table and the virtual address descriptor tree. Once those objects have been collected, it will proceed to extract all memory mapped and cached regions to a specified output directory. Typical command line usage:

$ python vol.py dumpfiles -f <sample> --dump-dir <filesdir>  -S <summaryfile> --profile=<Profile> 

This will create a number of extracted file artifacts in the output directory.  These files are named with the following schema: (file.$PID.[SharedCacheMap.offset|ControlArea.offset].[img|dat|vacb]). The goal of the naming schema is to help provide provenance as to where the data originated. Example output names can be seen below:
file.1300.0x8704e540.img 
file.2648.0x868b58e0.vacb
file.436.0x8a06ace8.dat
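
Since the names encode provenance, they are easy to post-process. The following Python sketch parses the naming schema shown above; parse_dump_name is a hypothetical helper of ours, and the mapping from extension to source structure (img and dat from a ControlArea, vacb from a SharedCacheMap) is our reading of the schema rather than something taken from the plugin source.

```python
import re

# file.$PID.<offset>.[img|dat|vacb], as described in the schema above.
NAME_RE = re.compile(
    r"^file\.(?P<pid>\d+)\.(?P<offset>0x[0-9a-fA-F]+)\.(?P<kind>img|dat|vacb)$"
)

# Assumed interpretation of each extension (not confirmed against the code).
SOURCE = {
    "img": "ControlArea (ImageSectionObject)",
    "dat": "ControlArea (DataSectionObject)",
    "vacb": "SharedCacheMap",
}


def parse_dump_name(name):
    """Split a dumpfiles output name into its provenance fields."""
    match = NAME_RE.match(name)
    if match is None:
        return None
    kind = match["kind"]
    return {
        "pid": int(match["pid"]),
        "offset": int(match["offset"], 16),
        "kind": kind,
        "source": SOURCE[kind],
    }


for name in ("file.1300.0x8704e540.img", "file.2648.0x868b58e0.vacb"):
    info = parse_dump_name(name)
    print(info["pid"], hex(info["offset"]), info["source"])
```

Grouping the parsed records by PID or by offset is a quick way to see which processes shared the same control area or cache map.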

It will also create a detailed summary file describing which pages were actually present and which were paged out and subsequently padded in the output file. During experiments, we have seen the plugin extract anywhere from 308 MB on an unused Windows 7 system to 1.4 GB on a Win7x64 production system. Examples of files extracted from that sample include:

file.1128.0xfa800dcd5450.img:  MS-DOS executable PE  for MS Windows (DLL) (GUI) Mono/.Net assembly
file.1248.0xfa800e3e6ab0.dat:  Microsoft Internet Explorer Cache File Version Ver 5.2
file.2312.0xfa800ce1b6a0.img:  MS-DOS executable PE  for MS Windows (DLL) (GUI) Intel 80386 32-bit
file.2580.0xfa800d00d4a0.dat:  Microsoft Office Document
file.2964.0xfa800d362e30.img:  MS-DOS executable PE  for MS Windows (DLL) (console) Mono/.Net assembly
file.2964.0xfa800dd25660.vacb: MSVC program database   
file.3004.0xfa800d0c9e10.vacb: MS-DOS executable PE  for MS Windows (DLL) (GUI) Mono/.Net assembly
file.3004.0xfa800d1cd3b0.dat:  HTML document text
file.3004.0xfa800ea33800.dat:  ASCII English text

Example Use Cases

One example use case that has been previously discussed involved using the information from the cache manager to extract registry files from memory (prior to Windows 7). The researchers inaccurately claimed that this provided the same information available through Volatility's registry support. Unfortunately, what they failed to realize is that Volatility is actually extracting the registry keys cached by the configuration manager. As a result, Volatility is able to see the actual keys and values being used by the operating system (including volatile data). Simply extracting the cached versions of the on-disk files would not allow them to detect the attacks Moyix described in his paper. That being said, the ability to augment the data cached in the configuration manager with a cached version of the actual file provides the investigator a powerful analysis capability that we will demonstrate in an upcoming post. Yet another capability that is only possible within Volatility.

Another interesting use case involves using the memory mapped and cached versions of files to look for modifications that may have been made to executable images by malware. For example, if malware attempts to make an inline control flow change to the text section of a memory resident PE, it will often get a private version of the page mapped into its address space (copy-on-write). By comparing different views of the data from different address spaces, we can easily identify anomalies in memory resident executable images. During the OMFW presentation, we demonstrated how this analysis technique could be combined with the output of apihooks to provide extra context.
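
A rough Python sketch of the comparison idea follows. It is not the analysis we showed at OMFW; differing_pages is a hypothetical helper that takes two already-extracted views of the same image (for example a shared copy and a process-private copy) and reports the page offsets that differ. Skipping all-zero pages is a crude stand-in for consulting the summary file to ignore pages that were padded because they were not resident.

```python
PAGE_SIZE = 4096


def differing_pages(view_a, view_b, skip_zero_pages=True):
    """Return file offsets of pages whose contents differ between two views."""
    zero_page = bytes(PAGE_SIZE)
    diffs = []
    for offset in range(0, min(len(view_a), len(view_b)), PAGE_SIZE):
        page_a = view_a[offset:offset + PAGE_SIZE]
        page_b = view_b[offset:offset + PAGE_SIZE]
        if skip_zero_pages and (page_a == zero_page[:len(page_a)]
                                or page_b == zero_page[:len(page_b)]):
            continue  # likely padding for a non-resident page
        if page_a != page_b:
            diffs.append(offset)
    return diffs


# Shared view of a 3-page text section versus a private copy with one
# byte patched in the second page (a made-up inline hook).
shared = b"\x90" * (3 * PAGE_SIZE)
private = bytearray(shared)
private[PAGE_SIZE + 16] = 0xE9
print(differing_pages(shared, bytes(private)))  # [4096]
```

Any offset this reports is only a lead; it still needs to be checked against relocations and other legitimate in-memory changes before calling it a hook.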

Conclusion

The ability to extract cached and memory-mapped files provides another powerful capability to Volatility users.

Shoutz

Shoutz to FuzzyNop for reminding us that just looking at the code doesn't mean that people actually understand it and inspiring us to finally get the dumpfiles plugin integrated into core. Shoutz to Carl for creating a Volatility plugin and thus making it real!

Related References

Russinovich, M., Solomon, D. A., & Ionescu, A., "Windows Internals: Part 2." 6th ed. Microsoft Press (2012)
van Baar, R. B., Alink, W., & van Ballegooij, A. R., "Forensic Memory Analysis: Files Mapped in Memory." Journal of Digital Investigation (2008)
Hejazi, S. M., Debbabi, M., & Talhi, C., "Automated Windows Memory File Extraction for Cyber Forensics Investigation." Journal of Digital Forensic Practice (2008)
https://github.com/carlpulley/volatility/blob/master/exportfile.py
http://www.osronline.com/article.cfm?article=280
https://www.honeynet.org/challenges/2010_3_banking_troubles
http://wampir.mroczna-zaloga.org/archives/834-challenge-3-of-the-forensic-challenge-2010-banking-troubles.html
http://wampir.mroczna-zaloga.org/archives/859-rozwiazanie-wyzwania-forensic-challenge-2010-banking-troubles.html
