In order to figure out why, I created a few files with Hebrew names (which I can read and write, so I can verify if it is correct) and Arabic names (which was just me tapping on the keyboard, so they don't say anything). After creating them, I interacted with them to make sure they'd show up in filescan. Below you can see the filescan results:
In order to understand how the filename is output, you can look at the code in filescan:[snip] $ python vol.py -f Win7x86.vmem --profile=Win7SP1x86 filescan 0x000000003d7008d0 16 0 RW-rw- \Device\HarddiskVolume2\Users\user\Desktop\????.txt 0x000000003ddfef20 18 1 RW-r-- \Device\HarddiskVolume2\Windows\Tasks\SCHEDLGU.TXT 0x000000003def9340 16 0 RW-r-- \Device\HarddiskVolume2\Users\user\Desktop\????????????????????.txt [snip]
The function file_name_with_device() is defined in volatility/plugins/overlays/windows/windows.py and the relevant part is highlighted in red:for file in data: header = file.get_object_header() self.table_row(outfd, file.obj_offset, header.PointerCount, header.HandleCount, file.access_string(), str(file.file_name_with_device() or ''))
So we can take a look at this in volshell:class _FILE_OBJECT(obj.CType, ExecutiveObjectMixin): """Class for file objects""" def file_name_with_device(self): """Return the name of the file, prefixed with the name of the device object to which the file belongs""" name = "" if self.DeviceObject: object_hdr = obj.Object("_OBJECT_HEADER", self.DeviceObject - self.obj_vm.profile.get_obj_offset("_OBJECT_HEADER", "Body"), self.obj_native_vm) if object_hdr: name = "\\Device\\{0}".format(str(object_hdr.NameInfo.Name or '')) if self.FileName: name += str(self.FileName) return name
On line 3 we create a _FILE_OBJECT object. We know the offset where this object resides (0x000000003d7008d0) from filescan. Since this object was obtained from the physical address space, we specify this by setting vm to addrspace().base, or the physical layer, since this is a raw memory sample. Since the _FILE_OBJECT's native address space is virtual, we specify this as well: native_vm = addrspace(). At this point we have instantiated a _FILE_OBJECT in a variable called "file". We then print out the FileName member on line 5 and see its output on line 6. We follow the same process for the second file, except we save the object as "file2". As you can see, the output is not very helpful. So now we need to know what type of member, FileName is. In order to accomplish this, we need to look at the vtypes in the volatility/plugins/overlays/windows/win7_sp1_x86_vtypes.py file:1 $ python vol.py -f Win7x86.vmem --profile=Win7SP1x86 volshell 2 [snip] 3 >>> file = obj.Object("_FILE_OBJECT", offset = 0x000000003d7008d0, vm = addrspace().base, native_vm = addrspace()) 4 >>> print file.FileName 5 \Users\user\Desktop\????.txt 6 7 >>> file2 = obj.Object("_FILE_OBJECT", offset = 0x000000003def9340, vm = addrspace().base, native_vm = addrspace()) 8 >>> print file2.FileName 9 \Users\user\Desktop\????????????????????.txt
We have some functionality added to this type in volatility/plugins/overlays/windows/windows.py:'_FILE_OBJECT' : [ 0x80, { 'Type' : [ 0x0, ['short']], [snip] 'FileName' : [ 0x30, ['_UNICODE_STRING']], [snip]
We know that the file_name_with_device() function uses str() in order to transform the _UNICODE_STRING into something readable and if we look at the overridden __str__() operator in the above code, we see that it uses the dereference() function. The dereference() function never casts the data as unicode, however, so the data is printed incorrectly. If we look at the above v() function, we see that there is a call to dereference() and that the resulting data is case as unicode, so let's see if we get valid data back by calling that function instead:class _UNICODE_STRING(obj.CType): [snip] def v(self): """ If the claimed length of the string is acceptable, return a unicode string. Otherwise, return a NoneObject. """ data = self.dereference() if data: return unicode(data) return data def dereference(self): length = self.Length.v() if length > 0 and length <= 1024: data = self.Buffer.dereference_as('String', encoding = 'utf16', length = length) return data else: return obj.NoneObject("Buffer length {0} for _UNICODE_STRING not within bounds".format(length)) [snip] def __format__(self, formatspec): return format(self.v(), formatspec) def __str__(self): return str(self.dereference()) [snip]
Success! So let's modify the __str__() operator to use v() instead and see if that fixes filescan:>>> print file.FileName.v() \Users\user\Desktop\שלום.txt >>> print file2.FileName.v() \Users\user\Desktop\تهحححتهحححتهحححتهححح.txt
Now let's examine the filescan data:[snip] def __str__(self): return str(self.v()) [snip]
Success! Just to make dually sure, I then created a user with a Hebrew name: גלידה, and created some files with Hebrew characters as well. If we look back to the file_name_with_device() function, you'll see that the complete file path is populated by using the NameInfo optional header (name = "\\Device\\{0}".format(str(object_hdr.NameInfo.Name or ''))). If you look in the volatility/plugins/overlays/windows/win7.py file, you'll see the following definition:0x000000003d7008d0 16 0 RW-rw- \Device\HarddiskVolume2\Users\user\Desktop\שלום.txt 0x000000003ddfef20 18 1 RW-r-- \Device\HarddiskVolume2\Windows\Tasks\SCHEDLGU.TXT 0x000000003def9340 16 0 RW-r-- \Device\HarddiskVolume2\Users\user\Desktop\تهحححتهحححتهحححتهححح.txt
class _OBJECT_HEADER(windows._OBJECT_HEADER): [snip] optional_header_mask = (('CreatorInfo', '_OBJECT_HEADER_CREATOR_INFO', 0x01), ('NameInfo', '_OBJECT_HEADER_NAME_INFO', 0x02), ('HandleInfo', '_OBJECT_HEADER_HANDLE_INFO', 0x04), ('QuotaInfo', '_OBJECT_HEADER_QUOTA_INFO', 0x08), ('ProcessInfo', '_OBJECT_HEADER_PROCESS_INFO', 0x10))Now we know the object to find in the types file in order to figure out what type Name is. Look in volatility/plugins/overlays/windows/win7_sp1_x86_vtypes.py:Since Name is of the same type (_UNICODE_STRING), we should be covered. In the words of Al Bundy, "Let's rock":'_OBJECT_HEADER_NAME_INFO' : [ 0x10, { 'Directory' : [ 0x0, ['pointer', ['_OBJECT_DIRECTORY']]], 'Name' : [ 0x4, ['_UNICODE_STRING']], 'ReferenceCount' : [ 0xc, ['long']], } ],Success! A lot of other objects use _UNICODE_STRINGs, including mutants, registry paths and symbolic links. So this was an important fix.$ python vol.py -f Win7x86.vmem --profile=Win7SP1x86 filescan [snip] 0x000000003e8fb1c0 2 0 RW-rw- \Device\HarddiskVolume1\Users\גלידה\Desktop\עצם.txt 0x000000003f83c038 2 0 RW-rw- \Device\HarddiskVolume1\Users\גלידה\AppData\Roaming\Microsoft\Windows\Recent\עצם.lnk [snip]
Changes have already been reflected in the master branch of Volatility. We hope that you have enjoyed this not so short, quickie ;-)