intropy: Program Introspection Using Python

About two months ago I bought two books on Amazon, “The Mac Hacker’s Handbook” and “Gray Hat Python”, to further explore my curiosity about reverse engineering using Mac OS X as a case study.  Gray Hat Python essentially argues that Python is a great language for reverse engineering, while The Mac Hacker’s Handbook shows you some of the basic tools and techniques needed for vulnerability detection, exploit development, and reliable deployment.  One interesting commonality is that both books leverage PaiMei (http://code.google.com/p/paimei/), a reverse engineering framework written in Python by Pedram Amini.

While both books claim the framework is extremely powerful, I was largely unsuccessful in getting it up and running on a Mac because it was originally developed for Windows.  I also found some of the prerequisites somewhat ridiculous (case in point: it wanted MySQL, which boggled my mind when much lighter alternatives are available).  So, as you’d expect from me, I saw this as a challenge: develop a cross-platform reverse engineering framework that doesn’t require the kitchen sink to run.  My answer to that challenge is intropy, a Python framework for program introspection.

An Overview Of intropy

Figure 1: Schematic of the intropy framework.  Arrows represent the direction of information flow.

The current (partial) implementation of intropy is written entirely in Python and C.  Python is the primary language; C is used only to build the abstraction layer between the operating system and the rest of the framework.  As shown in Figure 1, intropy consists of three main components: libicc, idb, and idc.

libicc

One of the primary goals of intropy is to be platform agnostic.  To accomplish this, I decided to develop an underlying library that abstracts the OS into a generic API.  The library, libicc, is written purely in C and is brought into the Python environment using the Python ctypes module.  Currently libicc has only been implemented for Mac OS X, using a mix of the POSIX and Mach APIs.  As a result, the interface provided by libicc has a very Mach-ish feel to it; however, I don’t see that as an issue because the API still follows common OS abstractions such as processes, threads, and virtual memory.
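
To give a feel for how a C library gets pulled into Python with ctypes, here is a minimal sketch.  Since libicc itself is not released yet, it uses the system C library's getpid as a stand-in, and it assumes a POSIX system.  The wrapper name current_pid is made up for illustration and is not part of libicc.

```python
import ctypes
import ctypes.util
import os

# Stand-in for libicc: load the system C library (POSIX systems only).
# If find_library returns None, CDLL(None) falls back to the symbols
# already loaded into the current process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts values correctly:
#     pid_t getpid(void);
libc.getpid.argtypes = []
libc.getpid.restype = ctypes.c_int


def current_pid():
    """Hypothetical wrapper, named for illustration only."""
    return libc.getpid()


if __name__ == "__main__":
    # The value coming back from C should match what Python reports.
    print(current_pid() == os.getpid())
```

The same pattern (load the shared library, declare argtypes and restype, wrap the call in a small Python function) would apply to any C function a library like libicc exports.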

idb

The sole user of libicc is idb, the intelligent debugger.  Written entirely in Python, idb enables reverse engineers to create their own modules that leverage basic debugging constructs such as stepping, creating breakpoints, and inspecting portions of memory.  The “intelligent” aspect of the debugger comes from the fact that, unlike traditional debuggers, it lets you write Python scripts that leverage the facilities idb provides.  For example, the console shown in Figure 1 is written entirely as a module on top of idb.  Rather than bloating this post, I will write a separate post devoted to the design of idb, with some examples of writing modules for it.
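
Purely as an illustration of the idea of scripting a debugger, and not the actual idb API, the sketch below simulates a target as a dictionary of bytes and an execution trace as a list of addresses.  Every name in it (ToyDebugger, HitCounter, add_breakpoint, read_memory) is hypothetical.

```python
class ToyDebugger:
    """Hypothetical stand-in for idb that simulates a target process."""

    def __init__(self, memory):
        self.memory = dict(memory)   # address -> byte value
        self.breakpoints = {}        # address -> list of handlers

    def add_breakpoint(self, address, handler):
        """Register a callable to run whenever `address` is reached."""
        self.breakpoints.setdefault(address, []).append(handler)

    def read_memory(self, address, length):
        """Return `length` bytes starting at `address` (unmapped reads as 0)."""
        return bytes(self.memory.get(address + i, 0) for i in range(length))

    def run(self, trace):
        """Walk a simulated execution trace, firing breakpoint handlers."""
        for address in trace:
            for handler in self.breakpoints.get(address, []):
                handler(self, address)


class HitCounter:
    """Example 'module': counts breakpoint hits and snapshots memory."""

    def __init__(self, watch_address):
        self.watch_address = watch_address
        self.hits = 0
        self.snapshots = []

    def attach(self, debugger, address):
        debugger.add_breakpoint(address, self.on_breakpoint)

    def on_breakpoint(self, debugger, address):
        self.hits += 1
        self.snapshots.append(debugger.read_memory(self.watch_address, 2))


if __name__ == "__main__":
    dbg = ToyDebugger({0x2000: 0xDE, 0x2001: 0xAD})
    counter = HitCounter(watch_address=0x2000)
    counter.attach(dbg, 0x1004)
    dbg.run([0x1000, 0x1004, 0x1008, 0x1004])
    print(counter.hits, counter.snapshots)
```

The point of the design is that the module only sees breakpoints and memory reads, so it does not care whether the debugger underneath is a simulation like this one or a real process on a real OS.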

idc

While the facilities provided by idb enable you to script the debugger, they do not allow you to inspect the binary at the basic block level.  For this I am planning on leveraging the power of IDA Pro and providing an interface in intropy to do more fine-grained analysis.  This component, idc, is still in the design phase and therefore I’ll leave a more complete discussion for a future post.

intropy Modules

Modules in intropy are written in Python, allowing you to leverage the ease of development, functionality, and reliability of the Python language as well as the huge number of Python modules already written.  One interesting point about intropy is that everything written using idb and idc, including the console, is a module.  By doing this we enforce good software design, give everyone access to the same API, and allow for a greater level of flexibility.  On that note, another goal of intropy is to provide the foundation reverse engineers need to create a system that works for them.  Once the source code for intropy is released, I will make the API available too.
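
Here is a rough sketch of what “everything is a module” could look like in practice.  The intropy API has not been published, so the names below (Module, Framework, Console, PrintableStrings) are assumptions invented for this example.  Note that the console goes through the same interface as the analysis module.

```python
class Module:
    """Hypothetical base class shared by every module, console included."""

    name = "module"

    def __init__(self):
        self.options = {}

    def configure(self, **options):
        self.options.update(options)

    def run(self, framework):
        raise NotImplementedError


class Framework:
    """Hypothetical registry that loads, unloads, and executes modules."""

    def __init__(self):
        self.modules = {}

    def load(self, module):
        self.modules[module.name] = module

    def unload(self, name):
        del self.modules[name]

    def execute(self, name):
        return self.modules[name].run(self)


class Console(Module):
    """The console is just another module; here it lists what is loaded."""

    name = "console"

    def run(self, framework):
        return sorted(framework.modules)


class PrintableStrings(Module):
    """Analysis module: extract runs of printable ASCII from a byte string."""

    name = "strings"

    def run(self, framework):
        data = self.options["data"]
        minimum = self.options.get("minimum", 4)
        found, current = [], bytearray()
        for byte in data + b"\x00":          # sentinel flushes the final run
            if 0x20 <= byte < 0x7F:
                current.append(byte)
            else:
                if len(current) >= minimum:
                    found.append(current.decode("ascii"))
                current = bytearray()
        return found


if __name__ == "__main__":
    fw = Framework()
    fw.load(Console())
    strings = PrintableStrings()
    strings.configure(data=b"\x00\x01intropy\x00ab\x02libicc")
    fw.load(strings)
    print(fw.execute("console"))
    print(fw.execute("strings"))
```

Because the console has no special privileges in this arrangement, anything it can do is also available to a module someone else writes.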

Next Steps

There is a lot to be done before intropy is ready for prime time.  As mentioned earlier, there is currently only a partial implementation, so the obvious first step is to complete the coding.  In addition, the current console implementation is greatly lacking in its ability to manage modules, so I will also be working on a console design that enables users to load/unload, configure, and execute analysis modules.  Moreover, there is still a great deal of work to be done on the design and implementation of idc.  Finally, once enough of the system has been created, I will post the code to my page on GitHub.

A Final Note

Needless to say, I am really excited about the prospects for intropy.  Through intropy I hope to inspire other developers to look at their software through the eyes of both an engineer and a reverse engineer and in doing so make some impact (however small it may be) in the software landscape.
