Currently I am working on two projects, intropy and ICE, that involve the popular disassembler IDA Pro. With the deadline for the annual IDA Pro plugin contest just around the corner, I thought this was a great time to dig deep into IDA and see what I could do. This post serves as a summary of my experiences with and reactions to IDA Pro, as well as some of the other technologies I tried to integrate over the past week or so.
The Goal: RECoop
Despite my best efforts I am not a reverse engineer. I continue to be fascinated by pulling software apart to understand how it works, but at this point I’m far from what I’d call a reverse engineer. That being said, when I began trying to think of ideas for a plugin that I could conceivably write in a week and that would still be useful, I looked to other aspects of computing. After tossing out a few other ideas I began to think more about how I’d like to become more proficient in the art of reverse engineering but at times find the lack of guidance somewhat discouraging. I then began to think about how cool it would be if I could sit down at my computer, open up IDA Pro, and have a mentor help guide me through some exercises. With that, the idea for RECoop was born.
The main idea behind RECoop (Cooperative Reverse Engineering) is that reverse engineering is currently performed largely in solitude, and I believe a single person would go insane trying to reverse some of the larger pieces of software out there. Therefore what we need is a way for reverse engineers to work on the same project concurrently and in real time. I planned to achieve this by writing a plugin for IDA Pro that brought the communication facilities provided by Skype together with the massively scalable data storage systems provided by Amazon Web Services. With RECoop I envision a team of reverse engineers all working together to understand a single piece of software, and I hope the result will be both a savings in time and a better understanding of the software in question. Lastly, two environments that I think would benefit immensely from RECoop are the (virtual) classroom and globally distributed teams of engineers.
IDA Pro And The SDK
Rather than immediately trying to reverse engineer the ubiquitous calculator application like I’m sure most newcomers to IDA Pro do, I dove straight into the SDK and explored IDA Pro through the eyes of someone trying to bring new functionality to the disassembler.
Hex Rays has made two interfaces to the IDA Pro SDK available to developers. The primary interface is written in C++ and allows developers to integrate deeply with IDA Pro by creating plugins, processor modules, file loaders, and much more. The secondary interface is provided by a (now standard) plugin called IDAPython which, not surprisingly, provides a Python interface to the SDK. The C++ and Python interfaces are largely the same. Unfortunately, I have yet to dive deep into the Python interface, so I am not certain what its limitations are.
Upon downloading and unpacking the SDK I first took a look at some of the header files that are available. The first thing that stood out to me was that the header files are extremely well documented. All functions that Hex Rays intends to be used are marked with the ida_export qualifier and have a description of the function along with its inputs and outputs. Although it would have been useful to have a nice web interface to search the API, it is quite easy to grep through the header files and discover what is available to use. Another observation that I made was that the API is quite modular and the division of functionality into header files seemed quite logical. After rooting around the headers a bit I tried to build the SDK.
According to the documentation, building the SDK should be a simple matter of defining a couple of environment variables and then running the make utility. Unfortunately it was not quite that simple, and I had to make a few modifications to the Makefiles. Note that I was building the SDK on Mac OS X 10.6, and from the included Makefiles it appears that Hex Rays has been testing against Mac OS X 10.5, so most of the changes were just to update some of the paths and include files to reflect OS X 10.6. Once these changes were made the SDK built with no issues.
To develop RECoop I needed two things from the SDK: the ability to create interface components in IDA Pro, and the ability to integrate with the IDA Pro database used to store information about the binary currently being analyzed. As of IDA Pro 6.0 Hex Rays transitioned the IDA Pro interface to the platform independent application framework Qt. Rather than exposing the entire Qt framework, Hex Rays has opted to provide a small subset of GUI functions that they feel will suit the vast majority of plugins. With this subset it is possible to create some basic dialogs, create and manipulate control-flow graphs, create forms (IDA’s version of sub-windows), annotate existing disassembly views, and a few other things. While I agree this is likely enough support for most, it was not enough for RECoop and I had to build and integrate the full Qt framework. Hex Rays made (in my opinion) a number of smart choices in designing the API, one of which is the use of notifications to alert plugins of events in various components of IDA Pro such as the UI and database. Not only did they adopt a notification-based system, they have allowed developers to hook into these notifications and perform actions when they occur. This feature alone is largely what makes a plugin like RECoop possible.
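To make the notification idea concrete, here is a minimal sketch of the pattern in plain Python. It is not the IDA Pro SDK: NotificationHub, CollaborationPlugin, the "ui"/"database" categories, and the "renamed" event are all hypothetical names chosen for illustration. The point is only that a plugin registers a callback with the host and reacts when the host reports a change.

```python
from collections import defaultdict

# Hypothetical event categories; these are not SDK identifiers.
UI_EVENTS = "ui"
DB_EVENTS = "database"


class NotificationHub:
    """Toy stand-in for a host application that notifies plugins of events."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def hook(self, category, callback):
        """Register callback(event, payload) for one category of events."""
        self._hooks[category].append(callback)

    def unhook(self, category, callback):
        """Remove a previously registered callback."""
        self._hooks[category].remove(callback)

    def notify(self, category, event, payload=None):
        """Call every callback hooked to the category; return how many ran."""
        callbacks = list(self._hooks[category])  # copy: callbacks may unhook
        for callback in callbacks:
            callback(event, payload)
        return len(callbacks)


class CollaborationPlugin:
    """Hypothetical plugin that collects database changes to share later."""

    def __init__(self, hub):
        self.outbox = []
        self._hub = hub
        hub.hook(DB_EVENTS, self.on_database_event)

    def on_database_event(self, event, payload):
        # Only renames are of interest in this sketch; ignore everything else.
        if event == "renamed":
            self.outbox.append(payload)

    def shutdown(self):
        self._hub.unhook(DB_EVENTS, self.on_database_event)
```

In this sketch the plugin never polls the database. It simply accumulates changes in outbox as the host reports them, which is the property a collaborative plugin depends on.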
Plugins in IDA Pro are compiled as dynamic libraries, presumably to leverage some of the code loading facilities provided by the underlying operating system. In addition, each plugin must export a special structure that tells IDA Pro what the initialization, termination, and run functions of the plugin are, along with various pieces of metadata such as plugin descriptions and preferred keyboard shortcuts.
Perhaps surprisingly (I was surprised), IDA Pro is a single threaded application. I’m not entirely certain why that is the case; however, I speculate that it simplifies the operations associated with analyzing the binary and retrieving/storing data in the database. Regardless, this means that all plugins must run on the main thread and can cause the UI to lock up. I suppose for the vast majority of plugins this isn’t a major issue, but for RECoop it quickly presented a number of challenges. In addition, any operation that manipulates the UI drawing context must be called on the main thread. This too raised some issues in the development of RECoop.
Skype
At the beginning of this project I was actually completely uncertain as to whether or not integrating with Skype was even possible. It turns out that the fine folks at Skype thought of people like me and have made a product called SkypeKit available. SkypeKit enables developers to integrate Skype into their products and provide the same powerful real-time communications offered by the official Skype application. To make this work the folks at Skype took the approach of splitting the platform into two parts: a runtime and the SDK. The SkypeKit runtime is essentially a headless Skype client. Through the SDK you can interact with the runtime and perform actions such as placing and receiving calls (video and audio), managing your Skype contacts, dialing out of the Skype network to the phone network, and much more. The SkypeKit runtime is run as a daemon and as a result the SDK interacts with it through the IPC facilities provided by the OS. Due in part to this separation, SkypeKit takes a unique approach to communication based on asynchronous actions and the use of notifications.
During the login process a developer using SkypeKit must authenticate with Skype and create an Account object. To do this you call a series of functions on the main Skype object to acquire the account information and then call a login function. Once this has been done, the created Account object is notified when it has been logged in. The process is fairly straightforward and outlined in a number of SkypeKit tutorials; however, when integrating this process with IDA Pro I found that the main IDA thread would block waiting for a response from Skype, and in turn the response it was expecting would never be received because of the blocking. This problem was easily rectified by introducing a second thread to handle communication with Skype. Unfortunately this solution has two disadvantages: all UI calls must somehow be moved to the main thread, and it is unclear whether or not the introduction of additional threads has any adverse effects on IDA Pro. In addition to this I also ran into some minor issues with how the Skype SDK handles subclassing of the provided objects.
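The general shape of the second-thread fix can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RECoop's code or the SkypeKit API: fake_login stands in for the blocking login round trip, and MainThreadDispatcher is a hypothetical helper that queues callbacks so they run only on the thread that owns the UI.

```python
import queue
import threading


class MainThreadDispatcher:
    """Queue of callables that only the owning (main) thread may run."""

    def __init__(self):
        self._pending = queue.Queue()
        self._owner = threading.get_ident()  # thread that created us

    def post(self, func, *args):
        """Safe to call from any thread: schedule func(*args) for later."""
        self._pending.put((func, args))

    def drain(self):
        """Run everything queued so far; must be called on the owner thread."""
        if threading.get_ident() != self._owner:
            raise RuntimeError("drain() called off the main thread")
        ran = 0
        while True:
            try:
                func, args = self._pending.get_nowait()
            except queue.Empty:
                return ran
            func(*args)
            ran += 1


def fake_login(username):
    """Hypothetical stand-in for a blocking login round trip."""
    return f"{username}: logged in"


def login_in_background(dispatcher, username, on_status):
    """Do the blocking work on a worker thread, then hand the result back."""

    def worker():
        status = fake_login(username)       # blocks only this worker thread
        dispatcher.post(on_status, status)  # the UI update is deferred

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The main thread would call drain() periodically, for example from a timer, so that on_status always runs on the thread that owns the UI. This addresses the first disadvantage noted above. It says nothing about the second, whether extra threads are safe inside IDA Pro.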
Qt
As with most things in this project, I had never used Qt before and was pleasantly surprised once I began to use it. As previously noted, IDA Pro 6.0 and up are all based on a platform independent interface developed using Qt. In order to provide some of the functionality needed for the plugin it is necessary to go above and beyond the UI support provided by the IDA Pro SDK, and this can be done by leveraging the full Qt framework. The Qt framework itself is extremely well documented and provides facilities for UI elements and widgets, thread and process management, and much more. In addition to a large amount of functionality, the Qt developers also introduced one technique for communication, signals and slots, that I found rather intriguing.
The idea behind signals and slots is that the relationship between UI elements is hierarchical in terms of widget parent-child relationships but also functional, in that text from a text input widget may be used in multiple other widgets. To support this functional relationship Qt introduced the idea of signals, which are emitted as certain events occur, and slots, which can be executed when the containing object receives a specified signal. This functionality is outside the scope of standard C++ constructs and therefore requires the use of the Qt meta-object compiler during the build phase to generate the necessary code. In the context of RECoop, signals and slots enable the cross-thread communication required to keep all drawing operations on the main thread. Lastly, note that it is possible (and quite useful) to add custom signals and slots to your own classes rather than only to Qt's built-in widgets, provided they derive from QObject. For example, when a user types a chat message in RECoop a signal is emitted that causes Skype to send the message over the network.
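For readers who have not met the pattern, the sketch below imitates signals and slots in plain Python. It is a toy, not Qt: the Signal class, ChatInput, FakeTransport, and ChatLog are hypothetical names, and slots are called directly on the emitting thread, whereas Qt can also queue a call onto the receiver's thread.

```python
class Signal:
    """Minimal signal: remembers connected slots and calls them on emit."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        """Attach a callable that will run each time the signal is emitted."""
        self._slots.append(slot)

    def emit(self, *args):
        """Call every connected slot, in connection order, with the args."""
        for slot in list(self._slots):
            slot(*args)


class ChatInput:
    """Hypothetical text box that announces each message the user enters."""

    def __init__(self):
        self.message_entered = Signal()

    def type_message(self, text):
        self.message_entered.emit(text)


class FakeTransport:
    """Hypothetical stand-in for the component that sends chat messages."""

    def __init__(self):
        self.sent = []

    def send(self, text):
        self.sent.append(text)


class ChatLog:
    """Hypothetical widget that shows the local user's own messages."""

    def __init__(self):
        self.lines = []

    def append_line(self, text):
        self.lines.append(f"me: {text}")
```

Connecting both transport.send and log.append_line to the same message_entered signal shows the functional relationship described above: one event from the input widget drives two otherwise unrelated objects, and the input widget knows about neither.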
Amazon Web Services
To provide the database backend required by RECoop I decided to go with Amazon SimpleDB and a server implemented in Erlang running in an Amazon EC2 instance. The decision to go with Amazon Web Services was guided by a number of factors. First, this meant I didn’t have to provide my own servers for the project. Although I could have hosted the service on this site, I wanted to decouple the service from my own site in order to encourage private installs and relieve me of managing more stuff. Second, Amazon SimpleDB boasts the ability to provide highly scalable database solutions that are capable of handling large numbers of concurrent connections. Since the idea is to have many people using the same database at once in RECoop, this seemed like a natural fit. Finally, I had never used Amazon Web Services, so this seemed a good opportunity to check them out.
Erlang was selected for implementing the RECoop server because it provides exceptional facilities for handling concurrent connections, is extremely expressive and compact, and has facilities for quickly packing and unpacking binary data. Since this component of RECoop is still largely unimplemented I will save a discussion of it for a future post.
Wrap Up
For most people, the first week with IDA Pro is probably filled with analyzing countless lines of assembly code, staring blankly at instruction set references, and banging your head against the wall trying to wrap it around the low-level aspects of computers. For me it was an exploration of multiple SDKs; insights into API design, programming language constructs, and proper documentation; and a chance to explore several different technologies with the goal of integrating them to try and foster collaboration among reverse engineers.