Dumping process memory on Mac OS X

A little while ago I was trying to set up Pai Mei (http://code.google.com/p/paimei/) on Mac OS X and wasn’t having much luck.  This didn’t really surprise me since it wasn’t designed for OS X, but it was worth a shot!  Aside from having issues setting it up, I found the requirement to run a MySQL server kind of ridiculous when solutions such as SQLite exist.  Given all of this, and my desire to peer into parts of software I’m probably not supposed to see, I decided to try my hand at writing a program that can track the execution of other processes (more on this in a future post).  The first step is being able to monitor and manipulate a process, and I thought a fun way of doing that would be to dump the memory of one process to a file.

Mach Tasks

XNU, the kernel used in Mac OS X, is built from three main components: Mach, BSD, and I/O Kit.  Each component provides different services to the system; for this post we are most interested in Mach.  Mach was originally developed at Carnegie Mellon University as a simple, extensible microkernel, and it provides XNU with several key abstractions including tasks, threads, address spaces, and ports.

A Mach task is an abstraction that contains one or more threads, a virtual address space, and ports among other things.  Mach tasks in turn serve as the basis of the familiar process model which is implemented in the BSD component of XNU.  As a result, Mach tasks are fundamental units in the XNU kernel and are central to dumping memory of a process.

Getting the Mach task: task_for_pid()

Processes on Mac OS X are identified by POSIX process IDs, which come from the BSD component of XNU, but the memory functions we need operate on Mach tasks.  The first step in our quest to dump memory is therefore to translate a process ID into a Mach task port.  The task_for_pid() function, declared in mach/mach_traps.h, makes this fairly easy, as seen in the following block of code.

  kern_return_t ret = -1;
  mach_port_name_t tport = -1;
  int tpid = -1;

  tpid = fork_calculator();

  ret = task_for_pid(mach_task_self(), tpid, &tport);
  if(ret != KERN_SUCCESS) {
    fprintf(stderr, "main: failed to get task for pid %d (%d)\n", tpid, ret);
    return -1;
  }

The function fork_calculator() spawns a new process running the Mac OS X Calculator application and returns the process ID of the new process.  The call to task_for_pid() takes three arguments: a port for the current task, the target process ID, and a pointer to a location in which to store the task port for the target process.  While this code looks simple, there are a couple of caveats that you must be aware of.  First, since the introduction of Kernel Authorization in Mac OS X 10.4, the process requesting the task port must be properly authenticated.  You can achieve this either by implementing the required authentication in code (not shown here) or by running the dumping tool as root.  Second, the process being targeted must be a child process of the dumping tool.  This is why our code forks a target process rather than simply accepting a process ID as an argument.  For a more in-depth explanation of the required permissions, see the comments for task_for_pid() in the XNU source code.
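
For reference, a helper like fork_calculator() could be written roughly as follows.  This is a minimal sketch and not necessarily how the full program does it: spawn_target() is a hypothetical name, and the Calculator path is simply the one that appears in the otool example later in this post.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fork and exec the program at `path`.
 * Returns the child's process ID in the parent, or -1 if fork() fails. */
pid_t spawn_target(const char *path)
{
  assert(path != NULL);

  pid_t pid = fork();
  if(pid == 0) {
    /* Child: replace this process image with the target program. */
    execl(path, path, (char *)NULL);
    /* Only reached if execl() failed. */
    perror("spawn_target: execl");
    _exit(127);
  }
  return pid;  /* parent: child's pid, or -1 on fork() failure */
}

/* Assumed equivalent of the fork_calculator() call used above. */
int fork_calculator(void)
{
  return (int)spawn_target("/Applications/Calculator.app/Contents/MacOS/Calculator");
}
```

Because the target is started with fork(), it is a child of the dumping tool, which is what the second caveat above requires.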

Reading memory: mach_vm_read()

Now that we have a port for the target process we can proceed to step two in our quest to dump memory: reading memory from the target process.  This is accomplished using the function mach_vm_read() which will copy the specified region of memory from the target process into the calling process.  The following block of code shows how to use mach_vm_read().

  kern_return_t ret = 0;
  pointer_t data;
  mach_msg_type_number_t data_size;

  long dump_base_addr = 0x10000190c;
  long dump_size = 0x1000;

  ret = mach_vm_read(tport, dump_base_addr, dump_size, &data, &data_size);
  if(ret != KERN_SUCCESS) {
    fprintf(stderr, "mem_dump: failed to read data at 0x%lx (0x%x)\n", dump_base_addr, ret);
    return -1;
  }

The call to mach_vm_read() takes five arguments: a port identifying the target process, the base address, the amount of data to read, a pointer that will receive the address of the data in our address space, and a pointer that will receive the size of the data that was actually read.  One caveat to using mach_vm_read() is that the memory at the specified address must be readable.  As seen from the above code, the only real difficulty in using this function is figuring out what the base address is.
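
Once mach_vm_read() succeeds, data holds the address of a buffer in our own address space, so saving it is ordinary file I/O.  The following is a minimal sketch of that step, not the code from the full program; write_dump() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `len` bytes starting at `buf` to the file `path`.
 * Returns 0 on success and -1 on failure. */
int write_dump(const char *path, const void *buf, size_t len)
{
  assert(path != NULL && buf != NULL);

  FILE *fp = fopen(path, "wb");
  if(fp == NULL) {
    fprintf(stderr, "write_dump: failed to open %s\n", path);
    return -1;
  }

  size_t written = fwrite(buf, 1, len, fp);
  if(fclose(fp) != 0 || written != len) {
    fprintf(stderr, "write_dump: short write to %s\n", path);
    return -1;
  }
  return 0;
}

/* With the variables from the snippet above, the call would look like:
 *   write_dump("mem.dump", (const void *)data, data_size);
 * The buffer returned by mach_vm_read() belongs to our task, so it should
 * be released afterwards, for example with
 *   mach_vm_deallocate(mach_task_self(), data, data_size);
 */
```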

Finding The Base Address

The technique you use to find the base address will depend entirely on what you’re trying to accomplish.  For the sake of an example, I’ll show you how I came up with the address 0x10000190c.  To find this address I used the incredibly useful tool otool to print the contents of the Calculator application’s text section.  The first address in the output, 0x10000190c, is where the (__TEXT,__text) section begins.

dean@BigBertha:/Applications/Calculator.app/Contents/MacOS $ otool -t Calculator
Calculator:
(__TEXT,__text) section
000000010000190c 6a 00 48 89 e5 48 83 e4 f0 48 8b 7d 08 48 8d 75
000000010000191c 10 89 fa 83 c2 01 c1 e2 03 48 01 f2 48 89 d1 eb

In this invocation (otool can do many other things), the output simply lists the bytes found at the address given in the first column.
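
If you would rather not copy the address by hand, the first column can be parsed in code.  The sketch below is illustrative only: parse_otool_addr() is a hypothetical helper, and it assumes a line formatted like the otool output above, with a hexadecimal address followed by whitespace and then the bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the leading hexadecimal address from one line
 * of `otool -t` output.  Returns 0 on success and stores the address in
 * *addr; returns -1 if the line does not start with a hex number that is
 * followed by a space or tab. */
int parse_otool_addr(const char *line, unsigned long long *addr)
{
  assert(line != NULL && addr != NULL);

  char *end = NULL;
  errno = 0;
  unsigned long long value = strtoull(line, &end, 16);
  if(end == line || errno != 0) {
    return -1;  /* no hex digits consumed, or the value overflowed */
  }
  if(*end != ' ' && *end != '\t') {
    return -1;  /* rejects header lines such as "Calculator:" */
  }
  *addr = value;
  return 0;
}
```

The parsed value could then be used in place of the hard-coded dump_base_addr.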

Putting It All Together: Dumping Memory

By combining task_for_pid() and mach_vm_read(), it is possible to dump the memory of a target process.  Full code for this program is available on GitHub.  To use the program do the following:

dean@BigBertha:~/scm/Kitchen-Sink $ gcc -Wall -o mac_dump mac_dump.c
dean@BigBertha:~/scm/Kitchen-Sink $ sudo ./mac_dump

This will create a file in the current directory called mem.dump which contains the memory dump.  Looking at the dump you’ll see that we’ve successfully read from the Calculator address space and saved the memory to disk.  You can see this using hexdump as follows.

dean@BigBertha:~/scm/Kitchen-Sink $ sudo hexdump mem.dump
0000000 6a 00 48 89 e5 48 83 e4 f0 48 8b 7d 08 48 8d 75
0000010 10 89 fa 83 c2 01 c1 e2 03 48 01 f2 48 89 d1 eb

If you were to disassemble this dump, you’d see that it contains the same bytes we saw earlier when we used otool.
