Debugging with strace
strace is a tool that helps you inspect the system calls your application makes. I found it really useful this week while debugging a complex Python application. I won’t go through my original problem here, but I will show you some typical examples as a proof of concept.
Let’s start with a basic scenario: file I/O on the local filesystem. Suppose we have a powerful Python script, traceme.py, that reads a.txt, while no a.txt exists.
If we run it, not surprisingly, we get
Traceback (most recent call last):
But let’s say that, for some reason, the error doesn’t include the name of the missing file. That is what happened to me earlier this week, and I got quite confused simply because I didn’t know which file was missing.
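To make that situation concrete, here is a minimal Python 3 sketch of one way a filename can go missing. It is not the code from my original problem: the helper name `load_settings` and the error message are made up for illustration.

```python
# Hypothetical sketch: how a missing filename can vanish from an error message.
# `load_settings` and the message text are illustrative assumptions.

def load_settings(path):
    """Read a file, but hide the low-level error behind a generic one."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        # `from None` suppresses the original exception (and its filename)
        # in the traceback, so only the vague message below is shown.
        raise RuntimeError("failed to load settings") from None


try:
    load_settings("a.txt")
except RuntimeError as error:
    # The message gives no hint about which file was missing.
    print(error)
```

A wrapper like this is harmless until the file is missing, at which point the only remaining clue is what the process asked the kernel to do.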
I ended up using strace to find out which system calls my program made. The simplest use of strace is just prepending strace to the command you want to debug.
strace python traceme.py
This prints a long stream of messages.
Those are all the system calls your program made at runtime. According to the following line, we can identify the name of the missing file.
open("a.txt", O_RDONLY) = -1 ENOENT (No such file or directory)
It also tells you that the file was supposed to be opened in read-only mode. You will find many of the system calls interesting if you are not very familiar with kernel-level programming.
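When the trace is long, picking out lines like that one by eye gets tedious. Here is a rough sketch of how you could filter them with Python. The function name `failed_opens` is my own, and the regular expression assumes the line layout shown above (plus the openat variant, which takes a directory argument before the path).

```python
import re

# Matches lines such as:
#   open("a.txt", O_RDONLY) = -1 ENOENT (No such file or directory)
FAILED_OPEN = re.compile(
    r'^(?P<call>open|openat)\('          # syscall name
    r'(?:\w+,\s*)?'                      # optional directory argument (openat only)
    r'"(?P<path>[^"]*)"'                 # the quoted path
    r'.*\)\s*=\s*-1\s+(?P<errno>E\w+)'   # failed return value and errno name
)


def failed_opens(lines):
    """Return (path, errno_name) pairs for every failed open/openat line."""
    results = []
    for line in lines:
        match = FAILED_OPEN.match(line.strip())
        if match:
            results.append((match.group("path"), match.group("errno")))
    return results


sample = ['open("a.txt", O_RDONLY) = -1 ENOENT (No such file or directory)']
print(failed_opens(sample))  # [('a.txt', 'ENOENT')]
```

Paths containing escaped quotes are not handled; for a quick debugging session that is usually good enough.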
You can make your life much easier by tracing only specific system calls. In our case, the point of interest is the open system call; use -e trace=<comma-separated-list-of-system-calls> to designate that.
$ strace -e trace=open python traceme.py
The output should now be much shorter and cleaner.
open("/usr/lib/python2.7/encodings/ascii.py", O_RDONLY) = 3
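One caveat: on newer Linux systems, open() in libc is usually implemented with the openat system call, so tracing only open may print nothing. Here is a small sketch that builds a filtered strace command covering both. The helper names `strace_command` and `run_traced` are hypothetical; `-e trace=` and `-o` are regular strace options.

```python
import shutil
import subprocess


def strace_command(target, syscalls=("open", "openat"), output=None):
    """Build an strace argv that traces only the given system calls."""
    command = ["strace", "-e", "trace=" + ",".join(syscalls)]
    if output is not None:
        command += ["-o", output]  # write the trace to a file instead of stderr
    return command + list(target)


def run_traced(target):
    """Run `target` under strace; return stderr text, or None if strace is missing."""
    if shutil.which("strace") is None:
        return None
    result = subprocess.run(
        strace_command(target), stderr=subprocess.PIPE, text=True
    )
    # Note: stderr holds the trace mixed with the program's own error output.
    return result.stderr


print(" ".join(strace_command(["python", "traceme.py"])))
# prints: strace -e trace=open,openat python traceme.py
```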
You can also attach strace to a running process by passing the -p option the PID of the process you are interested in. That is super helpful when debugging a running web application.
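If you want something safe to practice attaching to, here is a small stand-in for a long-running process. It is only a sketch: the function name `poll_for_file` and its defaults are assumptions. It prints its own PID and then keeps trying to open a file.

```python
import os
import time


def poll_for_file(path="a.txt", attempts=30, delay=1.0):
    """Try to open `path` repeatedly; return how many attempts failed."""
    failures = 0
    for _ in range(attempts):
        try:
            with open(path):
                pass
        except OSError:
            failures += 1
        time.sleep(delay)
    return failures


print("attach with: strace -p %d" % os.getpid())
# Raise `attempts` and `delay` (for example 60 and 1.0) to leave yourself
# enough time to attach from another terminal.
poll_for_file(attempts=3, delay=0.1)
```

Depending on your system's settings, attaching to a process may require the same user or root privileges.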
You can even profile all the system calls your code makes with the -c option. The c stands for count: strace tallies the time, calls, and errors for each system call and prints a summary when the program exits. It’s a really powerful way to understand performance issues at a low level.
% time seconds usecs/call calls errors syscall
You can tell from the result that the open system call is the performance bottleneck of the powerful app we just wrote.
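If you save the summary to a file (for example with `strace -c -o summary.txt python traceme.py`), you can sort it however you like. The parser below is a rough sketch that assumes the column layout in the header above, with the errors column left blank for calls that never failed. The name `parse_summary` is my own.

```python
def parse_summary(text):
    """Parse `strace -c` rows into dicts, slowest system call first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a percentage; skip the header, rulers, and total.
        if not fields or not fields[0][0].isdigit() or fields[-1] == "total":
            continue
        # Assumption: a row has six fields only when the errors column is filled.
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows.append({
            "syscall": fields[-1],
            "percent_time": float(fields[0]),
            "seconds": float(fields[1]),
            "usecs_per_call": int(fields[2]),
            "calls": int(fields[3]),
            "errors": errors,
        })
    return sorted(rows, key=lambda row: row["seconds"], reverse=True)
```

Calling `parse_summary(open("summary.txt").read())` then gives you a list you can filter, for instance to show only the calls that reported errors.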
I use strace whenever I don’t have enough context about what my app is doing with the operating system. You can even use it to debug network connections by tracing the poll,select,connect,recvfrom,sendto system calls, which is super handy :P.
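To try the network case yourself, here is a tiny sketch that attempts a TCP connection to a local port where nothing should be listening. The helper names `try_connect` and `unused_local_port` are hypothetical. Run it under something like `strace -e trace=connect,poll` and the connection attempt should show up in the trace.

```python
import socket


def try_connect(host, port, timeout=1.0):
    """Attempt a TCP connection; return None on success or the error raised."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as error:
        return error


def unused_local_port():
    """Ask the OS for a free port, then release it so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))
        return probe.getsockname()[1]


# Usually a "connection refused" error, since the port was just released.
print(try_connect("127.0.0.1", unused_local_port()))
```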