Hi all,

I was somewhat surprised to find in LWN a message, posted to this list[1], suggesting that a project of mine, fakeroot-ng, is a potential beneficiary of utrace. Truth be told, all utrace has offered me so far is pain.

In particular, when working with ptrace to perform generic virtualization, one runs up against an interesting problem. The core of the ptrace interface is notifying the debugger about events delivered to the debuggee. Whenever "interesting" events are reported (such as a single step or a system call), this appears to the debugger as a SIGTRAP delivered to the debuggee process. For system call tracing in particular, the debugger needs to keep track of how many times it was notified, as it will get two notifications for each system call: one upon entry and one upon exit. I'm fairly sure none of this is news to most people on this list.

The problem is that, as a debugger, I need to be able to differentiate between a SIGTRAP supposedly delivered to the debuggee because I asked to trace the system calls, and a SIGTRAP actually delivered to the debuggee. If I don't, my count is going to be off, and I will totally misinterpret the debuggee's state.
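To make the bookkeeping problem concrete, here is a minimal sketch in C of the naive approach, in which every SIGTRAP stop is assumed to be a system call stop. The names (struct tracee, naive_on_stop) are made up for illustration and are not taken from fakeroot-ng or any other project.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-debuggee bookkeeping kept by the debugger. */
struct tracee {
    pid_t pid;
    bool in_syscall;   /* true between the entry stop and the exit stop */
};

/* Naive bookkeeping: treat every SIGTRAP stop as a system call stop.
 * Returns true if the debuggee is now believed to be inside a system call. */
bool naive_on_stop(struct tracee *t, int stop_signal)
{
    assert(t != NULL);
    if (stop_signal == SIGTRAP)
        t->in_syscall = !t->in_syscall;   /* entry <-> exit toggle */
    return t->in_syscall;
}
```

With this scheme a single SIGTRAP that was really sent to the debuggee flips the flag once too often, and from then on every entry is taken for an exit and vice versa.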

The best way, as far as I can tell, to do that on Linux is to use the PTRACE_GETSIGINFO command. This provides me with a field, si_code, that can distinguish between a signal and a system call. This is important to make sure that I don't get confused over which is which.

Unfortunately, utrace (at least the version integrated into the Fedora Core 9 and Fedora 10 kernels) totally eliminated this command. When calling ptrace with PTRACE_GETSIGINFO I get back "Invalid argument".

I've tried to figure out how other programs handle the situation. Looking at the strace sources, it seems to use a heuristic to try and detect this state. It relies on the fact that, on most Linux platforms, the kernel sets the return code register to -ENOSYS before calling the syscall enter ptrace hook, and treats a SIGTRAP as spurious if that value is not set. This solution has numerous deficiencies:

   * It is platform specific. On PowerPC, for example, the kernel does
     not do this, so strace has no way of telling the two cases apart.
   * It is unreliable. The check can only be made on the syscall
     enter hook, not the exit hook.
   * It relies on internal kernel behavior.
   * It is easy for a malicious programmer to fool. For example, send
     the signal from another process, have the first process do a tight
     loop where EAX (or whatever) is set to -ENOSYS, and strace will
     think you have entered a random system call, probably the last one
     again. Do that right after a fork or an exec, and all sorts of fun
     stuff will happen.
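For reference, the heuristic boils down to something like the following sketch. This is my own reconstruction for x86_64, not code copied from strace, and the function names are made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* The whole decision rests on one register value, which is why a debuggee
 * that loads -ENOSYS into that register itself passes the check. */
bool retval_marks_syscall_entry(long long ret_reg)
{
    return ret_reg == -ENOSYS;
}

#if defined(__x86_64__)
/* x86_64 only: read the return value register (rax) of a stopped debuggee.
 * Returns 1 if the stop looks like a syscall entry, 0 if it does not,
 * and -1 if the registers could not be read (errno is set). */
int stop_looks_like_syscall_entry(pid_t pid)
{
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return -1;
    return retval_marks_syscall_entry((long long)regs.rax) ? 1 : 0;
}
#endif
```

Note that the register name, and whether the kernel stores -ENOSYS there at all, changes from one architecture to the next, which is the first deficiency listed above.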

Since I'm aiming to use the fakeroot-ng technology for security related stuff (not in fakeroot-ng - I intend to split the project), these drawbacks are fatal.

Don't get me wrong. I think cleaning up the debugger interfaces inside the kernel is an excellent idea. I just don't think breaking user space compatibility with the old interface, broken though you might think it is, is justified. This is directed not so much against the utrace project as it is against Red Hat including it in production kernels.

Shachar

[1] - http://www.redhat.com/archives/utrace-devel/2009-March/msg00112.html

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com
