Hi all,
I was somewhat surprised to find in LWN a message, posted to this
list[1], suggesting that a project of mine, fakeroot-ng, is a potential
beneficiary of utrace. Truth be told, all utrace has offered me so far
is pain.
In particular, when working with ptrace to perform generic
virtualization, one runs up against an interesting problem. The core
ptrace interface notifies the debugger about events delivered to the
debuggee. Whenever "interesting" events are reported (such as a single
step or a system call), this appears to the debugger as a SIGTRAP
delivered to the debuggee process. For system call tracing in
particular, the debugger needs to keep track of how many times it was
notified, as it will get two notifications for each system call: one
upon entry and one upon exit. I'm fairly sure none of this is news to
almost anyone on this list.
The problem is that, as a debugger, I need to be able to differentiate
between a SIGTRAP supposedly delivered to the debuggee because I asked
to trace the system calls, and a SIGTRAP actually delivered to the
debuggee. If I don't, my count is going to be off, and I will totally
misinterpret the debuggee's state.
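
To make the miscount concrete, here is a minimal sketch in C of the
naive bookkeeping, assuming the debugger simply flips a flag on every
SIGTRAP stop. The names (struct tracee, on_sigtrap_stop) are
hypothetical and are not taken from fakeroot-ng or any other tool.

```c
#include <stdbool.h>

/* Hypothetical per-debuggee bookkeeping; not from any real tool. */
struct tracee {
    bool in_syscall;   /* true between the entry stop and the exit stop */
};

enum stop_kind { SYSCALL_ENTRY, SYSCALL_EXIT };

/* Naive approach: treat every SIGTRAP stop as a system call
 * notification and alternate between entry and exit. */
static enum stop_kind on_sigtrap_stop(struct tracee *t)
{
    t->in_syscall = !t->in_syscall;
    return t->in_syscall ? SYSCALL_ENTRY : SYSCALL_EXIT;
}
```

If a genuine SIGTRAP arrives between two system calls and is fed to
on_sigtrap_stop(), it is recorded as an entry that never happened, and
every later stop is labelled with the wrong phase.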
The best way, as far as I can tell, to do that on Linux is to use the
PTRACE_GETSIGINFO command. This provides me with a field, si_code, that
can distinguish between a signal and a system call. This is important to
make sure that I don't get confused over which is which.
Unfortunately, utrace (at least the version integrated into the Fedora
9 and Fedora 10 kernels) totally eliminated this request. When
calling ptrace with PTRACE_GETSIGINFO I get back "Invalid argument".
I've tried to figure out how other programs handle the situation.
Looking at the strace sources, it seems to use a heuristic to try
and detect this state. It relies on the fact that, on most Linux
platforms, the kernel sets the return code register to -ENOSYS before
calling the syscall enter ptrace hook, and it treats a SIGTRAP as
spurious if the value is not set. This solution has numerous deficiencies:
* It is platform specific. On PowerPC, for example, the kernel does
not, and strace has no way of telling the two cases apart.
* It is unreliable. The check can only be made on the syscall
enter hook, not the exit hook.
* It relies on internal kernel behavior.
* It is easy for a malicious programmer to fool. For example, send
the signal from another process, have the first process do a tight
loop where EAX (or whatever) is set to -ENOSYS, and strace will
think you have entered a random system call, probably the last one
again. Do that right after a fork or an exec, and all sorts of fun
stuff will happen.
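
The idea behind the heuristic can be paraphrased in a few lines of C.
This is only my sketch of the approach, not strace's actual code; the
function name is hypothetical, and ret_reg stands for whatever register
holds the return value on the platform (EAX on i386), as read by the
debugger at a SIGTRAP stop.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: guess that a SIGTRAP stop is a system call
 * entry if the return-value register holds -ENOSYS. */
static bool heuristic_is_syscall_entry(long ret_reg)
{
    return ret_reg == -ENOSYS;
}
```

The function sees nothing but a register value, so a debuggee that
holds -ENOSYS in that register when a real SIGTRAP arrives passes the
check just as a genuine system call entry would.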
Since I'm aiming to use the fakeroot-ng technology for security-related
stuff (not in fakeroot-ng - I intend to split the project), these
drawbacks are fatal.
Don't get me wrong. I think cleaning up the debugger interfaces inside
the kernel is an excellent idea. I just don't think breaking user space
compatibility with the old interface, broken though you might think it
is, is justified. This is directed not so much against the utrace
project as it is against Red Hat including it in production kernels.
Shachar
[1] - http://www.redhat.com/archives/utrace-devel/2009-March/msg00112.html
--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com