Re: resuming after stop at syscall_entry

Renzo Davoli Sat, 25 Apr 2009 05:17:05 -0700

> Enter Renzo Davoli.  

Here I am!


I have spent my time testing the latest version and trying to figure out
how to implement "nested Renzo's engines" with the support you propose.

Comments on the latest version of utrace:
-------------------------------------
1- syscall_entry report reversed.
wonderful, thank you. Now kmview.ko runs on vanilla utrace provided
KMVIEW_NEWSTOP is defined.
KMVIEW_NEWSTOP stops the process inside the syscall report function,
so it is an undesirable workaround, not a solution.
Anyway this can be used as a proof-of-concept: the problem related to
the order of callbacks for syscall_entry is solved.
-------------------------------------
2- utrace_control(.., UTRACE_RESUME) can arrive too early, before
ENGINE_STOP is set (in engine->flags by mark_engine_wants_stop).

Let us name p the traced process and vm the tracer.
t=10: p reports a system call. 
     during the report function, p communicates with vm 
     the report function returns UTRACE_STOP
     utrace is unlocked during the report function.
t=20: p records its need to stop: 
      (lock) engine->flags |= ENGINE_STOP; (unlock)

later (time t' > 10) vm calls utrace_control(p, engine, UTRACE_RESUME):
if t' < 20 the request gets lost!
in fact:
t=15:   utrace_control gets the lock
        resume=utrace->stopped IS ZERO!
        clear_engine_wants_stop clears ENGINE_STOP, which has not been
                set yet
        at t=20 ENGINE_STOP is set and the task blocked.
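
The schedule above can be replayed with a small user-space toy model.
This is only a sketch of the ordering problem, not utrace code: the
struct and the race_* helpers are hypothetical names of mine, and the
single bit merely stands in for ENGINE_STOP in engine->flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not utrace code: one bit standing in for ENGINE_STOP. */
#define ENGINE_STOP 0x1u

struct race_engine {
	unsigned int flags;	/* stands in for engine->flags */
};

/* vm side: the effect of utrace_control(.., UTRACE_RESUME) on the flag. */
static void race_control_resume(struct race_engine *e)
{
	assert(e != NULL);
	e->flags &= ~ENGINE_STOP;
}

/* p side: the step at t=20, after the report function returned UTRACE_STOP. */
static void race_mark_wants_stop(struct race_engine *e)
{
	assert(e != NULL);
	e->flags |= ENGINE_STOP;
}

/*
 * Replay one schedule and tell whether p ends up blocked.
 * early_resume == true  models t' = 15 (resume before the flag is set).
 * early_resume == false models t' > 20 (resume after the flag is set).
 */
static bool race_replay(bool early_resume)
{
	struct race_engine e = { 0 };

	if (early_resume) {
		race_control_resume(&e);	/* clears a bit that is not set yet */
		race_mark_wants_stop(&e);	/* the bit is set afterwards and stays set */
	} else {
		race_mark_wants_stop(&e);
		race_control_resume(&e);
	}
	return (e.flags & ENGINE_STOP) != 0;
}
```

In this model race_replay(true) leaves the stop bit set although the
resume has already been issued, which is the lost request described
above; race_replay(false) leaves it clear.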

There are two "clean" "non-baroque" approaches to solve this problem:
2A- interface approach: 
A long time ago utrace had a utrace_set_flags call that could set the
ENGINE_STOP flag before p communicates with vm. In this way ENGINE_STOP
will always be cleared after it has been set.
2B- implementation approach:
use two bits: ENGINE_STOP and ENGINE_RESUME.
before t=10 ENGINE_STOP and ENGINE_RESUME are unset.
utrace_control(p, engine, UTRACE_RESUME) must set ENGINE_RESUME and clear
ENGINE_STOP.
at t=20 p can check if there has been a fast resume request. In this case
ENGINE_STOP is not set.
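
A user-space toy model of 2B, as a sketch only and not the code of the
patch: the struct and the twobit_* helpers are hypothetical names, and
consuming ENGINE_RESUME at t=20 is my assumption about how the bit
would be reset. How a leftover ENGINE_RESUME is cleared before the next
report is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not utrace code. */
#define ENGINE_STOP   0x1u
#define ENGINE_RESUME 0x2u	/* the extra bit proposed in 2B */

struct twobit_engine {
	unsigned int flags;	/* stands in for engine->flags */
};

/* vm side: UTRACE_RESUME sets ENGINE_RESUME and clears ENGINE_STOP. */
static void twobit_control_resume(struct twobit_engine *e)
{
	assert(e != NULL);
	e->flags |= ENGINE_RESUME;
	e->flags &= ~ENGINE_STOP;
}

/* p side at t=20: honour a resume request that arrived early. */
static void twobit_mark_wants_stop(struct twobit_engine *e)
{
	assert(e != NULL);
	if (e->flags & ENGINE_RESUME) {
		/* Assumption: the early request is consumed here. */
		e->flags &= ~ENGINE_RESUME;
		return;		/* ENGINE_STOP is not set */
	}
	e->flags |= ENGINE_STOP;
}

/*
 * Replay one schedule starting with both bits unset (the state before
 * t=10) and tell whether p ends up blocked.
 * early_resume == true models t' = 15, false models t' > 20.
 */
static bool twobit_replay(bool early_resume)
{
	struct twobit_engine e = { 0 };

	if (early_resume) {
		twobit_control_resume(&e);
		twobit_mark_wants_stop(&e);
	} else {
		twobit_mark_wants_stop(&e);
		twobit_control_resume(&e);
	}
	return (e.flags & ENGINE_STOP) != 0;
}
```

With the second bit, neither ordering leaves p blocked in this model:
the resume request survives even when it arrives before t=20.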

It is possible to create other workarounds, barriers, fake reports, 
busy wait loops... If we want something effective, we must implement
solutions, not workarounds. If an engine says UTRACE_STOP and later
UTRACE_RESUME, the task must be resumed. The simpler, the better.

My patch in:
http://view-os.svn.sourceforge.net/viewvc/view-os/trunk/kmview-kernel-module/kernel_patches/linux-2.6.29-patch1?revision=637&view=markup
implements 2B and works with the latest utrace implementation.
----------------------
Comments on the proposal.

Roland, let me say frankly that the repeated report scan for system call
is just a step towards a solution, but I do not like it so much.

Problem #1: when each engine receives the same syscall_entry report several
times, each engine must discover if:
- a previous engine has already stopped this task
  ( utrace_resume_action(action) == UTRACE_STOP)
- this is a repeated scan and the current engine has already processed this
  report (there is a risk of processing it twice).
- this is a real new report

Maybe I can keep the address of the engine which stopped
the task somewhere (say in a task private variable stopengine).  
During the repeated scan:
        - if stopengine is NULL this is a fresh call.
        - else (stopengine != NULL) means that the current engine has 
                already processed this report
                - if stopengine == this engine then set stopengine to NULL.
A more portable approach follows (*) :
Each engine records if it stopped the task.
During the repeated scan:
        - if ! (action & UTRACE_SYSCALL_RESUMED) this is a fresh call
        - else the current engine has already processed this report
                - if this engine stopped the task then clear 
                        UTRACE_SYSCALL_RESUMED in the action returned.
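
To make the protocol (*) concrete, here is a user-space toy model of
the repeated scan. It is a sketch under my own assumptions, not utrace
code: the PROTO_* bit values, the struct, and the proto_* functions are
hypothetical, PROTO_STOP_PENDING stands in for
utrace_resume_action(action) == UTRACE_STOP, and I assume that an
engine which sees a stop already requested defers its work to the
repeated scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values for the toy model only. */
#define PROTO_SYSCALL_RESUMED 0x1u	/* stands in for UTRACE_SYSCALL_RESUMED */
#define PROTO_STOP_PENDING    0x2u	/* "a previous engine asked to stop" */

struct proto_engine {
	bool wants_stop;	/* this engine will stop the task once */
	bool stopped_task;	/* recorded by the engine itself, as in (*) */
	int processed;		/* how many times the report was processed */
};

/* One engine's report_syscall_entry, following the protocol (*). */
static unsigned int proto_entry(struct proto_engine *e, unsigned int action)
{
	assert(e != NULL);
	if (action & PROTO_SYSCALL_RESUMED) {
		/* Repeated scan: this engine has already processed the report. */
		if (e->stopped_task) {
			e->stopped_task = false;
			/* Engines after this one see a fresh call. */
			action &= ~PROTO_SYSCALL_RESUMED;
		}
		return action;
	}
	if (action & PROTO_STOP_PENDING)
		return action;	/* assumption: defer to the repeated scan */

	e->processed++;		/* fresh call: do the real work once */
	if (e->wants_stop) {
		e->wants_stop = false;
		e->stopped_task = true;
		action |= PROTO_STOP_PENDING;
	}
	return action;
}

/* One pass over all engines; resumed marks a repeated scan. */
static unsigned int proto_scan(struct proto_engine *engines, size_t n,
			       bool resumed)
{
	unsigned int action = resumed ? PROTO_SYSCALL_RESUMED : 0;
	size_t i;

	for (i = 0; i < n; i++)
		action = proto_entry(&engines[i], action);
	return action;
}
```

With three engines where only the second one stops the task, the first
scan lets engines 0 and 1 process the report and engine 2 defer; the
repeated scan lets engine 2 process it. Every engine handles the report
exactly once, but only because all three follow the same rules.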

This is not a nice solution: this "protocol" must be consistently applied
by all the modules using utrace otherwise they cannot interoperate.
If a report_syscall_entry does not behave in the same way it may receive
repeated reports or force other engines to skip some reports.

All the programmers of utrace modules would always have to agree on these
details: not a good interface for long-term interoperability.

Problem #2: syscall exit may need to modify the return value/errno.
The need for stop&go at each engine applies not only to syscall_entry.


I really do not understand why it is so unacceptable to have a UTRACE_STOP_NOW
tag to stop a process *before* reporting to the next engine.
The interface would be clean, interoperability between tracing and virtualizing
guaranteed.

It is not a matter of performance. If your engine needs to see the
system call that is going to be done by the kernel, as you say:
        if (utrace_resume_action(action) == UTRACE_STOP)
                return UTRACE_REPORT;
it has to wait for all the virtualizers to have done their job anyway.
On the other hand, this code cannot be used if you want to test which
system call appears to be done after the third virtualizer and before
the fourth.

If you want to see the syscall arguments before someone else changes 
syscall argument values you propose:
        if (action & UTRACE_SYSCALL_RESUMED)
                return UTRACE_RESUME;
this simply does not work: either this is the last engine inserted, or
some other engine may have already changed the arguments, or may be
changing them concurrently; in the latter case the result is unpredictable.
The code is also fooled by the reset of UTRACE_SYSCALL_RESUMED
as in (*) above.
If you want to see the syscall arguments before someone else changes
them, simply insert the engine as the last one.
UTRACE_STOP_NOW is a general approach: it is possible to see the syscall 
arguments before the fourth virtualizer changes them but after the third 
virtualizer has already done its work.

Roland, in my opinion you are so concerned with solving the problem of
supporting the very first and very last engine that you are not
seeing the problem of supporting the fifth or the fiftieth.
If hundreds of debuggers run concurrently, they return UTRACE_STOP;
if a virtualizer must be certain of what happens before and after it,
it uses UTRACE_STOP_NOW.
In this way utrace provides a support for interoperability between
different modules, and there is no need for programmers to share
the same protocol dealing with nested engines.

ciao.
        renzo
