Dear Roland and dear utrace developers,

please help me. Either I have not understood the meaning of UTRACE_STOP
or it is completely useless due to a race condition.

There are always two entities in a utrace interaction: the traced
process and the tracing module.

When a traced event occurs in the traced process the correspondent 
report function gets called in the module.

If the report function returns UTRACE_STOP the traced process stays in a
quiescent state and the module wakes it up by a 
utrace_control(...,UTRACE_RESUME) call *later*.

This *later* is the problem.

If the module wakes the traced process too quickly, utrace has not yet put
it into a "stopped" state, therefore UTRACE_RESUME gets lost.
As a consequence, the execution is blocked.

IMHO, given the current utrace code, there is no way to set up some kind
of synchronization in the module to prevent this error.

-------

For the sake of simplicity let us assume one engine attached to the
traced process (the problem is the same for more engines).

The point is: when a report function returns UTRACE_STOP and later calls
utrace_control(...,UTRACE_RESUME) the traced process must not stop

t=0: Before the report function calling loop utrace->stopped=0;
     (In start_report: BUG_ON(utrace->stopped);)
t=1: REPORT FUNCTION CALL(no lock!):
t=2: When the report function returns UTRACE_STOP
     In finish_callback:
t=3: spin_lock(&utrace->lock);
     mark_engine_wants_stop(engine);
     spin_unlock(&utrace->lock);
t=4: in utrace_stop(..):
           spin_lock(&utrace->lock);
           utrace->stopped=1;
           __set_current_state(TASK_TRACED);
           spin_unlock(&utrace->lock);
           schedule(); --> now the traced process is blocked.

The module has "decided" UTRACE_STOP at t=1, then the module can call
utrace_control(...,UTRACE_RESUME) at any t>1.
If the resume call takes place before t=4 the request is lost and
the race condition causes the traced process to stop anyway.
In fact for 1<t<4 utrace_control finds that the process has not been
stopped:
     resume = utrace->stopped;
                 ...
and therefore it does nothing.
         /*
          * Let the thread resume running.  If it's not stopped now,
          * there is nothing more we need to do.
          */
        if (resume)
                utrace_reset(target, utrace, NULL);
        else
                spin_unlock(&utrace->lock);

-----
There are two solutions:

1- (slow & dirty): some sort of synchronization: no ptrace_control (or
  ptrace_set_events) should take place during all the sequence including
  from the report function call to the utrace->stopped=1.

2- (the nice one): add another flag named ENGINE_RESUME (like ENGINE_STOP).
  that flag must be cleared before calling the report function:
  t=0.5: clear_engine_wants_resume(engine);

  utrace_control(...,UTRACE_RESUME) should set the flag:
                spin_lock(&utrace->lock);
                mark_engine_wants_resume(engine);
                spin_unlock(&utrace->lock);
         
  utrace_stop at t=4 (inside the lock) must check if the traced process has
  been already resumed.
  spin_lock(&utrace->lock);
  spin_lock_irq(&task->sighand->siglock);
  /* final check: is really needed to stop? */
  list_for_each_entry_safe(engine, next, &utrace->attached, entry) {
          if ((engine->ops != &utrace_detached_ops) && 
engine_wants_stop(engine)) {
                  if (engine_wants_resume(engine))
                          clear_engine_wants_stop(engine);
                  else
                          utrace->stopped = 1;
          }
  }
  if (unlikely(!utrace->stopped)) {
          spin_unlock_irq(&task->sighand->siglock);
          spin_unlock(&utrace->lock);
          return false;
  }

  In this way the race condition should be eliminated.
  (it was eliminated in my proof-of-concept utrace patched implementation)
  If utrace_stop discovers that a resume request is already pending
  the traced process is not blocked.

-----
Ptrace on utrace works because there is a workaround: 
the notification to the ptracer is called from within the utrace_stop
function *after utrace->stopped has been set*.
Ptrace would suffer from the same race condition otherwise.

I am looking forward to hearing some comments on this. From what I see,
Kmview cannot be implemented on the current utrace implementation.

renzo

Reply via email to