Dear Roland and dear utrace developers, please help me. Either I have not understood the meaning of UTRACE_STOP or it is completely useless due to a race condition.
There are always two entities in a utrace interaction: the traced process and the tracing module. When a traced event occurs in the traced process the correspondent report function gets called in the module. If the report function returns UTRACE_STOP the traced process stays in a quiescent state and the module wakes it up by a utrace_control(...,UTRACE_RESUME) call *later*. This *later* is the problem. If the module wakes the traced process too quickly, utrace has not yet put it into a "stopped" state, therefore UTRACE_RESUME gets lost. As a consequence, the execution is blocked. IMHO, given the current utrace code, there is no way to set up some kind of synchronization in the module to prevent this error. ------- For the sake of simplicity let us assume one engine attached to the traced process (the problem is the same for more engines). The point is: when a report function returns UTRACE_STOP and later calls utrace_control(...,UTRACE_RESUME) the traced process must not stop t=0: Before the report function calling loop utrace->stopped=0; (In start_report: BUG_ON(utrace->stopped);) t=1: REPORT FUNCTION CALL(no lock!): t=2: When the report function returns UTRACE_STOP In finish_callback: t=3: spin_lock(&utrace->lock); mark_engine_wants_stop(engine); spin_unlock(&utrace->lock); t=4: in utrace_stop(..): spin_lock(&utrace->lock); utrace->stopped=1; __set_current_state(TASK_TRACED); spin_unlock(&utrace->lock); schedule(); --> now the traced process is blocked. The module has "decided" UTRACE_STOP at t=1, then the module can call utrace_control(...,UTRACE_RESUME) at any t>1. If the resume call takes place before t=4 the request is lost and the race condition causes the traced process to stop anyway. In fact for 1<t<4 utrace_control finds that the process has not been stopped: resume = utrace->stopped; ... and therefore it does nothing. /* * Let the thread resume running. If it's not stopped now, * there is nothing more we need to do. */ if (resume) utrace_reset(target, utrace, NULL); else spin_unlock(&utrace->lock); ----- There are two solutions: 1- (slow & dirty): some sort of synchronization: no ptrace_control (or ptrace_set_events) should take place during all the sequence including from the report function call to the utrace->stopped=1. 2- (the nice one): add another flag named ENGINE_RESUME (like ENGINE_STOP). that flag must be cleared before calling the report function: t=0.5: clear_engine_wants_resume(engine); utrace_control(...,UTRACE_RESUME) should set the flag: spin_lock(&utrace->lock); mark_engine_wants_resume(engine); spin_unlock(&utrace->lock); utrace_stop at t=4 (inside the lock) must check if the traced process has been already resumed. spin_lock(&utrace->lock); spin_lock_irq(&task->sighand->siglock); /* final check: is really needed to stop? */ list_for_each_entry_safe(engine, next, &utrace->attached, entry) { if ((engine->ops != &utrace_detached_ops) && engine_wants_stop(engine)) { if (engine_wants_resume(engine)) clear_engine_wants_stop(engine); else utrace->stopped = 1; } } if (unlikely(!utrace->stopped)) { spin_unlock_irq(&task->sighand->siglock); spin_unlock(&utrace->lock); return false; } In this way the race condition should be eliminated. (it was eliminated in my proof-of-concept utrace patched implementation) If utrace_stop discovers that a resume request is already pending the traced process is not blocked. ----- Ptrace on utrace works because there is a workaround: the notification to the ptracer is called from within the utrace_stop function *after utrace->stopped has been set*. Ptrace would suffer from the same race condition otherwise. I am looking forward to hearing some comments on this. From what I see, Kmview cannot be implemented on the current utrace implementation. renzo