2010/6/13 Roland McGrath <rol...@redhat.com>:
> Thanks for your interest in utrace.
>
> It's correct that passing UTRACE_REPORT to utrace_control should
> always ensure you get some report_* callback before the tracee returns
> to user mode.
>
> However, I think your use may be susceptible to a race between
> resumption by utrace_control, and the UTRACE_STOP done by the
> callback.  If your callback looks something like:
>
>        wake_up_consumer();
>        return UTRACE_STOP;
>
> then there is a window when the consumer can wake up and call
> utrace_control before that callback has actually returned UTRACE_STOP
> and had it processed.  In that event, your UTRACE_REPORT comes before
> the UTRACE_STOP, and then you do nothing afterwards to resume it.  If
> it ever did resume, it would indeed report (the UTRACE_REPORT is still
> "pending").  But that doesn't help you.
>
> Off hand, I think there is only one way to address this race.  When
> you get the notification to consume, before calling utrace_control,
> call utrace_barrier first.  This will block the consumer until the
> producer's callback has actually finished and had UTRACE_STOP recorded
> for your engine.  At that point, you can be sure that your
> utrace_control call will indeed wake up the stopped tracee.
>
> Oleg or others here can double-check my reasoning and help explain the
> situation if you share your code for them to see.
>
> Because of this possibility, it might make sense to have utrace_control
> return -EINPROGRESS for this situation.  That at least would alert you
> that something might be amiss.  Unlike other cases of -EINPROGRESS, this
> would not just mean that you need to call utrace_barrier to be sure the
> effect has completed.  Rather, it means there is a danger of the
> scenario I described above, where the effect you asked for is going to
> be trumped by the callback's UTRACE_STOP.  So it doesn't quite fit in
> with the other situations--it's not really an indication that you have
> to follow up with a synchronization, it's an indication that your
> synchronization logic is already wrong.  But at least it would relieve
> the mystery of such a situation, assuming you are checking return values
> and noticing in debugging.  I'm certainly open to opinions on this API
> change.
>
> The need to use utrace_barrier as a second synchronization after the
> main wake-up (done by whatever normal means you are using) is rather
> clunky, and when it matters (as we suspect it does in the occasional
> race in your case) it leads to ping-pong scheduling.  So we would like
> to improve the utrace API in this area at some point.  One idea we have
> discussed before is a "utrace_wake_up" call.  (This assumes that some
> variant of <linux/wait.h> wake_up* is what you are using in the guts of
> "wake_up_consumer".)  This would replace wake_up* for use in a callback,
> and simply do the wake-up after all the callbacks are finished (or
> perhaps just yours, but probably all), so your return value action is
> already recorded.  In fact, it might do the wakeup only after the
> UTRACE_STOP is not only recorded but actually performed, so that by the
> time your consumer wakes up, the tracee is actually stopped and
> available for third-party examination.  (You don't actually care about
> that for your case--for your purposes, it would be optimal just to delay
> the wakeup until your own callback's return value is recorded.  That
> way, if your consumer acts quickly enough on another CPU, it might even
> resume the tracee before it actually yields the processor.)
>
> As you might tell from the complexity of that paragraph, there are many
> fuzzy nuances of exactly what that better API might be.  (And I didn't
> even delve into the possibility that you're using some synchronization
> mechanism other than wake_up* calls on a wait_queue_head_t.)  Hence we
> haven't tried to work it out exactly yet.  We hope that seeing a variety
> of uses such as yours will help us figure out how best to improve the
> utrace API for tracee-callback-to-tracer synchronization handshakes.
>
>
> Thanks,
> Roland
>

Sorry for the long delay. Here's my scenario for better understanding.

The buffer used for communication is in essence a circular buffer, but
instead of the buffer wrapping around over unread data, the producer is
suspended.

There is *a single* consumer at any time. The consumer waits for data
to be available. The code looks like this:

set_current_state(TASK_INTERRUPTIBLE);
while (atomic_read(&buf->tail) == atomic_read(&buf->head)) {
    pc_debug("Reader task going to sleep!");
    schedule();
    if (signal_pending(current)) {
        __set_current_state(TASK_RUNNING);
        return -ERESTARTSYS;
    }
    /* re-arm the state before re-checking the condition, or a wake-up
     * arriving between the check and the next schedule() can be lost */
    set_current_state(TASK_INTERRUPTIBLE);
}
set_current_state(TASK_RUNNING);

Once data is available, the consumer reads some of it, freeing up
space, and after that it wakes up all producers. Producers are
utrace-monitored tasks. To hold the references to the utrace engine
and task structures that I need for utrace_control, I use a standard
linked list protected by a spin lock. Is it possible to use a
wait_queue for this? I know wait queues can be used for tasks in
general, but I think suspending a task that way would bypass utrace,
which is not recommended.

So the consumer does something like:

while ((wake = unqueue_producer(list)) != NULL)
    utrace_control(wake->task, wake->engine, UTRACE_REPORT);

unqueue_producer acquires the spin lock that protects the list.
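If the race Roland describes above can bite here, this drain loop is where utrace_barrier would go. A sketch (struct prod_waiter is a hypothetical list element holding the task and engine references; utrace_barrier's error handling is simplified):

```c
struct prod_waiter *wake;

while ((wake = unqueue_producer(list)) != NULL) {
	/*
	 * Wait until the producer's callback has actually returned and
	 * its UTRACE_STOP has been recorded for this engine; otherwise
	 * the UTRACE_REPORT below can be processed before the stop and
	 * the producer stays blocked.
	 */
	if (utrace_barrier(wake->task, wake->engine) == 0)
		utrace_control(wake->task, wake->engine, UTRACE_REPORT);
}
```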

The producer is a utrace-monitored task. At any time there is an
arbitrary number of producers. On syscall entry and exit the producers
do:

slot = book_message_in_buffer(buf);
if (slot < 0) {
    /* no slot can be allocated, the producer must sleep */
    enque_producer_on_wait_list(list, producer);
    return UTRACE_STOP;
}

This is in fact the first thing they do. book_message_in_buffer
reserves space in the buffer; if the buffer is full, it returns a
negative value, and the process then puts itself on the wait list.
Again, the locking of the list is handled inside
enque_producer_on_wait_list. In addition, the buffer is
write-protected by a spin lock, and most of the work in
book_message_in_buffer is in a critical section protected by that
lock. (Just to make it clear: there are two spin locks, one for the
list and one for the data.)
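To make the reservation arithmetic concrete, here is a small userspace model of the non-wrapping scheme (the names, and the choice of monotonically counting head/tail, are my assumptions, not the actual code):

```c
#define BUF_SLOTS 4

struct ring {
	int head;	/* messages reserved so far (producer side) */
	int tail;	/* messages consumed so far (consumer side) */
};

/* Reserve one slot; returns the slot index, or -1 when the buffer is
 * full and the producer must sleep (models book_message_in_buffer,
 * minus the spin lock). */
static int book_message(struct ring *b)
{
	if (b->head - b->tail >= BUF_SLOTS)
		return -1;
	return b->head++ % BUF_SLOTS;
}

/* Consume one slot; returns the slot index, or -1 when the buffer is
 * empty and the consumer must sleep. */
static int consume_message(struct ring *b)
{
	if (b->tail == b->head)
		return -1;
	return b->tail++ % BUF_SLOTS;
}
```

With head and tail counting monotonically, head == tail is "empty" (the consumer's sleep condition) and head - tail == BUF_SLOTS is "full" (the producer's UTRACE_STOP condition), so no slot is ever overwritten before it is consumed.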

After producing some data, the producer does:

if (buf->reader != NULL) {
    pc_debug("Setting the reader task to runnable");
    wake_up_process(buf->reader);
}

to wake up the consumer, in case it has gone to sleep.
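For what it's worth, this manual sleep/wake pair is what the <linux/wait.h> helpers package up; with a wait_queue_head_t added to the buffer (a hypothetical wq field), the consumer's sleep and the producer's notify could be written as the sketch below. This only replaces the consumer-side sleep; the producers would still be stopped through utrace:

```c
/* consumer: sleep until data is available */
if (wait_event_interruptible(buf->wq,
		atomic_read(&buf->tail) != atomic_read(&buf->head)))
	return -ERESTARTSYS;

/* producer, after committing data: */
wake_up_interruptible(&buf->wq);
```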

To summarize, the consumer does:

1. sleep until data is available
2. consume data
3. while producers are queued, acquire the list lock
4. utrace_control(UTRACE_REPORT) the task
5. release the lock
6. repeat from 3

and a producer:

1. acquire the data lock
2. try to reserve space in the buffer
3. release the lock
4. if there is no space, acquire the list lock
5. queue itself
6. release the list lock
7. produce some data
8. acquire the data lock
9. commit the data
10. notify the reader
11. release the data lock
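Steps 1-6 above (the reserve-or-stop part, as driven from a syscall-entry/exit callback) can be sketched as one helper; the helper name and struct producer with its slot field are hypothetical, while the called functions are the ones described earlier:

```c
/* Returns the utrace action for the producer's callback to return. */
static u32 producer_reserve_or_stop(struct buf *buf, struct list_head *list,
				    struct producer *self)
{
	int slot = book_message_in_buffer(buf);	/* takes the data lock */

	if (slot < 0) {
		/* buffer full: queue ourselves and stop until the
		 * consumer utrace_control(UTRACE_REPORT)-s us */
		enque_producer_on_wait_list(list, self);
		return UTRACE_STOP;
	}

	self->slot = slot;	/* produce into this slot, then commit */
	return UTRACE_RESUME;
}
```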

From my printk-s I can see producers and consumers going to sleep and
waking up, as well as the monitoring callbacks being called again
after the UTRACE_REPORT. Then, without any reason I can infer, the
callback isn't called again and the task remains blocked. This also
happens on a single-processor machine with a single producer (no
high-concurrency scenario). The likelihood of the problem seems to
increase with the number of tasks and processors.

If my understanding is correct, my use case is different from what you
suspect, in that I wake up the consumer when I resume a producer, not
when I stop one. This still leaves room for a race if the consumer is
not blocked but a producer is about to block: the producer has added
itself to the list, but its UTRACE_STOP is not yet processed when the
consumer utrace_control(UTRACE_REPORT)-s it. I am not sure this race
ever happens, since the consumer only goes to wake up producers if
space is available, in which case the producers would not have blocked
in the first place. It probably can happen, because a race can also
occur there, but somehow I don't think that's it, because I should
have been able to detect something like this from the printk-s.
Anyway, just to make sure, I guess it would be enough to see whether I
can reproduce the problem with preemption disabled on a uniprocessor
machine. It doesn't help that the synchronization of the buffer is
rather complex, but it allows for lockless reads.

On another thought, I think it would be good for utrace_control to
signal something if it can detect that the requested operation has no
effect, or might have none. It would save some trouble.



Alpar Torok
