On 20/01/20(Mon) 07:19, Mark Kettenis wrote:
> > [...] 
> > I'd appreciate particular review of the following items:
> > 
> >  * Event producer/consumer code which currently needs a mutex.  The
> >    current implementation doesn't always use a PCB per-CPU.  Moving
> >    to a lockless implementation would be beneficial for recording in
> >    deeper places of the kernel.
> 
> I think that should be possible, but I'd have to think about that for
> a bit longer.  Shouldn't prevent this from going in.  And it might
> mean we can't support hppa MP support kernels (but that's not a big
> issue).

The current kernel atomic_* API should be good enough to make such code
usable on hppa.  Obviously, tracing a high rate of events might result
in contention leading to the loss of some of them.
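To illustrate the direction, here is a minimal user-space sketch of a
lockless single-producer/single-consumer ring with a drop counter, assuming
one ring per PCB so only one writer ever advances the head.  It uses C11
atomics as a stand-in for the kernel atomic_* API; the names (dt_ring,
dt_ring_put, dt_ring_get) are hypothetical, not the dt(4) code:

```c
#include <stdatomic.h>
#include <stddef.h>

#define DT_RING_SIZE 8		/* power of two, deliberately small */

struct dt_ring {
	_Atomic size_t	head;		/* next slot the producer writes */
	_Atomic size_t	tail;		/* next slot the consumer reads */
	_Atomic size_t	dropped;	/* events lost because the ring was full */
	int		slot[DT_RING_SIZE];
};

/* Producer side: runs in probe context, never blocks. */
static int
dt_ring_put(struct dt_ring *r, int ev)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == DT_RING_SIZE) {
		/* Full: count the drop, like the total btrace -v reports. */
		atomic_fetch_add_explicit(&r->dropped, 1,
		    memory_order_relaxed);
		return -1;
	}
	r->slot[head % DT_RING_SIZE] = ev;
	/* Publish the slot contents before moving the head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return 0;
}

/* Consumer side: what a dtread()-like drain loop would do. */
static int
dt_ring_get(struct dt_ring *r, int *ev)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail == head)
		return -1;	/* empty */
	*ev = r->slot[tail % DT_RING_SIZE];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return 0;
}
```

With one producer per ring there is no cross-CPU contention on head, and
a full ring degrades to counting drops instead of spinning, which matches
the "count what we lose" behaviour described below.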

The number of events read and dropped during a btrace(8) recording is
displayed when using '-v'.  For example, capturing an event every time a
thread gets executed, as in the following, usually results in events
being dropped.  This happens because the rings are deliberately undersized,
to force developers to revisit this part of the code once we figure
out how much we can allocate per-PCB for real use cases.

btrace -ve \
   'tracepoint:sched:on__cpu { printf("%u: run %s(%d)\n",nsecs, comm, pid); }'

...
579539503551390402: run softnet(80542)
^C4747 events read
13 events dropped

> >  * Barriers for enabling/disabling probes.  dt(4) can be opened multiple
> >    times allowing multiple processes to gather event in parallel.  I'd
> >    like to be sure a close(2) doesn't stop another thread from capturing
> >    events.
> 
> Is it absolutely neccessary that we support multiple processes gathering?

It isn't.  However, after so many years of MP development I'm still
unable to write such code.  That's why I'm asking for help, as I believe
this knowledge would be useful to me and to many of us.

> >  * Read wakeup.  Currently a mutex is used to not lose a wakeup event
> >    in dtread(), however this mutex creates lock ordering problems with
> >    the SCHED_LOCK() when probes are executed to profile the scheduler.
> 
> Hmm, isn't it possible to avoid this by using sleep_setup() and
> sleep_finish_all() manually?  I might give that a go once this is in.

Thanks.

> > I included changes to MAKEDEV on all archs where the dev number 30 is
> > still free.  I'll pick the closest number for the other archs if that's ok.
> > 
> > The man page doesn't include a list of ioctl(2) as I believe they will
> > change soon.  I'll send another mail for the userland interface.
> > 
> > Ok to continue in-tree?
> 
> Yes!

Thanks :)
 
> Is there a particular reason you're changing db_expr_t into long in some
> of the places?  I'm not necessarily against removing that useless
> abstraction, but this seems to be only a partial change.

With the current changes from visa@ to move the stacktrace saving API
outside of DDB, it is no longer necessary.  We might also want to
introduce a variant of the API able to skip a fixed number of frames,
so the ones required by the dt(4) code aren't always included.  For
example, on amd64, for the profile provider used for FlameGraphs:

      dt_prov_profile_enter+0x6e
      hardclock+0x12c
      clockintr+0x59
      intr_handler+0x6e
      Xresume_legacy0+0x1d3
      cpu_idle_cycle+0x1b     <---- start here.
      proc_trampoline+0x1c
