Here is a modified version [1]. Compile-tested only, but it gives an idea.
I realized that there was a problem in your patch: you give a kcov descriptor on a per-process basis, but if a process has several threads, the coverage of all the threads gets mixed in the same output buffer. This is not desired, so I changed the code to be per-LWP.

Also, the lookup in _trace_pc() was bad for two reasons: it is slow, and it is racy. The thing is, adding a mutex there is not allowed. So I dropped the lookup and added a per-LWP pointer instead.

As a result, it unfortunately becomes more complicated to free the associated kcov descriptor, typically in corner cases such as:

- A thread closes the file descriptor in the process, but another thread was being traced; in that case we ask the traced thread to free the kcov descriptor when it terminates.
- A traced thread is killed; in that case it should disable the kcov descriptor but not free it, and the descriptor will subsequently get freed when the process terminates.

Another concern (that I didn't address) is the fact that you use device units. It would seem better to me if it were per-process rather than per-unit. I.e., when a process opens /dev/kcov, you register the pid in the kcov descriptor rather than the unit. Then you look up based on the pid, and you trace based on the thread.

So, do as you see fit; my version shouldn't be too far from working, and it's here to give an idea.

[1] https://m00nbsd.net/garbage/kcov/kcov.diff
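
For illustration, here is a rough userspace model of the per-LWP pointer and the two freeing cases described above. It is not the code in the diff: every name in it (struct lwp_model, l_kcov, kcov_close(), kcov_lwp_exit(), the traced/closed flags) is made up, the buffer layout (entry count in buf[0], PCs after it) is an assumption, and the atomics and ordering a real kernel would need are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor; field names are illustrative only. */
struct kcov_desc {
	uint64_t *buf;   /* assumed layout: buf[0] = count, then PCs */
	size_t nent;     /* capacity in entries, including buf[0] */
	bool enabled;    /* tracing currently active */
	bool traced;     /* a thread currently points at this descriptor */
	bool closed;     /* fd was closed while a thread was still traced */
};

/* Stand-in for the kernel's per-thread structure. */
struct lwp_model {
	struct kcov_desc *l_kcov;   /* NULL when the thread is not traced */
};

static struct kcov_desc *kcov_alloc(size_t nent)
{
	struct kcov_desc *kd = calloc(1, sizeof(*kd));
	if (kd == NULL)
		return NULL;
	kd->buf = calloc(nent, sizeof(uint64_t));
	if (kd->buf == NULL) {
		free(kd);
		return NULL;
	}
	kd->nent = nent;
	return kd;
}

static void kcov_free(struct kcov_desc *kd)
{
	free(kd->buf);
	free(kd);
}

static void kcov_enable(struct lwp_model *l, struct kcov_desc *kd)
{
	kd->enabled = true;
	kd->traced = true;
	l->l_kcov = kd;
}

/* Hot path: no lookup and no lock, just the thread's own pointer. */
static void trace_pc(struct lwp_model *l, uint64_t pc)
{
	struct kcov_desc *kd = l->l_kcov;
	if (kd == NULL || !kd->enabled)
		return;
	uint64_t idx = kd->buf[0];
	if (idx + 1 < kd->nent) {   /* silently drop PCs once the buffer is full */
		kd->buf[idx + 1] = pc;
		kd->buf[0] = idx + 1;
	}
}

/* fd closed: free now, or defer to the traced thread. Returns true if freed. */
static bool kcov_close(struct kcov_desc *kd)
{
	if (kd->traced) {
		kd->closed = true;
		return false;
	}
	kcov_free(kd);
	return true;
}

/* Thread exit: always disable; free only if the fd is already gone. */
static bool kcov_lwp_exit(struct lwp_model *l)
{
	struct kcov_desc *kd = l->l_kcov;
	if (kd == NULL)
		return false;
	l->l_kcov = NULL;
	kd->enabled = false;
	kd->traced = false;
	if (kd->closed) {
		kcov_free(kd);
		return true;
	}
	return false;   /* left for close / process teardown */
}
```

The point of the model is only the ownership rule: whoever goes away last (the file descriptor or the traced thread) is the one that frees the descriptor, and trace_pc() never has to search for anything.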