I have a DRM master implementing a purpose-built compositor for a dedicated 
use-case. It drives several different connectors, each on its own vsync cadence 
(there's no clone mode happening here).

The goal is for commits to each connector to proceed completely independently 
of whatever is happening on the other connectors. A separate thread issues the 
DRM ioctls for each connector.

In the compositor, each connector is treated like its own little universe; a 
disjoint set of CRTCs and planes is earmarked for use by each of the 
connectors. One reason for this is to avoid sharing resources in a way that 
would introduce implicit synchronization points between the connectors' 
event loops. So, atomic commits made to one connector never attempt to use a 
resource that has ever been used in a commit to a different connector. This 
may be relevant to the questions I ask below about lock contention.

For some time, I've been noticing that even test-only atomic commits done on 
connector A will sometimes block for many frame-times. Analysis with the DRI 
driver implementor has shown that the atomic commits to A--whether 
DRM_MODE_ATOMIC_TEST_ONLY or DRM_MODE_ATOMIC_NONBLOCK--are getting stuck in the 
ioctl entry code waiting for a DRI mutex.

It turns out that during these unexpected delays, the DRI driver's commit 
thread holds that mutex while servicing a commit to connector B. It does this 
while it waits for the fences to fire for all framebuffer IDs referred to by 
the pending connector B scene. So the commit to connector A can't be tested or 
enqueued until the commit to B is completely finished. The driver author 
reckons that this is unavoidable because every DRM_IOCTL_MODE_ATOMIC ioctl  
needs to acquire the same global singleton DRM connection_mutex in order to 
query or manipulate the connector.
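
To make the suspected sequence concrete, here is a minimal userspace model of 
it using pthreads. It is only a sketch of my understanding, not driver code: 
global_lock stands in for the shared mutex, the sleep stands in for the fence 
wait during the commit to B, and all names (struct sim, commit_b, 
simulate_contention) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical model state; global_lock is a stand-in, not the kernel mutex. */
struct sim {
    pthread_mutex_t global_lock;  /* the one lock both commits need */
    pthread_mutex_t ready_lock;   /* protects b_holds_lock */
    pthread_cond_t  ready_cond;   /* signalled once B owns global_lock */
    int             b_holds_lock;
    int             fence_wait_ms;
};

static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Models the commit to connector B: take the lock, then wait on "fences". */
static void *commit_b(void *arg)
{
    struct sim *s = arg;
    struct timespec wait = { s->fence_wait_ms / 1000,
                             (s->fence_wait_ms % 1000) * 1000000L };

    pthread_mutex_lock(&s->global_lock);

    /* Tell the other thread that B now owns the lock. */
    pthread_mutex_lock(&s->ready_lock);
    s->b_holds_lock = 1;
    pthread_cond_signal(&s->ready_cond);
    pthread_mutex_unlock(&s->ready_lock);

    nanosleep(&wait, NULL);       /* simulated fence wait, lock still held */
    pthread_mutex_unlock(&s->global_lock);
    return NULL;
}

/*
 * Models a test-only commit to connector A issued while B's commit is in
 * flight. Returns how long A was blocked in milliseconds, or -1.0 if the
 * helper thread could not be created.
 */
static double simulate_contention(int fence_wait_ms)
{
    struct sim s;
    pthread_t b;
    double start, blocked;

    assert(fence_wait_ms >= 0);
    pthread_mutex_init(&s.global_lock, NULL);
    pthread_mutex_init(&s.ready_lock, NULL);
    pthread_cond_init(&s.ready_cond, NULL);
    s.b_holds_lock = 0;
    s.fence_wait_ms = fence_wait_ms;

    if (pthread_create(&b, NULL, commit_b, &s) != 0)
        return -1.0;

    /* Wait until B really holds the lock, so the ordering is deterministic. */
    pthread_mutex_lock(&s.ready_lock);
    while (!s.b_holds_lock)
        pthread_cond_wait(&s.ready_cond, &s.ready_lock);
    pthread_mutex_unlock(&s.ready_lock);

    /* A only needs the lock briefly, yet it waits out B's fence wait. */
    start = now_ms();
    pthread_mutex_lock(&s.global_lock);
    pthread_mutex_unlock(&s.global_lock);
    blocked = now_ms() - start;

    pthread_join(b, NULL);
    pthread_cond_destroy(&s.ready_cond);
    pthread_mutex_destroy(&s.ready_lock);
    pthread_mutex_destroy(&s.global_lock);
    return blocked;
}
```

Calling simulate_contention(50) models a 50 ms fence wait on B, which is about 
three frames at 60 Hz. In this model A's wait tracks B's fence wait rather 
than anything A itself does, which matches the delays I'm seeing.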

The result is that it's quite difficult to guarantee a framerate on connector 
A, because unrelated activity performed on connector B can hold global locks 
for an unpredictable amount of time.

The first question would be: does this story sound consistent? If so, then a 
couple more questions follow.

Is this kind of implicit interlocking expected? Is there any way to avoid the 
pending commits getting serialized like that on the kernel side?

Thanks
-Matt
