> Date: Sun, 23 Jul 2023 22:57:14 +0900 > From: PHO <p...@cielonegro.org> > > On 7/23/23 21:24, Taylor R Campbell wrote: > > Why can't the loop just use dma_fence_get_rcu and dma_fence_put? > > > > Might need to release and reacquire the lock around dma_fence_put. > > I tried but it didn't work, apparently because someone was holding a > pointer to a dying fence and waiting for it to be signaled. Using > dma_fence_get_rcu() meant we didn't signal such fences, and the kernel > hanged and eventually got killed by heartbeat().
What exactly did you try and what was the heartbeat panic stack trace? Nothing here should wait for signalling in a way that trips the kernel heartbeat monitor -- that should only happen if you hold onto a spin lock for too long. (That said: yay, the heartbeat monitor is catching what would otherwise be a silent nonresponsive hang!) Thinking about it some more, I see there is a snag with the loop and dma_fence_put, since (a) we can only use list_next(&fence->head) if we read it and then use it under the lock, but (b) we can only dma_fence_put outside the lock because the release callback, vmw_fence_obj_destroy, takes the lock, so there's no way to do the iteration safely as is with an extra reference count acquire/release cycle. Two options come to mind: 1. Move vmw_fence_obj_destroy to an `RCU' callback so that we can safely do dma_fence_put under the lock. 2. Create a list on the stack of signalled fences to put in __vmw_fences_update and put them after the loop, outside the lock: struct list_head to_put = LIST_HEAD_INIT(to_put); ... list_for_each_entry_safe(fence, next_fence, &fman->fence_list, head) { if (dma_fence_get_rcu(fence) != 0) { /* (*) */ list_del_init(&fence->head); continue; } if (seqno - fence->base.seqno < VMW_FENCE_WRAP) { list_del_init(&fence->head); list_add(&fence->head, &to_put); /* (*) */ dma_fence_signal_locked(&fence->base); ... } break; } ... if (fence != NULL) { spin_unlock(&fman->lock); dma_fence_put(fence); spin_lock(&fman->lock); } while ((fence = list_first(&to_put)) != NULL) { list_del_init(&fence->head); spin_unlock(&fman->lock); dma_fence_put(fence); spin_lock(&fman->lock); } > > If that doesn't work for some reason and we do have to add a special > > case, I'd strongly prefer to add a new front end like > > dma_fence_signal_vmwgfxspecialhack or something (other examples in the > > tasklet API) that has the relaxed assertion instead of relaxing the > > assertions for every other driver which isn't bonkers. > > Alright. I'll do that. This should be a last resort (but not as bad as relaxing the assertions) if we can't figure out a better way to do it. I'm not sure the patch above is going to be best (if it works), but I'd like to see if there's a way to do this without abusing the API semantics too much before giving in to the abuse.