> Date: Tue, 19 May 2026 21:10:08 +0000
> From: Taylor R Campbell <[email protected]>
>
> 2. We already have a few kernel APIs for algorithms of this type:
> [...]
>    But what they all have in common is that waiting for a safe time to
>    free is a synchronous blocking operation: pserialize_perform,
>    psref_target_destroy, localcount_drain.  That's adequate for some
>    purposes, but for others, it would be nice to gather moribund
>    resources in batches to free asynchronously.

On reflection, I realize this can't be right, because a pool_cache(9)
with PR_PSERIALIZE should already free batches of objects, although
each time it chooses to free a single batch, it holds up the caller to
wait for an xcall.

So the difference is presumably in:

- the cost of read sections (higher from additional barriers and
  bookkeeping), vs

- the time from deletion to freeing (perhaps higher because there's no
  immediate feedback from xcall), or how much memory can grow due to
  batches not yet proven freeable, vs

- the latency of freeing a single batch (maybe lower on average, if we
  can find a batch that can be safely freed without synchronously
  waiting for an xcall or equivalent? dunno if this makes sense), vs

- the other computational cost of processing each batch (lower because
  there's no xcall costing cycles and cache disruption on all CPUs).

I would be curious to see some quantitative visualization of how this
affects practical workloads, e.g. the difference between the same code
using pserialize_read_enter/exit and pool_cache PR_PSERIALIZE vs
smr_(lazy_)enter/exit and pool_cache_set_smr.

I'm also curious to see how smr_enter/exit compares to an ordinary
reader/writer lock, because membar_sync is expensive!

Incidentally, there is probably low-hanging fruit for reducing the
cost of pserialize_perform under load -- the current algorithm is
about as naive as it gets: every pserialize_perform triggers a
broadcast xcall. If one pserialize_perform is in progress waiting for
an xcall when two more pserialize_performs are requested, we could
probably safely serve those requests by a single additional xcall,
rather than two additional xcalls, after the first xcall has
completed.
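
To make the coalescing idea in the last paragraph concrete, here is a
minimal single-threaded model of the bookkeeping. It is only a sketch:
it is not the existing pserialize(9) code, the psz_model names are
hypothetical, and a real version would need locking, a condvar for
waiters, and an actual xcall. The assumption it encodes is that a
request can only be satisfied by a broadcast that starts after the
request was made, so requests arriving during an in-flight xcall all
wait for the same next one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model state; not taken from the NetBSD sources. */
struct psz_model {
	uint64_t started;	/* number of broadcast xcalls begun */
	uint64_t completed;	/* number of broadcast xcalls finished */
	uint64_t wanted;	/* highest xcall number any waiter needs */
};

/*
 * Register a request and return the number of the xcall it must wait
 * for.  An xcall already in flight may have passed some CPUs before
 * the caller unlinked its object, so only the next xcall to start
 * counts: that is xcall number started + 1.
 */
static uint64_t
psz_model_request(struct psz_model *m)
{
	uint64_t ticket = m->started + 1;

	if (m->wanted < ticket)
		m->wanted = ticket;
	return ticket;
}

/* True once the xcall this ticket waits for has completed. */
static bool
psz_model_done(const struct psz_model *m, uint64_t ticket)
{
	return m->completed >= ticket;
}

/*
 * Begin the next broadcast if none is in flight and some waiter still
 * needs one.  Returns true if a new xcall was started.
 */
static bool
psz_model_start(struct psz_model *m)
{
	if (m->started != m->completed)
		return false;	/* one is already in flight */
	if (m->wanted <= m->completed)
		return false;	/* nobody is waiting */
	m->started++;
	return true;
}

/* Record that the in-flight broadcast has reached every CPU. */
static void
psz_model_complete(struct psz_model *m)
{
	assert(m->started == m->completed + 1);
	m->completed++;
}
```

In this model, two requests that arrive while the first xcall is in
flight get the same ticket, so three requests cost two broadcasts
rather than three. Whether that pays off in practice depends on how
often pserialize_perform calls actually overlap.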
