> Date: Thu, 30 Jul 2009 16:09:22 -0600 > From: Neil Perrin <Neil.Perrin at Sun.COM> > To: Pawel Jakub Dawidek <pjd at FreeBSD.org> > Cc: zfs-code at opensolaris.org > Subject: Re: [zfs-code] Argument allocated on stack passed to taskq. > Message-ID: <4A721A12.8070404 at Sun.COM> > Content-Type: text/plain; CHARSET=US-ASCII; format=flowed > > > > On 07/30/09 15:39, Pawel Jakub Dawidek wrote: >> On Thu, Jul 30, 2009 at 11:31:41PM +0200, Pawel Jakub Dawidek wrote: >>> On Thu, Jul 30, 2009 at 11:25:19PM +0200, Pawel Jakub Dawidek wrote: >>>> Hello. >>>> >>>> In the traverse_impl() function we can find this call: >>>> >>>> ? ? if (!(flags & TRAVERSE_PREFETCH) || >>>> ? ? ? ? 0 == taskq_dispatch(system_taskq, traverse_prefetch_thread, >>>> ? ? ? ? &td, TQ_NOQUEUE)) >>>> ? ? ? ? ? ? pd.pd_exited = B_TRUE; >>>> >>>> Which should call the traverse_prefetch_thread() function with td >>>> argument from a separate thread. This doesn't look safe, as td is >>>> allocated on the stack at the begining of traverse_impl() and won't be >>>> accessible from taskq thread. >>>> >>>> Is my understanding correct? >>> Actually it should be fine, unless kernel thread stacks are swapable in >>> Solaris. >> >> Ok, I'm sending mails too fast. If traverse_impl() will return before >> task is complete, td can be overwritten by entering another function. >> So it cannot return before task is completed or there is a bug? > > It looks safe to me: > traverse_impl() waits on pd_exited() under protection of pd_mtx before > exiting. > Meanwhile traverse_prefetch_thread() will set pd_exited under pd_mtx and > broadcast > before exiting.
Absolutely. At the same time, backward synchronization (between prefetch thread and main traverse thread) implementation seems to have a side effect of extra cv_bcast on every block read, if my interpretation of lines 282-287 below is correct. Hardly a bug, but takes few extra cycles. Regards, Andrey 267 static int 268 traverse_prefetcher(spa_t *spa, blkptr_t *bp, const zbookmark_t *zb, 269 const dnode_phys_t *dnp, void *arg) 270 { 271 struct prefetch_data *pfd = arg; 272 uint32_t aflags = ARC_NOWAIT | ARC_PREFETCH; 273 274 ASSERT(pfd->pd_blks_fetched >= 0); 275 if (pfd->pd_cancel) 276 return (EINTR); 277 278 if (bp == NULL || !((pfd->pd_flags & TRAVERSE_PREFETCH_DATA) || 279 BP_GET_TYPE(bp) == DMU_OT_DNODE || BP_GET_LEVEL(bp) > 0)) 280 return (0); 281 282 mutex_enter(&pfd->pd_mtx); 283 while (!pfd->pd_cancel && pfd->pd_blks_fetched >= pfd->pd_blks_max) 284 cv_wait(&pfd->pd_cv, &pfd->pd_mtx); 285 pfd->pd_blks_fetched++; 286 cv_broadcast(&pfd->pd_cv); 287 mutex_exit(&pfd->pd_mtx); 288 289 (void) arc_read_nolock(NULL, spa, bp, NULL, NULL, 290 ZIO_PRIORITY_ASYNC_READ, 291 ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE, 292 &aflags, zb); 293 294 return (0); 295 }