> Date: Thu, 30 Jul 2009 16:09:22 -0600
> From: Neil Perrin <Neil.Perrin at Sun.COM>
> To: Pawel Jakub Dawidek <pjd at FreeBSD.org>
> Cc: zfs-code at opensolaris.org
> Subject: Re: [zfs-code] Argument allocated on stack passed to taskq.
> Message-ID: <4A721A12.8070404 at Sun.COM>
> Content-Type: text/plain; CHARSET=US-ASCII; format=flowed
>
>
>
> On 07/30/09 15:39, Pawel Jakub Dawidek wrote:
>> On Thu, Jul 30, 2009 at 11:31:41PM +0200, Pawel Jakub Dawidek wrote:
>>> On Thu, Jul 30, 2009 at 11:25:19PM +0200, Pawel Jakub Dawidek wrote:
>>>> Hello.
>>>>
>>>> In the traverse_impl() function we can find this call:
>>>>
>>>> ? ? if (!(flags & TRAVERSE_PREFETCH) ||
>>>> ? ? ? ? 0 == taskq_dispatch(system_taskq, traverse_prefetch_thread,
>>>> ? ? ? ? &td, TQ_NOQUEUE))
>>>> ? ? ? ? ? ? pd.pd_exited = B_TRUE;
>>>>
>>>> Which should call the traverse_prefetch_thread() function with td
>>>> argument from a separate thread. This doesn't look safe, as td is
>>>> allocated on the stack at the begining of traverse_impl() and won't be
>>>> accessible from taskq thread.
>>>>
>>>> Is my understanding correct?
>>> Actually it should be fine, unless kernel thread stacks are swapable in
>>> Solaris.
>>
>> Ok, I'm sending mails too fast. If traverse_impl() will return before
>> task is complete, td can be overwritten by entering another function.
>> So it cannot return before task is completed or there is a bug?
>
> It looks safe to me:
> traverse_impl() waits on pd_exited() under protection of pd_mtx before 
> exiting.
> Meanwhile traverse_prefetch_thread() will set pd_exited under pd_mtx and 
> broadcast
> before exiting.

Absolutely. At the same time, backward synchronization (between
prefetch thread and main traverse thread) implementation seems to have
a side effect of extra cv_bcast on every block read, if my
interpretation of lines 282-287 below is correct.  Hardly a bug, but
takes few extra cycles.

Regards,
Andrey

    267 static int
    268 traverse_prefetcher(spa_t *spa, blkptr_t *bp, const zbookmark_t *zb,
    269     const dnode_phys_t *dnp, void *arg)
    270 {
    271         struct prefetch_data *pfd = arg;
    272         uint32_t aflags = ARC_NOWAIT | ARC_PREFETCH;
    273
    274         ASSERT(pfd->pd_blks_fetched >= 0);
    275         if (pfd->pd_cancel)
    276                 return (EINTR);
    277
    278         if (bp == NULL || !((pfd->pd_flags & TRAVERSE_PREFETCH_DATA) ||
    279             BP_GET_TYPE(bp) == DMU_OT_DNODE || BP_GET_LEVEL(bp) > 0))
    280                 return (0);
    281
    282         mutex_enter(&pfd->pd_mtx);
    283         while (!pfd->pd_cancel && pfd->pd_blks_fetched >= 
pfd->pd_blks_max)
    284                 cv_wait(&pfd->pd_cv, &pfd->pd_mtx);
    285         pfd->pd_blks_fetched++;
    286         cv_broadcast(&pfd->pd_cv);
    287         mutex_exit(&pfd->pd_mtx);
    288
    289         (void) arc_read_nolock(NULL, spa, bp, NULL, NULL,
    290             ZIO_PRIORITY_ASYNC_READ,
    291             ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE,
    292             &aflags, zb);
    293
    294         return (0);
    295 }

Reply via email to