On 17 Apr 2015, at 12:49, Martin Pieuchot <m...@openbsd.org> wrote:

> On 16/04/15(Thu) 15:24, kanonenvogel....@gmail.com wrote:
>> Well, lets begin.
>> 
>> In the future, I wish to have fd_getfile() returning acquired fp instance.
>> The goal is to not to have pointer to destroyed fp instance
>> in FREF()/FRELE()/fd_getfile() races. This one requres modification of
>> getsock(), getvnode() and dupfdopen() functions, they must receive pointer to
>> struct proc instance for FRELE() call on referenced fp instance while they
>> have internal error cases. While getsock(), getvnode() and dupfdopen()
>> functions are called, "struct proc" instance exists, so their
>> "struct filedesc *" arg can be replaced by "struct proc *" arg which contains
>> pointer to "struct filedesc”.
>> 
>> The races will be appeared right after at least one FRELE(), FREF() or
>> fd_getfile() call will be done outside kernel lock. The “outside kernel 
>> lock" call
>> capability requires a little more refactoring, but for this functions only, 
>> not
>> system-wide.
>> 
>> Now we have something like:
>> 
>> if((fp = fd_getfile(fds, fd)) == NULL)
>>       goto error;
>> 
>> /*
>> * fp can be destroyed here by FRELE() call on other cpu
>> */
>> 
>> FREF(fp);
>> 
>> The goal is to avoid this situation.
> 
> I came to the same conclusion when I wrote these diffs.  I think the
> first 3 are good and simple enough to be committed, somebody else agree?
> 
> The fourth one that move FREF() inside fd_getfile() is IMHO incomplete.
> As you can see I putted some XXX where the existing code was calling 
> fd_getfile() without incrementing the reference count.  Why did you
> decide to delete them?
Well, my version of fd_getfile() requires “struct proc *” arg for proper
funref() call within. So my original patchset has my own modification of
getsock(), getvnode() and dupfdopen(). They has “struct filedesc *” and 
“struct file *” arguments together, but the only “struct file *" is enough.
And I used your version. :)

At first i modified fd_getfile() internals for acquisition/release races 
on smp machine. Atomic increment of f_count is required in smp case.

Kernel sources has the strange and wrong magic with f_count field. closef()
wants fp->f_count to be >= 2, and closef()’s callers do this magic for panic
prevention. falloc() and FILE_SET_MATURE() do this magic too. Now,
newly created “struct file” instance has f_count == 2. It’s wrong, and below I
describe situation without this magic.

Minimal f_count value of existing “struct file” instance is 1.
Minimal f_count value of existing and acquired “struct file” instance is 2.
Minimal f_count value of acquired “struct file” instance is 1.

“existing” means “exists in “struct filedesc” instance and can be obtained by
fd_getfile() call”.
“acquired” means “returned by fd_getfile() call”. 
“struct filedesc” instance holds existing “struct file” instances.

fdalloc() creates and places “struct file” instance to “struct filedesc” 
container
fdrelease() removes “struct file” instance from “struct filedesc”, but delete it
only if f_count is 1 (fp exists and has no references). fp instance can be
shared between multiple processes and can be shared by multiple
“struct file” correlated syscalls of one multithreaded process. So, we have
races and situation can be: “fp instance” must be removed from
“struct filedesc” but kept for current “struct file” instance users. Last of 
them
will destroy it by FRELE() call. If we don’t increment f_count within 
fd_getfile(),
we can return pointer to already destroyed “struct file” instance. For example
one thread calls fd_getfile() and other calls fdrelease(). If fd_getfile() 
caller
wins, f_count is at least 1 and instance is not destroyed, but removed from
“struct filedesc” container and can’t be obtained by next fd_getfile() call. if
fdrelease() wins it removes and destroys “struct file” instance and fd_getfile()
will return NULL.

the real fucntions must be:

struct file *
fd_getfile(struct filedesc *fdp, int fd, struct proc *p)
{
#ifdef MULTIPROCESSOR
        struct file *fp;
        unsigned long count;

        if(fd <0 || fd >= fdp->fd_nfiles)
                return NULL;

restart:
        if ((fp = fdp->fd_ofiles[fd]) == NULL)
                return NULL;
        if ((count = fp->f_count) == 0)
                return NULL;
        if (atomic_cas_ulong(&fp->f_count, count, count + 1) != count)
                goto restart;
        if ((fp != fdp->fd_ofiles[fd])) {
                funref(fp, p);
                goto restart;
        }
        if (!FILE_IS_USABLE(fp)) {
                funref(fp, p);
                return (NULL);
        }

        return fp;
#else
        struct file *fp;

        if(fd <0 || fd >= fdp->fd_nfiles)
                return NULL;
        if ((fp != fdp->fd_ofiles[fd]))
                return NULL;
        if (!FILE_IS_USABLE(fp))
                return NULL;
        fp->f_count++;

        return fp;
#endif
}

/*
* FREF() replacement, must be called on
* _acquired_ fp only
*/
static __inline void
fref(struct file *fp)
{
#ifdef MULTIPROCESSOR
        atomic_inc_long(&fp->f_count);
#else
        fp->f_count++;
#endif
}

/* FRELE() replacement */
static __inline int
funref(struct file *fp, struct proc *p)
{
#ifdef MULTIPROCESSOR
        if (atomic_dec_long_nv(&fp->f_count) == 0)
                return fdrop(fp, p);
#else
        if ((--fp->f_count) == 0)
                return fdrop(fp, p);
#endif
        return 0;
}

I will try to explain the code above.

On uniprocessor machine, syscall can not be removed from cpu by
other syscall, so non atomic increment and decrement is enough.

On multiprocessr machine we have races.

/*
* The race with funref(), if count == 0 funref() won,
* don’t use this fp instance, it will be removed
*/
if ((count = fp->f_count) == 0)
        return NULL;

Actually, the chain is:

count = fp->f_count; /* this assignment is atomic */
/*
* fp->f_count can be modified here, funref() can make it 0
* and start fp instance removal.
*/
if(count == 0)
        return NULL;

So if we have something like this
if (atomic_inc_long_nv(&fp->f_count) == (count+1)){
        /* YOU WON */
}

The situation can be

/*
* The races between atomic_inc_long_nv() and
* atomic_dec_long_nv()
*/

/*on cpu N */
count = fp->f_count; /* count is 1 */

/*
* on cpu N+1 
* atomic_dec_long_nv() call
* fp->f_count is 0, fp instance is destroyed
*/

/*
* on CPU N
* count is not 0
*/
f(count == 0)
        return NULL;

/* 
* on CPU N+2
* (atomic_inc_long_nv(&fp->f_count)
* fp->f_count is 1 now, but fp instance is destroyed 
* /

/*
* on CPU N
*/
if (atomic_inc_long_nv(&fp->f_count) == (count+1)){
        /* you won */
}

/*
* True, but we acquired destroyed "struct file” instance
*/

The right way is to use atomic_cas_ulong() for increment.
The races are only between atomic_cas_ulong() and
atomic_dec_long_nv(). If fp->f_count is modified on other cpu
beetween "count = fp->fcount” and atomic_cas_ulong() call
"if (atomic_cas_ulong(&fp->f_count, count, count + 1) != count)"
will be true and we don’t acquire destroyed “struct file instance”.

If I could not explain I will retry :)

> -current is where development happens :)
I used stable for demonstartion, not for patches. :)
I will upgrade to -current, sure.

> Don't flood us ;)  But the next time, one diff *inline* with a clear
> subject message is the preferred way so we can simply use our mailbox
> to do peer review.
Ok, I must unrerstand the content of this patch. It can be clean only
after discussion. f_count magic must be removed and filelist must be
protected with fd_getfile() modification too. I think, The patch must
contain modifications up to marking syscalls as SY_NOLOCK.

> Regarding your previous question about the first diff I sent you:
>> I don't understand, why we need to check flags inside fd_getfile
> 
> You don't *need* to, but doing so will remove a lot of gotos the
> functions calling fd_getfile().  I'm aware it's not strictly necessary
> but compare (without diff):

> Of course all the functions do not return EBADF when the mode is
> incorrect, but I think the pattern is spread enough for this change
> to be worth it.
Well, f_type can also be checked within fd_getfile(), the same case :)
I suppose, fd_getfile() returns acquired “struct file *” instance, and this
the only case. All other checks must be done outside fd_getfile().


Reply via email to