On Wed, Nov 25, 2020 at 4:23 PM Scott Cheloha <[email protected]>
wrote:

> In stdio, which lock are you supposed to take first?  The global
> sfp_mutex or the per-FILE lock?
>
> In __sfp() we hold sfp_mutex while iterating through the pool (unsure
> what else to call it) of FILEs.  No two threads can modify the pool at
> the same time:
>
...

> Note that we set _flags to 1 to reserve it for the current thread
> before leaving sfp_mutex.  Note also that we don't take the per-FILE
> lock before reading each FILE's _flags.
>
> Then look at fclose(3):
>
...

> We check if _flags is zero without any lock.  I'm unsure if this is
> safe.
>
> However, we then clean up under the FILE's lock and set _flags to zero
> without sfp_mutex.
>
> ... that can't be right.
>
> So, what to do?  My immediate thought was to export sfp_mutex and
> enter it before writing _flags (diff attached).  But then the global
> sfp_mutex is "higher" in the locking hierarchy than the per-FILE lock.
> That doesn't seem quite right to me.
>
> We also modify _flags all over stdio without sfp_mutex, so the rule is
> inconsistent.
>
> Another possibility is to take the per-FILE lock when examining each
> FILE's _flags during __sfp().  That would be costlier, but then the
> hierarchy would be reversed.
>
> Thoughts?
>

 Let's say that we're willing to presume that changing _flags from
one non-zero value to another non-zero value will never result in
a zero value being visible either on this CPU or another one.  If
that's not true, then there's more to fix, but let's start with
that assumption.

Given that, I think the only unsafe item in what you described above
is the setting of _flags to zero in various places without either
holding sfp_mutex or using some sort of membar (or atomic op) to
guarantee all previous changes to the FILE are visible before the
flags change is visible.

My reasoning would be that if the setting of _flags from non-zero
to zero was always the last thing visible, then the code scanning
the list could be sure that a non-zero flags means no one else has
any pending writes to the FILE and it can be allocated.  __sfp()'s
setting _flags to 1 to mark it as allocated is made visible to other
threads when it unlocks sfp_mutex.

...but we don't have those membars/atomic-ops, so it's not currently
guaranteed that __sfp() can't allocate a FILE which is still being
updated by a thread that's releasing it.  ;(

If strewing membars makes people twitchy (my eye twitches some),
then yeah, your proposal to take sfp_mutex when zeroing _file is
te alternative.  Regarding the hierarchy concern, see below.


None of this fixes _fwalk(), which can invoke the callback on
partially created FILEs, even if it were to grab sfp_mutex.  I can
imagine a couple directions for fixing that, from setting __SIGN
on not-yet-completed FILEs and clearing it at the end, to full-blown
having __sfp() return a locked FILE and making _fwalk() lock each
FILE before invoking the callback.  Note that of the three callbacks
passed to _fwalk(), two end up locking the FILE anyway, so maybe
this is the right direction anyway.

So, the lock hierarchy is then...interesting:
 * if you hold sfp_mutex, you can FLOCKFILE a FILE iff _flags == 0
 * if _flags != 0, you must lock sfp_mutex before zeroing it and
   FUNLOCKFILE and never touch the FILE again before unlocking
   sfp_mutex.
Given the assumption at top, I believe that's safe+correct.


The problem case for _fwalk() is _cleanup(), which currently and
explicitly 'cheats' by failing to lock FILE...but I suspect that's
a hold-over from when abort() called atexit() handlers, as it's
supposed to be async-signal-safe and therefore can't take locks.
abort() no longer does that: POSIX withdrew it because, well, it
can't be done safely with an async-signal-safe abort() without
making lots of stdio functions block all signals, which would lead
to torches and pitchforks.  This is presumably _also_ why _fwalk()
doesn't lock sfp_mutex when it 'obviously' should, so that's fixable
too!  Woot!

So yeah, maybe it does work to:
1) make __sfp() FLOCKFILE() the allocated FILE before unlocking sfp_mutex
2) update f{,d,mem,un}open() and open_*memstream() to match (1), unlocking
   in all paths when the FILE is safe to be accessed by _fwalk(), and
locking
   sfp_mutex around the zeroing of _flags.
3) make fclose() and freopen() also lock sfp_mutex around the zeroing of
_flags
   (should add an _frelease() to findfp.c that does this dance for (2) and
(3))
4) make _fwalk() take sfp_mutex, and maybe also a FILE* so the setting of
   __SIGN can be done under the lock?
5) decide how/whether to handle adjust the FLOCKFILE placement in the
_fwalk()
   tree: is the testing of the "is line-buffered" flag in lflush() safe
without
   a lock?  Mumble...

Now, back to that first assumption: if you're not willing to assume
it then the "is allocated" test needs to not use _flags but be some
other state change which can be relied on to not have false-positives,
but otherwise the other bits above still apply, I believe.  Would
be a major ABI change...and if we do that there's like 3 other
things we should do at the same time (merge __sfileext into FILE,
grow _file to an int, and maybe move the recursive mutex for
f{,un}lockfile() into FILE?)


Philip Guenther

Reply via email to