> Date: Thu, 14 Jun 2012 14:13:25 +0200
> From: Ariane van der Steldt <ari...@stack.nl>
> Cc: a...@caoua.org, tech@openbsd.org
> 
> On Thu, Jun 14, 2012 at 12:19:24PM +0200, Mark Kettenis wrote:
> > > Date: Thu, 14 Jun 2012 08:58:25 +0200
> > > From: Alexandre Ratchov <a...@caoua.org>
> > > 
> > > On Thu, Jun 14, 2012 at 03:48:12AM +0200, Ariane van der Steldt wrote:
> > > > Hi,
> > > > 
> > > > This diff implements yielding in the vm system, with the intention to
> > > > make expensive system calls preempt every once in a while, to allow
> > > > userland system calls to progress.
> > > > 
> > > > It's a very conservative application at the moment, only interleaving
> > > > during memory unmapping.  If this works, we can start adding the
> > > > mechanism to other expensive system calls, not just in uvm, but
> > > > anywhere the kernel is kept locked for long stretches of time.
> > > > 
> > > 
> > > FWIW, a few years ago I did a lot of tests and munmap() was by far
> > > the longest kernel code path. IIRC the second was ffs with softdeps.
> > > 
> > > > Technically the diff is incomplete, since it does not consider the very
> > > > big problem of the biglock.  And that's because the biglock lacks a
> > > > __mp_lock_is_contended() call: if we hold the biglock and another cpu
> > > > wants the biglock, ideally it will inform us so we can sidestep and
> > > > temporarily grant access.
> > > 
> > > I've read the diff, but I still don't understand this part. Why
> > > wouldn't the naive approach work for MP as well? I mean, on UP
> > > systems we do:
> > > 
> > >   for (;;) {
> > >           one_cycle_of_heavy_stuff();
> > >           preempt();
> > >   }
> > > 
> > > and preempt() would check for a flag set in the clock interrupt and
> > > possibly call mi_switch() to run another process. On MP systems,
> > > mi_switch() would release the kernel lock, wouldn't it? If so,
> > > other processes can invoke syscalls. In this case I don't see the
> > > difference between MP and UP.
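
(Answering this in passing: yes, on MP mi_switch() drops the big lock
around the actual switch and takes it back afterwards.  Roughly, from
memory -- helper names approximate, not a verbatim copy of sched_bsd.c:)

	int hold_count;

#ifdef MULTIPROCESSOR
	/* Release all recursion levels of the big lock before
	 * giving the cpu away... */
	hold_count = __mp_release_all(&kernel_lock);
#endif
	cpu_switchto(p, nextproc);	/* the actual context switch */
#ifdef MULTIPROCESSOR
	/* ...and reacquire it at the same recursion depth. */
	__mp_acquire_count(&kernel_lock, hold_count);
#endif
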
> > 
> > Indeed.  Except that we probably should yield() instead of preempt()
> > since this is a voluntary context switch.  At least that is what the
> > pool subsystem does.
> > 
> > The hard bit is determining how much work to do in one cycle.  It
> > should be enough to still consume a significant fraction of a
> > timeslice; otherwise unmapping becomes too slow.
> 
> Yes, the amount of work must not be too small.  I've considered this
> and decided to solve it by calling preempt conditionally: only context
> switch if:
> - the process has been told a context switch needs to happen, or
> - we're the reaper and something else on this cpu wants to run.
> This keeps us from switching unless there is a good reason, such as
> curproc having used up its entire timeslice or a higher-priority
> process becoming runnable.  The former happens only every 100ms; the
> latter happens only a bounded number of times: in the worst case every
> process blocks on curproc until curproc completes, unblocking them
> all.
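
To make that concrete, I read the check as having roughly this shape (a
sketch only, not the diff verbatim; "reaperproc" and the idle test are
my assumptions):

	/* Yield only when there is a demonstrated reason to. */
	if ((curcpu()->ci_schedstate.spc_schedflags & SPCF_SHOULDYIELD) ||
	    (curproc == reaperproc && !cpu_is_idle(curcpu())))
		preempt(NULL);	/* voluntary preemption point */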

Ah yes, I see you check the SPCF_SHOULDYIELD flag just like uiomove(9)
already does.  That makes sense.  And that code path does indeed call
preempt(), so copying it makes sense as well.
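
For reference, the uiomove(9) check is essentially the following
(quoted from memory, so treat the exact expression as approximate):

	/* In the UIO_USERSPACE copy loop: yield the cpu if the
	 * scheduler flagged us for overstaying our timeslice. */
	if (curcpu()->ci_schedstate.spc_schedflags & SPCF_SHOULDYIELD)
		preempt(NULL);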

> If neither is true, we don't preempt (thus not calling mi_switch(), not
> releasing the kernel lock, etc.).  For this reason it doesn't work well
> on SMP.  Unfortunately the above conditions may be false if, for
> example, 2 processes are spread out across 2 cpus: the cpu holding the
> biglock will not mark the proc as need-to-switch (because it's the only
> process on its runqueue; iirc there's an optimization for that).

I think you remember incorrectly.  As far as I can tell
SPCF_SHOULDYIELD is always set when a process overruns its allocated
timeslice.
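
For the record, the flag comes from roundrobin(), which fires once per
timeslice: the first time it finds the same process still on the cpu it
sets SPCF_SEENRR, and the second time SPCF_SHOULDYIELD.  A sketch from
memory, not a verbatim copy:

	/* roundrobin(), run from the clock interrupt once per timeslice. */
	struct schedstate_percpu *spc = &ci->ci_schedstate;

	if (spc->spc_schedflags & SPCF_SEENRR) {
		/* Second full timeslice without a switch: the process
		 * may be hogging the cpu, so tell it to yield. */
		atomic_setbits_int(&spc->spc_schedflags, SPCF_SHOULDYIELD);
	} else
		atomic_setbits_int(&spc->spc_schedflags, SPCF_SEENRR);
	need_resched(ci);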

I think it is worth getting this diff tested first, without worrying
about biglock contention.
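
When we do get to it, the sidestep you describe could look roughly like
this (entirely hypothetical: __mp_lock_is_contended() does not exist
yet, and the release/reacquire dance is just one possible shape):

	/* Hypothetical biglock preemption point: if another cpu is
	 * spinning on the kernel lock, briefly let it in. */
	if (__mp_lock_is_contended(&kernel_lock)) {
		int hold_count;

		hold_count = __mp_release_all(&kernel_lock);
		/* The waiting cpu acquires and releases the lock here. */
		__mp_acquire_count(&kernel_lock, hold_count);
	}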
