> Date: Thu, 14 Jun 2012 14:13:25 +0200 > From: Ariane van der Steldt <ari...@stack.nl> > Cc: a...@caoua.org, tech@openbsd.org > Content-Disposition: inline > X-CNFS-Analysis: v=2.0 cv=BoYqN/r5 c=1 sm=0 a=GztW+Mr2qRhXIB90GGmOig==:17 > a=wom5GMh1gUkA:10 a=90mjwyQIHKgA:10 a=kj9zAlcOel0A:10 > a=xqWC_Br6kY4A:10 a=dTsEX9JhAAAA:8 a=9HFTjYS7m31va2ZAaAYA:9 > a=CjuIK1q_8ugA:10 a=JDJNggdDa0UA:10 a=GztW+Mr2qRhXIB90GGmOig==:117 > X-Virus-Scanned: by XS4ALL Virus Scanner > Envelope-To: mark.kette...@xs4all.nl > > On Thu, Jun 14, 2012 at 12:19:24PM +0200, Mark Kettenis wrote: > > > Date: Thu, 14 Jun 2012 08:58:25 +0200 > > > From: Alexandre Ratchov <a...@caoua.org> > > > > > > On Thu, Jun 14, 2012 at 03:48:12AM +0200, Ariane van der Steldt wrote: > > > > Hi, > > > > > > > > This diff implements yielding in the vm system, with the intention to > > > > make expensive system calls preempt every once in a while, to allow > > > > userland system calls to progress. > > > > > > > > It's a very conservative application at the moment, only interleaving > > > > during memory unmapping. If this works, we can start adding the > > > > mechanic to other expensive system calls not just in uvm, but anywhere > > > > the kernel is kept locked for long stretches of time. > > > > > > > > > > FWIW, few years ago I did a lot of tests and munmap() was by far > > > the longest kernel code path. IIRC the second was ffs w/ softdeps. > > > > > > > Technically the diff is incomplete, since it does not consider the very > > > > big problem of the biglock. And that's because the biglock lacks a > > > > __mp_lock_is_contended() call: if we hold the biglock and another cpu > > > > wants the biglock, ideally it will inform us so we can sidestep and > > > > temporarily grant access. > > > > > > I've read the diff, but I still don't understand this part. Why > > > wouldn't the naive approch work for MP as well? I mean, on UP > > > systems we do: > > > > > > for (;;) { > > > one_cycle_of_heavy_stuff(); > > > preemp(); > > > } > > > > > > and preemp() would check for a flag set in the clock interrupt and > > > possibly call mi_switch() to run another process. On MP systems, > > > mi_switch() would release the kernel lock, wouldn't it? If so, > > > other processes can invoke syscalls. In this case I don't see the > > > difference between MP and UP. > > > > Indeed. Except that we probably should yield() instead of preempt() > > since this is a voluntary context switch. At least that is what the > > pool subsystem does. > > > > The hard bit is to determine how much work to do in one cycle. Should > > be enough work to still spend a significant fraction of a timeslice > > otherwise unmapping stuff becomes too slow. > > Yes, the amount of work may not be too small. I've considered this and > decided to solve it by calling the preempt conditionally: only context > switch if: > - the process is told a context switch needs to happen > - we're the reaper and something else on this cpu wants to run > This basically keeps us from switching unless there's a good reason, > like curproc having used up its entire time slice or a higher priority > process became runnable (the former happening only every 100ms, the > latter happening only a limited amount of time: worst case every process > will become blocked on curproc and curproc completes, unblocking them > all).
Ah yes, I see you check SPCF_SHOULDYIELD flag just like uiomove(9) already does. That makes sense. And that code path does indeed preempt(), so copying that makes sense as well. > If neither is true, we don't preempt (thus not calling mpswitch(), not > releasing the kernel lock etc etc). For this reason it doesn't work > well on SMP. Unfortunately the above conditions may be false if for ex > 2 processes are spread out across 2 cpus: the cpu with the biglock will > not mark the proc as need-to-switch (because it's the only process on > the runqueue, iirc there's an optimization for that). I think you remember incorrectly. As far as I can tell SPCF_SHOULDYIELD is always set when a process overruns its allocated timeslice. I think it is worth to get this diff tested without worrying about biglock contention first.