On Fri, 21 Jun 2019 21:54:18 -0700
Mike Larkin <mlar...@nested.page> wrote:

> On Fri, Jun 21, 2019 at 05:11:26PM -0300, Martin Pieuchot wrote:
> > On 06/06/19(Thu) 15:16, Martin Pieuchot wrote:
> > > On 02/06/19(Sun) 16:41, Martin Pieuchot wrote:
> > > > On 01/06/19(Sat) 18:55, Martin Pieuchot wrote:
> > > > > Diff below exists mainly for documentation and test purposes.  If
> > > > > you're not interested about how to break the scheduler internals in
> > > > > pieces, don't read further and go straight to testing!
> > > > > 
> > > > > - First change is to stop calling tsleep(9) at PUSER.  That makes
> > > > >   it clear that all "sleeping priorities" are smaller than PUSER.
> > > > >   That's important to understand for the diff below.  `p_priority'
> > > > >   is currently a placeholder for the "sleeping priority" and the
> > > > >   "runnqueue priority".  Both fields are separated by this diff.
> > > > > 
> > > > > - When a thread goes to sleep, the priority argument of tsleep(9) is
> > > > >   now recorded in `p_slpprio'.  This argument can be considered as 
> > > > > part
> > > > >   of the sleep queue.  Its purpose is to place the thread into a 
> > > > > higher
> > > > >   runqueue when awoken.
> > > > > 
> > > > > - Currently, for stopped threads, `p_priority' correspond to 
> > > > > `p_usrpri'. 
> > > > >   So setrunnable() has been untangled to place SSTOP and SSLEEP 
> > > > > threads
> > > > >   in the preferred queue without having to use `p_priority'.  Note 
> > > > > that
> > > > >   `p_usrpri' is still recalculated *after* having called 
> > > > > setrunqueue().
> > > > >   This is currently fine because setrunnable() is called with 
> > > > > SCHED_LOCK() 
> > > > >   but it will be racy when we'll split it.
> > > > > 
> > > > > - A new field, `p_runprio' has been introduced.  It should be 
> > > > > considered
> > > > >   as part of the per-CPU runqueues.  It indicates where a current 
> > > > > thread
> > > > >   is placed.
> > > > > 
> > > > > - `spc_curpriority' is now updated at every context-switch.  That 
> > > > > means
> > > > >    need_resched() won't be called after comparing an out-of-date 
> > > > > value.
> > > > >    At the same time, `p_usrpri' is initialized to the highest possible
> > > > >    value for idle threads.
> > > > > 
> > > > > - resched_proc() was calling need_resched() in the following 
> > > > > conditions:
> > > > >    - If the SONPROC thread has a higher priority that the current
> > > > >      running thread (itself).
> > > > >    - Twice in setrunnable() when we know that p_priority <= p_usrpri.
> > > > >    - If schedcpu() considered that a thread, after updating its prio,
> > > > >      should preempt the one running on the CPU pointed by `p_cpu'. 
> > > > > 
> > > > >   The diff below simplify all of that by calling need_resched() when:
> > > > >    - A thread is inserted in a CPU runqueue at a higher priority than
> > > > >      the one SONPROC.
> > > > >    - schedcpu() decides that a thread in SRUN state should preempt the
> > > > >      one SONPROC.
> > > > > 
> > > > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority"
> > > > >   of a thread are now updated while holding a per-thread mutex.  As a
> > > > >   result schedclock() and donice() no longer takes the SCHED_LOCK(),
> > > > >   and schedcpu() almost never take it.
> > > > > 
> > > > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' 
> > > > > value
> > > > >   when displaying priorities.  This is helpful to understand what's
> > > > >   happening:
> > > > > 
> > > > > load averages:  0.99,  0.56,  0.25               
> > > > > two.lab.grenadille.net 23:42:10
> > > > > 70 threads: 68 idle, 2 on processor                                   
> > > > >  up  0:09
> > > > > CPU0:  0.0% user,  0.0% nice, 51.0% sys,  2.0% spin,  0.0% intr, 
> > > > > 47.1% idle
> > > > > CPU1:  2.0% user,  0.0% nice, 51.0% sys,  3.9% spin,  0.0% intr, 
> > > > > 43.1% idle
> > > > > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M
> > > > > 
> > > > >   PID      TID PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU 
> > > > > COMMAND
> > > > > 81000   145101  72    0    0K 1664K sleep/1   bored     1:15 36.96% 
> > > > > softnet
> > > > > 47133   244097  73    0 2984K 4408K sleep/1   netio     1:06 35.06% 
> > > > > cvs 
> > > > > 64749   522184  66    0  176K  148K onproc/1  -         0:55 28.81% 
> > > > > nfsd
> > > > > 21615   602473 127    0    0K 1664K sleep/0   -         7:22  0.00% 
> > > > > idle0  
> > > > > 12413   606242 127    0    0K 1664K sleep/1   -         7:08  0.00% 
> > > > > idle1
> > > > > 85778   338258  50    0 4936K 7308K idle      select    0:10  0.00% 
> > > > > ssh  
> > > > > 22771   575513  50    0  176K  148K sleep/0   nfsd      0:02  0.00% 
> > > > > nfsd 
> > > > > ....
> > > > > 
> > > > > 
> > > > > - The removal of `p_priority' and the change that makes mi_switch()
> > > > >   always update `spc_curpriority' might introduce some changes in
> > > > >   behavior, especially with kernel threads that were not going through
> > > > >   tsleep(9).  We currently have some situations where the priority of
> > > > >   the running thread isn't correctly reflected.  This diff changes 
> > > > > that
> > > > >   which means we should be able to better understand where the 
> > > > > problems
> > > > >   are.
> > > > > 
> > > > > I'd be interested in comments/tests/reviews before continuing in this
> > > > > direction.  Note that at least part of this diff are required to split
> > > > > the accounting apart from the SCHED_LOCK() as well.
> > > > > 
> > > > > I'll also work on exporting scheduler statistics unless somebody wants
> > > > > to beat me :)
> > > > 
> > > > Updated diff to use IPL_SCHED and rebased to apply on top of -current 
> > > > :) 
> > > 
> > > Updated diff that fixes a pagefault reported by sthen@.
> > 
> > Rebased diff on top of -current.  I'm still looking for tests and
> > comments :)
> > 
> 
> I have been running this since the first version without any issues. I also
> have the "unlock more syscalls" diff in this tree as well as the "push the
> kernel lock lower on read and write" diff, and can say it's noticeably faster
> with these diffs in.
> 

Hi,

FWIW, this diff is quite good.

1) It solves quite serious bugs in need_resched().
2) Now nice value is respected for sndiod, ntpd. Previously, it was not.
3) Code is much simpler and easier to understand.
4) When a proc uses CPU, its prio gets lowered, and the runprio moves from 50 
and trending towards 127, which is correct, and as desired. roughly same 
behavior as before.
5) IDLE has lowest prio, which is logical, bonus in this diff.
6) Zerothread priority is lowered, which is roughly same behaviour as before.

One comment: all P_SYSTEM proc's are now at PUSER = 50, instead of 0, but that 
is by design to understand runtime behaviour, and if needed can be tweaked 
later.

A request: can you please remove the double check for SRUN in 
sched_bsd.c-->schedcpu()?

Thanks

Reply via email to