On Tue, Dec 18, 2018 at 03:39:43PM -0600, Ian Sutton wrote:
> On Mon, Aug 14, 2017 at 3:07 PM Martin Pieuchot <m...@openbsd.org> wrote:
> >
> > I'd like to improve the fairness of the scheduler, with the goal of
> > mitigating userland starvations.  For that the kernel needs to have
> > a better understanding of the amount of executed time per task.
> >
> > The smallest interval currently usable on all our architectures for
> > such accounting is a tick.  With the current HZ value of 100, this
> > smallest interval is 10ms.  I'd like to bump this value to 1000.
> >
> > The diff below intentionally bump other `hz' value to keep current
> > ratios.  We certainly want to call schedclock(), or a similar time
> > accounting function, at a higher frequency than 16 Hz.  However this
> > will be part of a later diff.
> >
> > I'd be really interested in test reports.  mlarkin@ raised a good
> > question: is your battery lifetime shorter with this diff?
> >
> > Comments, oks?
> 
> I'd like to revisit this patch. It makes our armv7 platform more
> usable for what it is meant to do, i.e. be a microcontroller. I
> imagine on other platforms it would accrue similar benefits as well.
> 
> I've tested this patch and found delightfully proportional results.
> Currently, at HZ = 100, the minimum latency for a sleep calll from
> userspace is about 10ms:
> 
> https://ce.gl/baseline.jpg
> 
> After the patch, which bumps HZ from 100 --> 1000, we see a tenfold
> decrease in this latency:
> 
> https://ce.gl/with-mpi-hz-patch.jpg
> 
> This signal is generated with gpio(4) ioctl calls from userspace,
> e.g.: for(;;) { HI(pin); usleep(1); LO(pin(); usleep(1); }
> 
> I'd like to see more folks test and other devs to share their
> thoughts: What are the risks associated with bumping HZ globally?
> Drawbacks? Reasons for hesitation?

In general I'd like to reduce wakeup latency as well.  Raising HZ is an
obvious route to achieving that.  But I think there are a couple things
that need to be addressed before it would be reasonable.  The things that
come to mind for me are:

 - A tick is a 32-bit signed integer on all platforms.  If HZ=100, we
   can represent at most ~248 days in ticks.  This is plenty.  If HZ=1000,
   we now only have ~24.8 days.  Some may disagree, but I don't think this
   is enough.

   One possible solution is to make ticks 64-bit.  This addresses the
   timeout length issue at a cost to 32-bit platforms that I cannot
   quantify without lots of testing: what is the overhead of using 64-bit
   arithmetic on a 32-bit machine for all timeouts?

   A compromise is to make ticks a long.  kettenis mentioned this
   possibility in a commit [1] some time back.  This would allow 64-bit
   platforms to raise HZ without crippling timeout ranges.  But then you
   have ticks of different sizes on different platforms, which could be a
   headache, I imagine.

   (maybe there are other solutions?)

 - How does an OpenBSD guest on vmd(8) behave when HZ=1000?  Multiple such
   guests on vmd(8)?  Such guests on other hypervisors?

 - The replies in this thread don't indicate any effect on battery life or
   power consumption but I find it hard to believe that raising HZ has no
   impact on such things.  Bumping HZ like this *must* increase CPU utilization.
   What is the cost in watt-hours?

 - Can smaller machines even handle HZ=1000?  Linux experimented with this
   over a decade ago and settled on a default HZ=250 for i386 [2].  I don't
   know how it all shook out, but my guess is that they didn't revert from
   1000 -> 250 for no reason at all.  Of course, FreeBSD went ahead with 1000
   on i386, so opinions differ.

 - How does this effect e.g. packet throughput on smaller machines?  I think
   bigger boxes on amd64 would be fine, but I wonder if throughput would take
   a noticeable hit on a smaller router.

And then... can we reduce wakeup latency in general without raising HZ?  Other
systems (e.g. DFly) have better wakeup latencies and still have HZ=100.  What
are they doing?  Can we borrow it?

--

Sorry for the length.  In short, you should be fine compiling custom kernels
for your controllers with HZ=1000; you shouldn't see any ill effects for that
use case.  But making it the default, even for select platforms, needs more
planning.

-Scott

[1] 
http://cvsweb.openbsd.org/src/sys/kern/kern_clock.c?rev=1.93&content-type=text/x-cvsweb-markup

[2] http://man7.org/linux/man-pages/man7/time.7.html

> > Index: conf/param.c
> > ===================================================================
> > RCS file: /cvs/src/sys/conf/param.c,v
> > retrieving revision 1.37
> > diff -u -p -r1.37 param.c
> > --- conf/param.c        6 May 2016 19:45:35 -0000       1.37
> > +++ conf/param.c        14 Aug 2017 17:03:23 -0000
> > @@ -76,7 +76,7 @@
> >  # define DST 0
> >  #endif
> >  #ifndef HZ
> > -#define        HZ 100
> > +#define        HZ 1000
> >  #endif
> >  int    hz = HZ;
> >  int    tick = 1000000 / HZ;
> > Index: kern/kern_clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_clock.c,v
> > retrieving revision 1.93
> > diff -u -p -r1.93 kern_clock.c
> > --- kern/kern_clock.c   22 Jul 2017 14:33:45 -0000      1.93
> > +++ kern/kern_clock.c   14 Aug 2017 19:50:49 -0000
> > @@ -406,12 +406,11 @@ statclock(struct clockframe *frame)
> >         if (p != NULL) {
> >                 p->p_cpticks++;
> >                 /*
> > -                * If no schedclock is provided, call it here at ~~12-25 Hz;
> > +                * If no schedclock is provided, call it here;
> >                  * ~~16 Hz is best
> >                  */
> >                 if (schedhz == 0) {
> > -                       if ((++curcpu()->ci_schedstate.spc_schedticks & 3) 
> > ==
> > -                           0)
> > +                       if ((spc->spc_schedticks & 0x3f) == 0)
> >                                 schedclock(p);
> >                 }
> >         }
> > Index: arch/amd64/isa/clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/amd64/isa/clock.c,v
> > retrieving revision 1.25
> > diff -u -p -r1.25 clock.c
> > --- arch/amd64/isa/clock.c      11 Aug 2017 21:18:11 -0000      1.25
> > +++ arch/amd64/isa/clock.c      14 Aug 2017 17:19:35 -0000
> > @@ -303,8 +303,8 @@ rtcdrain(void *v)
> >  void
> >  i8254_initclocks(void)
> >  {
> > -       stathz = 128;
> > -       profhz = 1024;
> > +       stathz = 1024;
> > +       profhz = 8192;
> >
> >         isa_intr_establish(NULL, 0, IST_PULSE, IPL_CLOCK, clockintr,
> >             0, "clock");
> > @@ -321,7 +321,7 @@ rtcstart(void)
> >  {
> >         static struct timeout rtcdrain_timeout;
> >
> > -       mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_128_Hz);
> > +       mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_1024_Hz);
> >         mc146818_write(NULL, MC_REGB, MC_REGB_24HR | MC_REGB_PIE);
> >
> >         /*
> > @@ -577,10 +577,10 @@ setstatclockrate(int arg)
> >         if (initclock_func == i8254_initclocks) {
> >                 if (arg == stathz)
> >                         mc146818_write(NULL, MC_REGA,
> > -                           MC_BASE_32_KHz | MC_RATE_128_Hz);
> > +                           MC_BASE_32_KHz | MC_RATE_1024_Hz);
> >                 else
> >                         mc146818_write(NULL, MC_REGA,
> > -                           MC_BASE_32_KHz | MC_RATE_1024_Hz);
> > +                           MC_BASE_32_KHz | MC_RATE_8192_Hz);
> >         }
> >  }
> >
> > Index: arch/armv7/omap/dmtimer.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/armv7/omap/dmtimer.c,v
> > retrieving revision 1.6
> > diff -u -p -r1.6 dmtimer.c
> > --- arch/armv7/omap/dmtimer.c   22 Jan 2015 14:33:01 -0000      1.6
> > +++ arch/armv7/omap/dmtimer.c   14 Aug 2017 17:16:01 -0000
> > @@ -296,8 +296,8 @@ dmtimer_cpu_initclocks()
> >  {
> >         struct dmtimer_softc    *sc = dmtimer_cd.cd_devs[1];
> >
> > -       stathz = 128;
> > -       profhz = 1024;
> > +       stathz = 1024;
> > +       profhz = 8192;
> >
> >         sc->sc_ticks_per_second = TIMER_FREQUENCY; /* 32768 */
> >
> > Index: arch/armv7/omap/gptimer.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/armv7/omap/gptimer.c,v
> > retrieving revision 1.4
> > diff -u -p -r1.4 gptimer.c
> > --- arch/armv7/omap/gptimer.c   20 Jun 2014 14:08:11 -0000      1.4
> > +++ arch/armv7/omap/gptimer.c   14 Aug 2017 17:15:44 -0000
> > @@ -283,8 +283,8 @@ void
> >  gptimer_cpu_initclocks()
> >  {
> >  //     u_int32_t now;
> > -       stathz = 128;
> > -       profhz = 1024;
> > +       stathz = 1024;
> > +       profhz = 8192;
> >
> >         ticks_per_second = TIMER_FREQUENCY;
> >
> > Index: arch/armv7/sunxi/sxitimer.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/armv7/sunxi/sxitimer.c,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 sxitimer.c
> > --- arch/armv7/sunxi/sxitimer.c 21 Jan 2017 08:26:49 -0000      1.10
> > +++ arch/armv7/sunxi/sxitimer.c 14 Aug 2017 17:15:04 -0000
> > @@ -189,9 +189,8 @@ sxitimer_attach(struct device *parent, s
> >         bus_space_write_4(sxitimer_iot, sxitimer_ioh,
> >             TIMER_CTRL(STATTIMER), TIMER_OSC24M);
> >
> > -       /* 100/1000 or 128/1024 ? */
> > -       stathz = 128;
> > -       profhz = 1024;
> > +       stathz = 1024;
> > +       profhz = 8192;
> >         sxitimer_setstatclockrate(stathz);
> >
> >         ival = sxitimer_stat_tpi = freq / stathz;
> > Index: arch/i386/isa/clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/i386/isa/clock.c,v
> > retrieving revision 1.51
> > diff -u -p -r1.51 clock.c
> > --- arch/i386/isa/clock.c       25 Jan 2017 08:23:50 -0000      1.51
> > +++ arch/i386/isa/clock.c       14 Aug 2017 17:19:18 -0000
> > @@ -226,8 +226,8 @@ rtcintr(void *arg)
> >         if (stathz == 0) {
> >                 extern int psratio;
> >
> > -               stathz = 128;
> > -               profhz = 1024;
> > +               stathz = 1024;
> > +               profhz = 8192;
> >                 psratio = profhz / stathz;
> >         }
> >
> > @@ -448,7 +448,7 @@ rtcstart(void)
> >  {
> >         static struct timeout rtcdrain_timeout;
> >
> > -       mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_128_Hz);
> > +       mc146818_write(NULL, MC_REGA, MC_BASE_32_KHz | MC_RATE_1024_Hz);
> >         mc146818_write(NULL, MC_REGB, MC_REGB_24HR | MC_REGB_PIE);
> >
> >         /*
> > @@ -698,10 +698,10 @@ setstatclockrate(int arg)
> >         if (initclock_func == i8254_initclocks) {
> >                 if (arg == stathz)
> >                         mc146818_write(NULL, MC_REGA,
> > -                           MC_BASE_32_KHz | MC_RATE_128_Hz);
> > +                           MC_BASE_32_KHz | MC_RATE_1024_Hz);
> >                 else
> >                         mc146818_write(NULL, MC_REGA,
> > -                           MC_BASE_32_KHz | MC_RATE_1024_Hz);
> > +                           MC_BASE_32_KHz | MC_RATE_8192_Hz);
> >         }
> >  }
> >
> > Index: arch/loongson/dev/glx.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/loongson/dev/glx.c,v
> > retrieving revision 1.10
> > diff -u -p -r1.10 glx.c
> > --- arch/loongson/dev/glx.c     15 Aug 2015 22:25:22 -0000      1.10
> > +++ arch/loongson/dev/glx.c     14 Aug 2017 17:06:05 -0000
> > @@ -124,7 +124,7 @@ glx_init(pci_chipset_tag_t pc, pcitag_t
> >         /*
> >          * MFGPT runs on powers of two, adjust the hz value accordingly.
> >          */
> > -       stathz = hz = 128;
> > +       stathz = hz = 1024;
> >  }
> >
> >  uint64_t
> > Index: arch/macppc/macppc/clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/macppc/macppc/clock.c,v
> > retrieving revision 1.40
> > diff -u -p -r1.40 clock.c
> > --- arch/macppc/macppc/clock.c  13 Jun 2015 07:16:36 -0000      1.40
> > +++ arch/macppc/macppc/clock.c  14 Aug 2017 17:13:15 -0000
> > @@ -302,8 +302,8 @@ cpu_initclocks()
> >
> >         ticks_per_intr = ticks_per_sec / hz;
> >
> > -       stathz = 100;
> > -       profhz = 1000; /* must be a multiple of stathz */
> > +       stathz = 1000;
> > +       profhz = 10000; /* must be a multiple of stathz */
> >
> >         /* init secondary clock to stathz */
> >         statint = ticks_per_sec / stathz;
> > Index: arch/sparc64/sparc64/clock.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/sparc64/sparc64/clock.c,v
> > retrieving revision 1.59
> > diff -u -p -r1.59 clock.c
> > --- arch/sparc64/sparc64/clock.c        30 Apr 2017 16:45:45 -0000      1.59
> > +++ arch/sparc64/sparc64/clock.c        14 Aug 2017 20:00:26 -0000
> > @@ -647,8 +647,8 @@ cpu_initclocks(void)
> >         if (stathz == 0)
> >                 stathz = hz;
> >         if (1000000 % stathz) {
> > -               printf("cannot get %d Hz statclock; using 100 Hz\n", 
> > stathz);
> > -               stathz = 100;
> > +               printf("cannot get %d Hz statclock; using 1000 Hz\n", 
> > stathz);
> > +               stathz = 1000;
> >         }
> >
> >         profhz = stathz;                /* always */
> > @@ -665,7 +665,7 @@ cpu_initclocks(void)
> >         schedint.ih_clr = NULL;
> >         schedint.ih_arg = 0;
> >         schedint.ih_pending = 0;
> > -       schedhz = stathz/4;
> > +       schedhz = 16;
> >
> >         /*
> >          * Enable timers
> > @@ -867,7 +867,7 @@ statintr(cap)
> >         newint = statmin + r;
> >
> >         if (schedhz)
> > -               if ((++ci->ci_schedstate.spc_schedticks & 3) == 0)
> > +               if ((++ci->ci_schedstate.spc_schedticks & 0x3f) == 0)
> >                         send_softint(-1, PIL_SCHED, &schedint);
> >         stxa((vaddr_t)&timerreg_4u.t_timer[1].t_limit, ASI_NUCLEUS,
> >              tmr_ustolim(newint)|TMR_LIM_IEN|TMR_LIM_RELOAD);
> >
> 

Reply via email to