RE: [PATCH v2] sched: Optimize __calc_delta.

2021-03-05 Thread David Laight
> Hi Josh, Thanks for helping get this patch across the finish line.
> Would you mind updating the commit message to point to
> https://bugs.llvm.org/show_bug.cgi?id=20197?
Is it worth an audit of all the asm() constraints and potentially changing all the "mr" to "r" for clang? The explicit

Re: [PATCH v2] sched: Optimize __calc_delta.

2021-03-04 Thread Josh Don
On Thu, Mar 4, 2021 at 9:34 AM Nick Desaulniers wrote:
> Hi Josh, Thanks for helping get this patch across the finish line.
> Would you mind updating the commit message to point to
> https://bugs.llvm.org/show_bug.cgi?id=20197?
Sure thing, just saw that it got marked as a dup. Peter, since

Re: [PATCH v2] sched: Optimize __calc_delta.

2021-03-04 Thread Sedat Dilek
On Thu, Mar 4, 2021 at 7:24 PM Sedat Dilek wrote:
>
> On Thu, Mar 4, 2021 at 6:34 PM 'Nick Desaulniers' via Clang Built
> Linux wrote:
> >
> > On Wed, Mar 3, 2021 at 2:48 PM Josh Don wrote:
> > >
> > > From: Clement Courbet
> > >
> > > A significant portion of __calc_delta time is spent in the

Re: [PATCH v2] sched: Optimize __calc_delta.

2021-03-04 Thread Sedat Dilek
On Thu, Mar 4, 2021 at 6:34 PM 'Nick Desaulniers' via Clang Built Linux wrote:
>
> On Wed, Mar 3, 2021 at 2:48 PM Josh Don wrote:
> >
> > From: Clement Courbet
> >
> > A significant portion of __calc_delta time is spent in the loop
> > shifting a u64 by 32 bits. Use `fls` instead of iterating.

Re: [PATCH v2] sched: Optimize __calc_delta.

2021-03-04 Thread Nick Desaulniers
On Wed, Mar 3, 2021 at 2:48 PM Josh Don wrote:
>
> From: Clement Courbet
>
> A significant portion of __calc_delta time is spent in the loop
> shifting a u64 by 32 bits. Use `fls` instead of iterating.
>
> This is ~7x faster on benchmarks.
>
> The generic `fls` implementation (`generic_fls`) is

Re: [PATCH v2] sched: Optimize __calc_delta.

2021-03-04 Thread Peter Zijlstra
On Wed, Mar 03, 2021 at 02:46:53PM -0800, Josh Don wrote:
> From: Clement Courbet
>
> A significant portion of __calc_delta time is spent in the loop
> shifting a u64 by 32 bits. Use `fls` instead of iterating.
>
> This is ~7x faster on benchmarks.
>
> The generic `fls` implementation

[PATCH v2] sched: Optimize __calc_delta.

2021-03-03 Thread Josh Don
From: Clement Courbet
A significant portion of __calc_delta time is spent in the loop shifting a u64 by 32 bits. Use `fls` instead of iterating.
This is ~7x faster on benchmarks.
The generic `fls` implementation (`generic_fls`) is still ~4x faster than the loop. Architectures that have a
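
For readers without the diff at hand, the idea in the commit message can be sketched in standalone C. This is an illustration only, not the kernel code from the patch: `fls32`, `normalize_loop` and `normalize_fls` are hypothetical names, and `fls32` is a portable stand-in that assumes the usual `fls` convention (1-based index of the most significant set bit, 0 for an input of 0).

```c
#include <stdint.h>

/*
 * Hypothetical portable stand-in for fls(): returns the 1-based position
 * of the most significant set bit, or 0 when x == 0.
 */
static int fls32(uint32_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Iterative form: shift right one bit at a time until the value fits
 * in 32 bits. Returns the number of shifts performed.
 */
static int normalize_loop(uint64_t *fact)
{
    int shifts = 0;

    while (*fact >> 32) {
        *fact >>= 1;
        shifts++;
    }
    return shifts;
}

/*
 * fls-based form: the number of shifts needed equals the position of
 * the highest set bit in the upper 32 bits, so it can be computed once
 * and applied as a single shift.
 */
static int normalize_fls(uint64_t *fact)
{
    uint32_t hi = (uint32_t)(*fact >> 32);
    int shifts = 0;

    if (hi) {
        shifts = fls32(hi);
        *fact >>= shifts;
    }
    return shifts;
}
```

Both helpers leave the same value in `*fact` and return the same shift count; the loop form may iterate up to 32 times, while the `fls` form does a single shift once the bit position is known. The stand-in `fls32` above is itself a loop, so any speedup in practice would come from an optimized `fls`, which this sketch does not attempt to reproduce.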