Re: [PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-06-03 Thread Jan Beulich
On 01.06.2024 03:47, Andrew Cooper wrote:
> On 28/05/2024 2:12 pm, Jan Beulich wrote:
>> On 28.05.2024 14:30, Andrew Cooper wrote:
>>> On 27/05/2024 2:37 pm, Jan Beulich wrote:
 On 27.05.2024 15:27, Jan Beulich wrote:
> On 24.05.2024 22:03, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>>  
>>  static always_inline unsigned int arch_ffs(unsigned int x)
>>  {
>> -int r;
>> +unsigned int r;
>> +
>> +if ( __builtin_constant_p(x > 0) && x > 0 )
>> +{
>> +/* Safe, when the compiler knows that x is nonzero. */
>> +asm ( "bsf %[val], %[res]"
>> +  : [res] "=r" (r)
>> +  : [val] "rm" (x) );
>> +}
> In patch 11 relevant things are all in a single patch, making it easier
> to spot that this is dead code: The sole caller already has a
> __builtin_constant_p(), hence I don't see how the one here could ever
> return true. With that the respective part of the description is then
> questionable, too, I'm afraid: Where did you observe any actual effect
> from this? Or if you did - what am I missing?
 Hmm, thinking about it: I suppose that's why you have
 __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
 I'm (positively) surprised that the former may return true when the latter
 doesn't.
>>> So was I, but this recommendation came straight from the GCC mailing
>>> list.  And it really does work, even back in obsolete versions of GCC.
>>>
>>> __builtin_constant_p() operates on an expression not a value, and is
>>> documented as such.
>> Of course.
>>
  Nevertheless I'm inclined to think this deserves a brief comment.
>>> There is a comment, and it's even visible in the snippet.
>> The comment is about the asm(); it is neither placed to clearly relate
>> to __builtin_constant_p(), nor is it saying anything about this specific
>> property of it. You said you were equally surprised; don't you think
>> that when both of us are surprised, a specific (even if brief) comment
>> is warranted?
> 
> Spell it out for me like I'm an idiot.
> 
> Because I'm looking at the patch I submitted, and at your request for "a
> brief comment", and I still have no idea what you think is wrong at the
> moment.
> 
> I'm also not included to write a comment saying "go and read the GCC
> manual more carefully".
> 
>>
 As an aside, to better match the comment inside the if()'s body, how about

 if ( __builtin_constant_p(!!x) && x )

 ? That also may make a little more clear that this isn't just a style
 choice, but actually needed for the intended purpose.
>>> I am not changing the logic.
>>>
>>> Apart from anything else, your suggestion is trivially buggy.  I care
>>> about whether the RHS collapses to a constant, and the only way of doing
>>> that correctly is asking the compiler about the *exact* expression. 
>>> Asking about some other expression which you hope - but do not know -
>>> that the compiler will treat equivalently is bogus.  It would be
>>> strictly better to only take the else clause, than to have both halves
>>> emitted.
>>>
>>> This is the form I've tested extensively.  It's also the clearest form
>>> IMO.  You can experiment with alternative forms when we're not staring
>>> down code freeze of 4.19.
>> "Clearest form" is almost always a matter of taste. To me, comparing
>> unsigned values with > or < against 0 is generally at least suspicious.
>> Using != is typically better (again: imo), and simply omitting the != 0
>> then is shorter with no difference in effect. Except in peculiar cases
>> like this one, where indeed it took me some time to figure why the
>> comparison operator may not be omitted.
>>
>> All that said: I'm not going to insist on any change; the R-b previously
>> offered still stands. I would highly appreciate though if the (further)
>> comment asked for could be added.
>>
>> What I definitely dislike here is you - not for the first time - turning
>> down remarks because a change of yours is late.
> 
> Actually it's not to do with the release.  I'd reject it at any point
> because it's an unreasonable request to make; to me, or to anyone else.
> 
> It would be a matter of taste (which again you have a singular view on),
> if it wasn't for the fact that what you actually said was:
> 
> "I don't like it, and you should discard all the careful analysis you
> did because here's a form I prefer, that I haven't tested concerning a
> behaviour I didn't even realise until this email."

Just to clarify: Long before this reply of yours I understood and admitted
my mistake. A more clear / well placed comment (see further up) might have
avoided that. Still - thanks for extending the comment in what you have
committed.

> and even if it wasn't a buggy suggestion to begin with, 

Re: [PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-05-31 Thread Andrew Cooper
On 28/05/2024 2:12 pm, Jan Beulich wrote:
> On 28.05.2024 14:30, Andrew Cooper wrote:
>> On 27/05/2024 2:37 pm, Jan Beulich wrote:
>>> On 27.05.2024 15:27, Jan Beulich wrote:
 On 24.05.2024 22:03, Andrew Cooper wrote:
> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>  
>  static always_inline unsigned int arch_ffs(unsigned int x)
>  {
> -int r;
> +unsigned int r;
> +
> +if ( __builtin_constant_p(x > 0) && x > 0 )
> +{
> +/* Safe, when the compiler knows that x is nonzero. */
> +asm ( "bsf %[val], %[res]"
> +  : [res] "=r" (r)
> +  : [val] "rm" (x) );
> +}
 In patch 11 relevant things are all in a single patch, making it easier
 to spot that this is dead code: The sole caller already has a
 __builtin_constant_p(), hence I don't see how the one here could ever
 return true. With that the respective part of the description is then
 questionable, too, I'm afraid: Where did you observe any actual effect
 from this? Or if you did - what am I missing?
>>> Hmm, thinking about it: I suppose that's why you have
>>> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
>>> I'm (positively) surprised that the former may return true when the latter
>>> doesn't.
>> So was I, but this recommendation came straight from the GCC mailing
>> list.  And it really does work, even back in obsolete versions of GCC.
>>
>> __builtin_constant_p() operates on an expression not a value, and is
>> documented as such.
> Of course.
>
>>>  Nevertheless I'm inclined to think this deserves a brief comment.
>> There is a comment, and it's even visible in the snippet.
> The comment is about the asm(); it is neither placed to clearly relate
> to __builtin_constant_p(), nor is it saying anything about this specific
> property of it. You said you were equally surprised; don't you think
> that when both of us are surprised, a specific (even if brief) comment
> is warranted?

Spell it out for me like I'm an idiot.

Because I'm looking at the patch I submitted, and at your request for "a
brief comment", and I still have no idea what you think is wrong at the
moment.

I'm also not included to write a comment saying "go and read the GCC
manual more carefully".

>
>>> As an aside, to better match the comment inside the if()'s body, how about
>>>
>>> if ( __builtin_constant_p(!!x) && x )
>>>
>>> ? That also may make a little more clear that this isn't just a style
>>> choice, but actually needed for the intended purpose.
>> I am not changing the logic.
>>
>> Apart from anything else, your suggestion is trivially buggy.  I care
>> about whether the RHS collapses to a constant, and the only way of doing
>> that correctly is asking the compiler about the *exact* expression. 
>> Asking about some other expression which you hope - but do not know -
>> that the compiler will treat equivalently is bogus.  It would be
>> strictly better to only take the else clause, than to have both halves
>> emitted.
>>
>> This is the form I've tested extensively.  It's also the clearest form
>> IMO.  You can experiment with alternative forms when we're not staring
>> down code freeze of 4.19.
> "Clearest form" is almost always a matter of taste. To me, comparing
> unsigned values with > or < against 0 is generally at least suspicious.
> Using != is typically better (again: imo), and simply omitting the != 0
> then is shorter with no difference in effect. Except in peculiar cases
> like this one, where indeed it took me some time to figure why the
> comparison operator may not be omitted.
>
> All that said: I'm not going to insist on any change; the R-b previously
> offered still stands. I would highly appreciate though if the (further)
> comment asked for could be added.
>
> What I definitely dislike here is you - not for the first time - turning
> down remarks because a change of yours is late.

Actually it's not to do with the release.  I'd reject it at any point
because it's an unreasonable request to make; to me, or to anyone else.

It would be a matter of taste (which again you have a singular view on),
if it wasn't for the fact that what you actually said was:

"I don't like it, and you should discard all the careful analysis you
did because here's a form I prefer, that I haven't tested concerning a
behaviour I didn't even realise until this email."

and even if it wasn't a buggy suggestion to begin with, it's still toxic
maintainer feedback.


Frankly, I'd have more time to review other peoples patches if I wasn't
wasting all of my time on premium grade manure like this, while trying
to help Oleksii who's had it far worse this release trying to clean up
droppings of maintainers-past.

~Andrew



Re: [PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-05-28 Thread Jan Beulich
On 28.05.2024 14:30, Andrew Cooper wrote:
> On 27/05/2024 2:37 pm, Jan Beulich wrote:
>> On 27.05.2024 15:27, Jan Beulich wrote:
>>> On 24.05.2024 22:03, Andrew Cooper wrote:
 --- a/xen/arch/x86/include/asm/bitops.h
 +++ b/xen/arch/x86/include/asm/bitops.h
 @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
  
  static always_inline unsigned int arch_ffs(unsigned int x)
  {
 -int r;
 +unsigned int r;
 +
 +if ( __builtin_constant_p(x > 0) && x > 0 )
 +{
 +/* Safe, when the compiler knows that x is nonzero. */
 +asm ( "bsf %[val], %[res]"
 +  : [res] "=r" (r)
 +  : [val] "rm" (x) );
 +}
>>> In patch 11 relevant things are all in a single patch, making it easier
>>> to spot that this is dead code: The sole caller already has a
>>> __builtin_constant_p(), hence I don't see how the one here could ever
>>> return true. With that the respective part of the description is then
>>> questionable, too, I'm afraid: Where did you observe any actual effect
>>> from this? Or if you did - what am I missing?
>> Hmm, thinking about it: I suppose that's why you have
>> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
>> I'm (positively) surprised that the former may return true when the latter
>> doesn't.
> 
> So was I, but this recommendation came straight from the GCC mailing
> list.  And it really does work, even back in obsolete versions of GCC.
> 
> __builtin_constant_p() operates on an expression not a value, and is
> documented as such.

Of course.

>>  Nevertheless I'm inclined to think this deserves a brief comment.
> 
> There is a comment, and it's even visible in the snippet.

The comment is about the asm(); it is neither placed to clearly relate
to __builtin_constant_p(), nor is it saying anything about this specific
property of it. You said you were equally surprised; don't you think
that when both of us are surprised, a specific (even if brief) comment
is warranted?

>> As an aside, to better match the comment inside the if()'s body, how about
>>
>> if ( __builtin_constant_p(!!x) && x )
>>
>> ? That also may make a little more clear that this isn't just a style
>> choice, but actually needed for the intended purpose.
> 
> I am not changing the logic.
> 
> Apart from anything else, your suggestion is trivially buggy.  I care
> about whether the RHS collapses to a constant, and the only way of doing
> that correctly is asking the compiler about the *exact* expression. 
> Asking about some other expression which you hope - but do not know -
> that the compiler will treat equivalently is bogus.  It would be
> strictly better to only take the else clause, than to have both halves
> emitted.
> 
> This is the form I've tested extensively.  It's also the clearest form
> IMO.  You can experiment with alternative forms when we're not staring
> down code freeze of 4.19.

"Clearest form" is almost always a matter of taste. To me, comparing
unsigned values with > or < against 0 is generally at least suspicious.
Using != is typically better (again: imo), and simply omitting the != 0
then is shorter with no difference in effect. Except in peculiar cases
like this one, where indeed it took me some time to figure why the
comparison operator may not be omitted.

All that said: I'm not going to insist on any change; the R-b previously
offered still stands. I would highly appreciate though if the (further)
comment asked for could be added.

What I definitely dislike here is you - not for the first time - turning
down remarks because a change of yours is late. This feels even more so
bad when considering that I'm typically replying to your patches with
pretty little turnaround. Whereas various of mine, pending in part for
years, do not seem to deserve any review comments at all. Unlike before,
where it was "only" improvements or feature additions, meanwhile even
bug fixes are left sit like that. If I may be blunt: This may not work
this way for much longer. At some point I will need to artificially
delay reviews, making them dependent on my own work also being allowed
to make progress. I question though whether that would be in everyone's
interest.

Jan



Re: [PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-05-28 Thread Andrew Cooper
On 27/05/2024 2:37 pm, Jan Beulich wrote:
> On 27.05.2024 15:27, Jan Beulich wrote:
>> On 24.05.2024 22:03, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/include/asm/bitops.h
>>> +++ b/xen/arch/x86/include/asm/bitops.h
>>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>>>  
>>>  static always_inline unsigned int arch_ffs(unsigned int x)
>>>  {
>>> -int r;
>>> +unsigned int r;
>>> +
>>> +if ( __builtin_constant_p(x > 0) && x > 0 )
>>> +{
>>> +/* Safe, when the compiler knows that x is nonzero. */
>>> +asm ( "bsf %[val], %[res]"
>>> +  : [res] "=r" (r)
>>> +  : [val] "rm" (x) );
>>> +}
>> In patch 11 relevant things are all in a single patch, making it easier
>> to spot that this is dead code: The sole caller already has a
>> __builtin_constant_p(), hence I don't see how the one here could ever
>> return true. With that the respective part of the description is then
>> questionable, too, I'm afraid: Where did you observe any actual effect
>> from this? Or if you did - what am I missing?
> Hmm, thinking about it: I suppose that's why you have
> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
> I'm (positively) surprised that the former may return true when the latter
> doesn't.

So was I, but this recommendation came straight from the GCC mailing
list.  And it really does work, even back in obsolete versions of GCC.

__builtin_constant_p() operates on an expression not a value, and is
documented as such.


>  Nevertheless I'm inclined to think this deserves a brief comment.

There is a comment, and it's even visible in the snippet.

> As an aside, to better match the comment inside the if()'s body, how about
>
> if ( __builtin_constant_p(!!x) && x )
>
> ? That also may make a little more clear that this isn't just a style
> choice, but actually needed for the intended purpose.

I am not changing the logic.

Apart from anything else, your suggestion is trivially buggy.  I care
about whether the RHS collapses to a constant, and the only way of doing
that correctly is asking the compiler about the *exact* expression. 
Asking about some other expression which you hope - but do not know -
that the compiler will treat equivalently is bogus.  It would be
strictly better to only take the else clause, than to have both halves
emitted.

This is the form I've tested extensively.  It's also the clearest form
IMO.  You can experiment with alternative forms when we're not staring
down code freeze of 4.19.

~Andrew



Re: [PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-05-27 Thread Jan Beulich
On 27.05.2024 15:27, Jan Beulich wrote:
> On 24.05.2024 22:03, Andrew Cooper wrote:
>> --- a/xen/arch/x86/include/asm/bitops.h
>> +++ b/xen/arch/x86/include/asm/bitops.h
>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>>  
>>  static always_inline unsigned int arch_ffs(unsigned int x)
>>  {
>> -int r;
>> +unsigned int r;
>> +
>> +if ( __builtin_constant_p(x > 0) && x > 0 )
>> +{
>> +/* Safe, when the compiler knows that x is nonzero. */
>> +asm ( "bsf %[val], %[res]"
>> +  : [res] "=r" (r)
>> +  : [val] "rm" (x) );
>> +}
> 
> In patch 11 relevant things are all in a single patch, making it easier
> to spot that this is dead code: The sole caller already has a
> __builtin_constant_p(), hence I don't see how the one here could ever
> return true. With that the respective part of the description is then
> questionable, too, I'm afraid: Where did you observe any actual effect
> from this? Or if you did - what am I missing?

Hmm, thinking about it: I suppose that's why you have
__builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit
I'm (positively) surprised that the former may return true when the latter
doesn't. Nevertheless I'm inclined to think this deserves a brief comment.

As an aside, to better match the comment inside the if()'s body, how about

if ( __builtin_constant_p(!!x) && x )

? That also may make a little more clear that this isn't just a style
choice, but actually needed for the intended purpose.

Jan



Re: [PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-05-27 Thread Jan Beulich
On 24.05.2024 22:03, Andrew Cooper wrote:
> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>  
>  static always_inline unsigned int arch_ffs(unsigned int x)
>  {
> -int r;
> +unsigned int r;
> +
> +if ( __builtin_constant_p(x > 0) && x > 0 )
> +{
> +/* Safe, when the compiler knows that x is nonzero. */
> +asm ( "bsf %[val], %[res]"
> +  : [res] "=r" (r)
> +  : [val] "rm" (x) );
> +}

In patch 11 relevant things are all in a single patch, making it easier
to spot that this is dead code: The sole caller already has a
__builtin_constant_p(), hence I don't see how the one here could ever
return true. With that the respective part of the description is then
questionable, too, I'm afraid: Where did you observe any actual effect
from this? Or if you did - what am I missing?

Jan



Re: [PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-05-27 Thread Jan Beulich
On 24.05.2024 22:03, Andrew Cooper wrote:
> The asm in arch_ffs() is safe but inefficient.
> 
> CMOV would be an improvement over a conditional branch, but for 64bit CPUs
> both Intel and AMD have provided enough details about the behaviour for a zero
> input.  It is safe to pre-load the destination register with -1 and drop the
> conditional logic.
> 
> However, it is common to find ffs() in a context where the optimiser knows
> that x in nonzero even if it the value isn't known precisely, and in that case
> it's safe to drop the preload of -1 too.
> 
> There are only a handful of uses of ffs() in the x86 build, and all of them
> improve as a result of this:
> 
>   add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31)
>   Function old new   delta
>   mask_write   114 107  -7
>   xmem_pool_alloc 10631039 -24
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 
with one suggestion:

> --- a/xen/arch/x86/include/asm/bitops.h
> +++ b/xen/arch/x86/include/asm/bitops.h
> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
>  
>  static always_inline unsigned int arch_ffs(unsigned int x)
>  {
> -int r;
> +unsigned int r;
> +
> +if ( __builtin_constant_p(x > 0) && x > 0 )

__builtin_constant_p(x) surely will do. In fact even the other "> 0" could
in principle be left out here.

Jan



[PATCH v2 07/13] x86/bitops: Improve arch_ffs() in the general case

2024-05-24 Thread Andrew Cooper
The asm in arch_ffs() is safe but inefficient.

CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a zero
input.  It is safe to pre-load the destination register with -1 and drop the
conditional logic.

However, it is common to find ffs() in a context where the optimiser knows
that x in nonzero even if it the value isn't known precisely, and in that case
it's safe to drop the preload of -1 too.

There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31)
  Function old new   delta
  mask_write   114 107  -7
  xmem_pool_alloc 10631039 -24

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Oleksii Kurochko 
CC: Shawn Anastasio 
CC: consult...@bugseng.com 
CC: Simone Ballarin 
CC: Federico Serafini 
CC: Nicola Vetrini 

v2:
 * New.
 * Use __builtin_constant_p(x > 0) to optimise better.
---
 xen/arch/x86/include/asm/bitops.h | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/include/asm/bitops.h 
b/xen/arch/x86/include/asm/bitops.h
index 122767fc0d10..1d7aea6065ef 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
 
 static always_inline unsigned int arch_ffs(unsigned int x)
 {
-int r;
+unsigned int r;
+
+if ( __builtin_constant_p(x > 0) && x > 0 )
+{
+/* Safe, when the compiler knows that x is nonzero. */
+asm ( "bsf %[val], %[res]"
+  : [res] "=r" (r)
+  : [val] "rm" (x) );
+}
+else
+{
+/*
+ * The AMD manual states that BSF won't modify the destination
+ * register if x=0.  The Intel manual states that the result is
+ * undefined, but the architects have said that the register is
+ * written back with it's old value (zero extended as normal).
+ */
+asm ( "bsf %[val], %[res]"
+  : [res] "=r" (r)
+  : [val] "rm" (x), "[res]" (-1) );
+}
 
-asm ( "bsf %1,%0\n\t"
-  "jnz 1f\n\t"
-  "mov $-1,%0\n"
-  "1:" : "=r" (r) : "rm" (x));
 return r + 1;
 }
 #define arch_ffs arch_ffs
-- 
2.30.2