Re: [v8-dev] V8 Turbofan Optimization: Modulo 2 vs. Bitwise AND

Jakob Kummerow Tue, 11 Mar 2025 06:42:31 -0700

You need either %PrepareFunctionForOptimization(my_mod); before you start
collecting unoptimized feedback (i.e. before the my_mod(2) call), or more
unoptimized calls until feedback collection kicks in on its own. And of
course you need a build that has disassembler support enabled.
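
For the second option (more unoptimized calls), a minimal sketch of a warm-up script could look like the following. It uses no natives syntax, so it relies on V8 tiering up by itself. The iteration count and the names WARMUP_ITERATIONS and oddCount are assumptions chosen for illustration, not documented thresholds, and whether Turbofan actually kicks in depends on the build and flags.

```javascript
// Same function as in the original question.
function my_mod(n) {
  if (n % 2 == 1)
    return true;
  return false;
}

// Assumption: an arbitrary, generous number of calls so that feedback
// collection and tier-up can happen on their own. Not a documented limit.
const WARMUP_ITERATIONS = 100000;

// Use the result so the calls are not trivially dead code.
let oddCount = 0;
for (let i = 0; i < WARMUP_ITERATIONS; i++) {
  if (my_mod(i)) oddCount++;
}
```

Running this with d8 and --print-opt-code should print the optimized code once the function gets hot. With the natives-syntax route, the %PrepareFunctionForOptimization(my_mod); line would instead go right before the first my_mod(2) call of the original script.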


--- Raw source ---
(n) {
 if (n  % 2 == 1)
   return true;
 return false;
}


--- Optimized code ---
optimization_id = 0
source_position = 340
kind = TURBOFAN_JS
name = my_mod
compiler = turbofan
address = 0x176e001401a1

Instructions (size = 176)
0x55a9e0000040     0  55                   push rbp
0x55a9e0000041     1  4889e5               REX.W movq rbp,rsp
0x55a9e0000044     4  56                   push rsi
0x55a9e0000045     5  57                   push rdi
0x55a9e0000046     6  50                   push rax
0x55a9e0000047     7  4883ec08             REX.W subq rsp,0x8
0x55a9e000004b     b  488975e0             REX.W movq [rbp-0x20],rsi
0x55a9e000004f     f  493b65a0             REX.W cmpq rsp,[r13-0x60]    (external value (StackGuard::address_of_jslimit()))
0x55a9e0000053    13  0f865e000000         jna 0x55a9e00000b7  <+0x77>
0x55a9e0000059    19  488b5518             REX.W movq rdx,[rbp+0x18]
0x55a9e000005d    1d  f6c201               testb rdx,0x1
0x55a9e0000060    20  0f857b000000         jnz 0x55a9e00000e1  <+0xa1>
0x55a9e0000066    26  488bca               REX.W movq rcx,rdx
0x55a9e0000069    29  d1f9                 sarl rcx, 1
0x55a9e000006b    2b  85d2                 testl rdx,rdx
0x55a9e000006d    2d  0f8c08000000         jl 0x55a9e000007b  <+0x3b>
0x55a9e0000073    33  83e101               andl rcx,0x1
0x55a9e0000076    36  e90f000000           jmp 0x55a9e000008a  <+0x4a>
0x55a9e000007b    3b  f7d9                 negl rcx
0x55a9e000007d    3d  83e101               andl rcx,0x1
0x55a9e0000080    40  85c9                 testl rcx,rcx
0x55a9e0000082    42  0f845d000000         jz 0x55a9e00000e5  <+0xa5>
0x55a9e0000088    48  f7d9                 negl rcx
0x55a9e000008a    4a  83f901               cmpl rcx,0x1
0x55a9e000008d    4d  0f841e000000         jz 0x55a9e00000b1  <+0x71>
0x55a9e0000093    53  498d4655             REX.W leaq rax,[r14+0x55]
0x55a9e0000097    57  488b4de8             REX.W movq rcx,[rbp-0x18]
0x55a9e000009b    5b  488be5               REX.W movq rsp,rbp
0x55a9e000009e    5e  5d                   pop rbp
0x55a9e000009f    5f  4883f902             REX.W cmpq rcx,0x2
0x55a9e00000a3    63  7f03                 jg 0x55a9e00000a8  <+0x68>
0x55a9e00000a5    65  c21000               ret 0x10
0x55a9e00000a8    68  415a                 pop r10
0x55a9e00000aa    6a  488d24cc             REX.W leaq rsp,[rsp+rcx*8]
0x55a9e00000ae    6e  4152                 push r10
0x55a9e00000b0    70  c3                   retl
0x55a9e00000b1    71  498d4671             REX.W leaq rax,[r14+0x71]
0x55a9e00000b5    75  ebe0                 jmp 0x55a9e0000097  <+0x57>
0x55a9e00000b7    77  ba40000000           movl rdx,0x40
0x55a9e00000bc    7c  52                   push rdx
0x55a9e00000bd    7d  48bb00405fc7a9550000 REX.W movq rbx,0x55a9c75f4000
0x55a9e00000c7    87  b801000000           movl rax,0x1
0x55a9e00000cc    8c  48bee51a1800f57d0000 REX.W movq rsi,0x7df500181ae5
   ;; object: 0x7df500181ae5 <NativeContext[302]>
0x55a9e00000d6    96  e825a246e8           call 0x55a9c846a300  (CEntry_Return1_ArgvOnStack_NoBuiltinExit)    ;; near builtin entry
0x55a9e00000db    9b  e979ffffff           jmp 0x55a9e0000059  <+0x19>
0x55a9e00000e0    a0  90                   nop
0x55a9e00000e1    a1  41ff55d8             call [r13-0x28]
0x55a9e00000e5    a5  41ff55d8             call [r13-0x28]
0x55a9e00000e9    a9  41ff55e0             call [r13-0x20]
0x55a9e00000ed    ad  0f1f00               nop

Inlined functions (count = 0)

Deoptimization Input Data (deopt points = 3)
index  bytecode-offset    pc
    0                2    NA
    1                2    NA
    2               -1    9b

Safepoints (stack slots = 6, entries = 1, byte size = 16)
0x55a9e00000db     9b  slots (sp->fp): 100000  deopt      2 trampoline:     a9

RelocInfo (size = 5)
0x55a9e00000ce  full embedded object  (0x7df500181ae5 <NativeContext[12e]>)
0x55a9e00000d7  near builtin entry

--- End code ---
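
As far as the listing can be read, the non-negative path uses andl with 0x1, while the negative path negates, masks, and negates back, with an extra exit when the masked value is zero. A plausible reason is that JavaScript's % keeps the sign of the dividend, so n % 2 and n & 1 disagree for negative inputs, and a zero remainder of a negative number is -0. The sketch below illustrates that difference in plain JavaScript. The helper modTwoViaAnd is hypothetical and only mirrors the apparent shape of the machine code. It is not V8's implementation.

```javascript
// Remainder keeps the sign of the dividend; bitwise AND does not.
const samples = [3, 2, -3, -2];
const rows = samples.map((n) => ({ n, rem: n % 2, and: n & 1 }));
// For n = -3: rem is -1 but and is 1, so "n % 2 == 1" is false there.

// Hypothetical helper: computes n % 2 for integer n using only AND and
// negation, mirroring the two branches visible in the disassembly.
function modTwoViaAnd(n) {
  if (n >= 0) return n & 1;   // non-negative path: plain mask
  const r = (-n) & 1;         // negative path: negate, then mask
  // A zero remainder of a negative dividend is -0 in JavaScript. The
  // optimized code appears to leave the fast path in this case.
  return r === 0 ? -0 : -r;
}

// Object.is distinguishes 0 from -0, unlike ===.
const allMatch = samples.every((n) => Object.is(modTwoViaAnd(n), n % 2));
console.log(rows, allMatch);
```

So the mask is used, but only together with the sign handling that JavaScript semantics require.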


On Tue, Mar 11, 2025 at 2:35 PM Sỹ Trần Dũng wrote:

> I tried d8 with the --allow-natives-syntax --turbofan --print-opt-code flags
> and the following code, but I don't get any output.
>
> function my_mod(n) {
>   if (n  % 2 == 1)
>     return true;
>   return false;
> }
>
> my_mod(2);
> my_mod(1);
> my_mod(3);
>
> %OptimizeFunctionOnNextCall(my_mod);
>
> my_mod(100)
>
> I'm not sure which flags to use here.
> On Tuesday, March 11, 2025 at 5:35:11 PM UTC+7 Jakob Kummerow wrote:
>
>> Why don't you test it and find out yourself?
>>
>>
>> On Tue, Mar 11, 2025 at 10:20 AM Sỹ Trần Dũng wrote:
>>
>>> I have a question regarding V8's compiler optimization, specifically
>>> concerning the modulo 2 operation. In compilers like GCC and Clang, it's
>>> common to see the operation n % 2 optimized to a bitwise AND (n & 1) or a
>>> bit check instruction, as these are generally more efficient.
>>>
>>> I've been examining the bytecode generated by V8, and I've observed that
>>> a modulo instruction is used for n % 2.
>>>
>>> [generated bytecode for function: my_mod (0x3de244c5b401 <SharedFunctionInfo my_mod>)]
>>> Bytecode length: 17
>>> Parameter count 2
>>> Register count 1
>>> Frame size 8
>>>    23 S> 0x32c69b60dc80 @    0 : 0b 03             Ldar a0
>>>    29 E> 0x32c69b60dc82 @    2 : 4b 02 00          ModSmi [2], [0]
>>>          0x32c69b60dc85 @    5 : c9                Star0
>>>          0x32c69b60dc86 @    6 : 0d 01             LdaSmi [1]
>>>    33 E> 0x32c69b60dc88 @    8 : 6f f9 01          TestEqual r0, [1]
>>>          0x32c69b60dc8b @   11 : 9e 04             JumpIfFalse [4] (0x32c69b60dc8f @ 15)
>>>    45 S> 0x32c69b60dc8d @   13 : 11                LdaTrue
>>>    57 S> 0x32c69b60dc8e @   14 : ae                Return
>>>    64 S> 0x32c69b60dc8f @   15 : 12                LdaFalse
>>>    77 S> 0x32c69b60dc90 @   16 : ae                Return
>>>
>>> I'm curious if this behavior changes when the code is "heated" and
>>> optimized by Turbofan.
>>>
>>> Could someone please tell me whether Turbofan performs this particular
>>> optimization?
>>>
>>> Thank you for your time and expertise.
>>>
