You need either a %PrepareFunctionForOptimization(my_mod); call before you
start collecting unoptimized feedback (i.e. before the my_mod(2) call), or
more unoptimized calls until feedback collection kicks in on its own. And of
course you need a build that has disassembler support enabled.
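
For illustration, the first option applied to the quoted script might look like the sketch below. tryIntrinsic is a hypothetical helper, not a V8 API; it only routes the %-calls through eval so the file still parses when --allow-natives-syntax is missing. With the flag set you can just as well write the % calls inline.

```javascript
function my_mod(n) {
  if (n % 2 == 1)
    return true;
  return false;
}

// Hypothetical helper (not part of V8): evaluates a %-intrinsic call and
// reports whether it was accepted. Without --allow-natives-syntax the eval
// throws a SyntaxError, which is swallowed here.
function tryIntrinsic(source) {
  try {
    eval(source);
    return true;
  } catch (e) {
    return false;
  }
}

// Ask V8 to collect feedback *before* the first unoptimized call.
const prepared = tryIntrinsic('%PrepareFunctionForOptimization(my_mod)');

my_mod(2);
my_mod(1);
my_mod(3);

tryIntrinsic('%OptimizeFunctionOnNextCall(my_mod)');

// This call triggers the optimizing compile, and with it --print-opt-code.
const result = my_mod(100);

if (!prepared) {
  console.log('intrinsics unavailable: pass --allow-natives-syntax');
}
```

Run it with the same flags as in the quoted message (--allow-natives-syntax --turbofan --print-opt-code).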
--- Raw source ---
(n) {
if (n % 2 == 1)
return true;
return false;
}
--- Optimized code ---
optimization_id = 0
source_position = 340
kind = TURBOFAN_JS
name = my_mod
compiler = turbofan
address = 0x176e001401a1
Instructions (size = 176)
0x55a9e0000040 0 55 push rbp
0x55a9e0000041 1 4889e5 REX.W movq rbp,rsp
0x55a9e0000044 4 56 push rsi
0x55a9e0000045 5 57 push rdi
0x55a9e0000046 6 50 push rax
0x55a9e0000047 7 4883ec08 REX.W subq rsp,0x8
0x55a9e000004b b 488975e0 REX.W movq [rbp-0x20],rsi
0x55a9e000004f f 493b65a0 REX.W cmpq rsp,[r13-0x60] (external value (StackGuard::address_of_jslimit()))
0x55a9e0000053 13 0f865e000000 jna 0x55a9e00000b7 <+0x77>
0x55a9e0000059 19 488b5518 REX.W movq rdx,[rbp+0x18]
0x55a9e000005d 1d f6c201 testb rdx,0x1
0x55a9e0000060 20 0f857b000000 jnz 0x55a9e00000e1 <+0xa1>
0x55a9e0000066 26 488bca REX.W movq rcx,rdx
0x55a9e0000069 29 d1f9 sarl rcx, 1
0x55a9e000006b 2b 85d2 testl rdx,rdx
0x55a9e000006d 2d 0f8c08000000 jl 0x55a9e000007b <+0x3b>
0x55a9e0000073 33 83e101 andl rcx,0x1
0x55a9e0000076 36 e90f000000 jmp 0x55a9e000008a <+0x4a>
0x55a9e000007b 3b f7d9 negl rcx
0x55a9e000007d 3d 83e101 andl rcx,0x1
0x55a9e0000080 40 85c9 testl rcx,rcx
0x55a9e0000082 42 0f845d000000 jz 0x55a9e00000e5 <+0xa5>
0x55a9e0000088 48 f7d9 negl rcx
0x55a9e000008a 4a 83f901 cmpl rcx,0x1
0x55a9e000008d 4d 0f841e000000 jz 0x55a9e00000b1 <+0x71>
0x55a9e0000093 53 498d4655 REX.W leaq rax,[r14+0x55]
0x55a9e0000097 57 488b4de8 REX.W movq rcx,[rbp-0x18]
0x55a9e000009b 5b 488be5 REX.W movq rsp,rbp
0x55a9e000009e 5e 5d pop rbp
0x55a9e000009f 5f 4883f902 REX.W cmpq rcx,0x2
0x55a9e00000a3 63 7f03 jg 0x55a9e00000a8 <+0x68>
0x55a9e00000a5 65 c21000 ret 0x10
0x55a9e00000a8 68 415a pop r10
0x55a9e00000aa 6a 488d24cc REX.W leaq rsp,[rsp+rcx*8]
0x55a9e00000ae 6e 4152 push r10
0x55a9e00000b0 70 c3 retl
0x55a9e00000b1 71 498d4671 REX.W leaq rax,[r14+0x71]
0x55a9e00000b5 75 ebe0 jmp 0x55a9e0000097 <+0x57>
0x55a9e00000b7 77 ba40000000 movl rdx,0x40
0x55a9e00000bc 7c 52 push rdx
0x55a9e00000bd 7d 48bb00405fc7a9550000 REX.W movq rbx,0x55a9c75f4000
0x55a9e00000c7 87 b801000000 movl rax,0x1
0x55a9e00000cc 8c 48bee51a1800f57d0000 REX.W movq rsi,0x7df500181ae5
;; object: 0x7df500181ae5 <NativeContext[302]>
0x55a9e00000d6 96 e825a246e8 call 0x55a9c846a300 (CEntry_Return1_ArgvOnStack_NoBuiltinExit) ;; near builtin entry
0x55a9e00000db 9b e979ffffff jmp 0x55a9e0000059 <+0x19>
0x55a9e00000e0 a0 90 nop
0x55a9e00000e1 a1 41ff55d8 call [r13-0x28]
0x55a9e00000e5 a5 41ff55d8 call [r13-0x28]
0x55a9e00000e9 a9 41ff55e0 call [r13-0x20]
0x55a9e00000ed ad 0f1f00 nop
Inlined functions (count = 0)
Deoptimization Input Data (deopt points = 3)
index bytecode-offset pc
0 2 NA
1 2 NA
2 -1 9b
Safepoints (stack slots = 6, entries = 1, byte size = 16)
0x55a9e00000db 9b slots (sp->fp): 100000 deopt 2 trampoline: a9
RelocInfo (size = 5)
0x55a9e00000ce full embedded object (0x7df500181ae5 <NativeContext[12e]>)
0x55a9e00000d7 near builtin entry
--- End code ---
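
Reading the listing: non-negative Smi inputs take a single andl rcx,0x1, so the modulo is reduced to a bit test on that path. Negative inputs take the negl / andl / negl path, and a zero result there goes to a deopt exit, presumably because the result would be -0. The sketch below (function names are illustrative only) shows why a bare n & 1 is not a drop-in replacement for n % 2 in JavaScript.

```javascript
// JavaScript's % takes the sign of the dividend, so an odd negative number
// yields -1 rather than 1, and an even negative number yields -0.
function isOddByRemainder(n) {
  return n % 2 == 1;   // same test as my_mod
}

// A plain bit test ignores the sign.
function isOddByMask(n) {
  return (n & 1) == 1;
}

// What the negative path in the listing appears to compute: -((-n) & 1).
function negativePathRemainder(n) {
  return -((-n) & 1);
}

for (const n of [3, 2, -3, -2]) {
  console.log(n, n % 2, n & 1, isOddByRemainder(n), isOddByMask(n));
}
```

For n = -3 the remainder is -1 while the mask gives 1, so the two predicates disagree; that is the case the extra negl instructions handle.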
On Tue, Mar 11, 2025 at 2:35 PM Sỹ Trần Dũng wrote:
> I tried d8 with the --allow-natives-syntax --turbofan --print-opt-code flags
> and the following code, but I don't get any output.
>
> function my_mod(n) {
> if (n % 2 == 1)
> return true;
> return false;
> }
>
> my_mod(2);
> my_mod(1);
> my_mod(3);
>
> %OptimizeFunctionOnNextCall(my_mod);
>
> my_mod(100)
>
> I'm not sure which flags to use here.
> On Tuesday, March 11, 2025 at 5:35:11 PM UTC+7 Jakob Kummerow wrote:
>
>> Why don't you test it and find out yourself?
>>
>>
>>> On Tue, Mar 11, 2025 at 10:20 AM Sỹ Trần Dũng wrote:
>>
>>> I have a question regarding V8's compiler optimization, specifically
>>> concerning the modulo 2 operation. In compilers like GCC and Clang, it's
>>> common to see the operation n % 2 optimized to a bitwise AND (n & 1) or a
>>> bit check instruction, as these are generally more efficient.
>>>
>>> I've been examining the bytecode generated by V8, and I've observed that
>>> a modulo instruction is used for n % 2.
>>>
>>> [generated bytecode for function: my_mod (0x3de244c5b401 <SharedFunctionInfo my_mod>)]
>>> Bytecode length: 17
>>> Parameter count 2
>>> Register count 1
>>> Frame size 8
>>> 23 S> 0x32c69b60dc80 @ 0 : 0b 03 Ldar a0
>>> 29 E> 0x32c69b60dc82 @ 2 : 4b 02 00 ModSmi [2], [0]
>>> 0x32c69b60dc85 @ 5 : c9 Star0
>>> 0x32c69b60dc86 @ 6 : 0d 01 LdaSmi [1]
>>> 33 E> 0x32c69b60dc88 @ 8 : 6f f9 01 TestEqual r0, [1]
>>> 0x32c69b60dc8b @ 11 : 9e 04 JumpIfFalse [4] (0x32c69b60dc8f @ 15)
>>> 45 S> 0x32c69b60dc8d @ 13 : 11 LdaTrue
>>> 57 S> 0x32c69b60dc8e @ 14 : ae Return
>>> 64 S> 0x32c69b60dc8f @ 15 : 12 LdaFalse
>>> 77 S> 0x32c69b60dc90 @ 16 : ae Return
>>>
>>> I'm curious if this behavior changes when the code is "heated" and
>>> optimized by Turbofan.
>>>
>>> Could someone please tell me whether Turbofan performs this particular
>>> optimization?
>>>
>>> Thank you for your time and expertise.
>>>
--
v8-dev mailing list
http://groups.google.com/group/v8-dev
To view this discussion visit
https://groups.google.com/d/msgid/v8-dev/CAKSzg3Rsj97V%2BecDqs7CLVdA3dEuRd4qWvtUuFRTKgux5qQ0zA%40mail.gmail.com.