Cool, thanks. I tracked down the deoptimization problem: a bug in
another part of the code was occasionally setting the contact list to
the number zero instead of an empty list. Fixing that gave me a ~15%
performance boost. My library now runs less than 3x slower than the
original C version, which is a huge improvement. I'd still like to
bring that number down further if I can.
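Roughly, the fix amounts to keeping the field the same type everywhere. This is only a sketch of the pattern, not the real code: the Arbiter shape and the method names below are made up for illustration.

```javascript
// Hypothetical Arbiter-like object; the field and method names are
// illustrative and not taken from the actual library.
function Arbiter() {
  // Always an array, never 0 or null, so the field's type stays stable.
  this.contacts = [];
}

Arbiter.prototype.clearContacts = function () {
  // Reset to an empty array rather than the number 0, so code that loops
  // over this.contacts sees an array on every call.
  this.contacts = [];
};

Arbiter.prototype.countContacts = function () {
  var n = 0;
  for (var i = 0; i < this.contacts.length; i++) {
    n++;
  }
  return n;
};
```

The point is only that every assignment to the field stores the same kind of value, so a function compiled for arrays does not suddenly see a number.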

I'll have a play with typed arrays soon. It seems like I should just
replace some structures wholesale with Float64Arrays. It'll be a bit
nasty writing contacts[i * C_LENGTH + C_ROT_X], but if I'm avoiding
heap allocations, it'll be worth it. Since there's no guarantee that a
variable holding a number in JavaScript will remain constant, I imagine
I'll want a compilation step which replaces all the constants with
literals.
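A rough sketch of the flat layout I have in mind. Only C_LENGTH and C_ROT_X come from the expression above; the other offsets, MAX_CONTACTS, and the two helper functions are placeholders I made up to show the indexing.

```javascript
// Field offsets within one contact record. C_POS_X, C_POS_Y and C_ROT_Y
// are hypothetical; a real contact would have its own set of fields.
var C_POS_X = 0;
var C_POS_Y = 1;
var C_ROT_X = 2;
var C_ROT_Y = 3;
var C_LENGTH = 4; // number of float64 slots per contact

var MAX_CONTACTS = 64; // assumed capacity, chosen arbitrarily
var contacts = new Float64Array(MAX_CONTACTS * C_LENGTH);

// Write one contact's fields into its slice of the flat array.
function setContact(i, posX, posY, rotX, rotY) {
  var base = i * C_LENGTH;
  contacts[base + C_POS_X] = posX;
  contacts[base + C_POS_Y] = posY;
  contacts[base + C_ROT_X] = rotX;
  contacts[base + C_ROT_Y] = rotY;
}

// Read a single field across the first `count` contacts.
function sumRotX(count) {
  var total = 0;
  for (var i = 0; i < count; i++) {
    total += contacts[i * C_LENGTH + C_ROT_X];
  }
  return total;
}
```

The compilation step would then rewrite names like C_LENGTH and C_ROT_X to their numeric literals before the code ships.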

It seems like replacing the float value in a field with another float
value shouldn't require an allocation. I would expect it to reuse the
box of the previous field value..?

Thanks for the tip about inlining. I manually inlined a couple of
function calls earlier, but stopped when they stopped giving me
performance gains, which makes sense considering applyImpulse was
deoptimized. Once I've done everything I can think of, I'll take a
good hard read through the source. Considering that I'm spending 35%
of my time in that one function, it's a pretty obvious place for
optimization.
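By manual inlining I mean something like the following. This is not the actual applyImpulse code; vdot and both wrapper functions are stand-ins written just to show the transformation.

```javascript
// Hypothetical 2D dot product helper, standing in for the small vector
// functions that get called from the hot loop.
function vdot(ax, ay, bx, by) {
  return ax * bx + ay * by;
}

// Version that goes through the helper on every use.
function projectWithCalls(rx, ry, vx, vy, nx, ny) {
  return vdot(rx, ry, nx, ny) * vdot(vx, vy, nx, ny);
}

// Same arithmetic with the helper's body pasted in by hand, so the
// result does not depend on the compiler inlining vdot.
function projectInlined(rx, ry, vx, vy, nx, ny) {
  return (rx * nx + ry * ny) * (vx * nx + vy * ny);
}
```

Both versions compute the same value; the second just trades readability for not relying on the inlining budget.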

Cheers
Joseph

On Fri, Dec 30, 2011 at 7:33 AM, Vyacheslav Egorov <vego...@chromium.org> wrote:
> If you run with --print-code --code-comments you will see the generated code (V8
> should be built with objectprint=on disassembler=on), and you'll have to
> locate the bailout in the code and figure out why it happens.
>
> If it happens only once then the reason is probably that the function was
> optimized before it got correct type feedback.
>
> I took a very quick look through 2nd version of code that V8 generates
> for Arbiter.applyImpulse, without trying to understand what it does, just by
> looking for inefficiencies. I don't see anything obvious but there are two
> things:
>
> 1) V8 seems to exhaust its inlining budget when trying to inline things into
> applyImpulse. It leaves one call in the loop not inlined, which prevents
> proper LICM and probably causes unnecessary boxing. If I relax the inlining
> budget with --nolimit-inlining I get a 10% boost on the benchmark.
>
> 2) There are fields mutated in the loop that contain floating point values.
> This currently requires boxing (and boxing requires heap allocation, heap
> allocation puts pressure on GC etc). I wonder if you can put typed arrays
> (e.g. Float64Array) to work here.
>
> --
> Vyacheslav Egorov
>
>
>
> On Thu, Dec 29, 2011 at 4:18 AM, Joseph Gentle <jose...@gmail.com> wrote:
>>
>> Wow, that's awesome information. That would explain why the function in
>> question is slow, and why inlining a couple of the function calls it makes
>> decreases overall speed.
>>
>> How do I read the trace I get back? I'm getting this:
>>
>> **** DEOPT: Arbiter.applyImpulse at bailout #49, address 0x0, frame size
>> 264
>> [deoptimizing: begin 0x1b70ac6a67f1 Arbiter.applyImpulse @49]
>>   translating Arbiter.applyImpulse => node=432, height=216
>>     0x7fff6f711630: [top + 248] <- 0x3ebe7f33eb9 ; [esp + 296]
>> 0x3ebe7f33eb9 <JS Object>
>>     0x7fff6f711628: [top + 240] <- 0x2457afa6b4ad ; caller's pc
>>     0x7fff6f711620: [top + 232] <- 0x7fff6f7116c0 ; caller's fp
>> ....
>>
>> I assume address 0x0 means something the function is doing is hitting a
>> null object. Does bailout #49 mean anything? The function is (later)
>> repeatedly optimized and deoptimized again with bailout #8. How do I track
>> these down?
>>
>> -J
>>
>>
>> On Monday, 26 December 2011 23:56:31 UTC+11, Vyacheslav Egorov wrote:
>>>
>>> This is a multiplication stub that is usually called from non-optimized
>>> code (or optimized code that could not be appropriately specialized).
>>> The non-optimizing compiler does not try to infer an appropriate representation
>>> for local variables, so floating point numbers always get boxed.
>>>
>>> If this stub is high on the profile then it usually means that the optimizing
>>> compiler either failed to optimize a hot function which does a lot of
>>> multiplications or it failed to infer an optimal representation for some
>>> reason.
>>>
>>> The bottom-up profile should show which functions invoke the stub. Then you
>>> should inspect the --trace-opt --trace-bailout --trace-deopt output to figure
>>> out what the optimizer does with those functions.
>>>
>>> --
>>> Vyacheslav Egorov
>>>
>>> On Mon, Dec 26, 2011 at 7:00 AM, Joseph Gentle <jos...@gmail.com> wrote:
>>>>
>>>> What does it mean when I see BinaryOpStub_MUL_Alloc_HeapNumbers in my
>>>> profile? Does that mean the compiler is putting local number variables on
>>>> the heap? Why would it do that?
>>>>
>>>> -J
>>>>
>>>> --
>>>> v8-users mailing list
>>>> v8-u...@googlegroups.com
>>>> http://groups.google.com/group/v8-users
>>>
>>>
>
>

