Cool, thanks. I tracked down the deoptimization problem: a bug in another part of the code was occasionally setting the contact list to the number zero instead of an empty list. Fixing that gave me a ~15% performance boost. My library now performs <3x slower than the original C version, which is a huge improvement. I'd still like to take that number down further if I can, though.
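The buggy code itself is not shown in this thread, so the following is only a hypothetical sketch of the kind of type instability described above: a field that is usually an array but is occasionally overwritten with the number 0. The names `makeArbiter`, `resetContactsUnstable`, `resetContacts` and `countContacts` are assumptions made for illustration and are not taken from the library.

```javascript
// Hypothetical Arbiter-like object; all names here are illustrative only.
function makeArbiter() {
  return { contacts: [] };
}

// Unstable reset: the field flips from an array to a number, so code that
// reads `contacts` sees two different types over the life of the program.
function resetContactsUnstable(arb) {
  arb.contacts = 0;
}

// Stable reset: the field is always an array, even when it is empty.
function resetContacts(arb) {
  arb.contacts = [];
}

// A reader of the field that only makes sense when `contacts` is an array.
function countContacts(arb) {
  return arb.contacts.length;
}
```

With the stable reset, `countContacts` always returns a number; after the unstable reset it returns `undefined`, because the number 0 has no `length` property.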
I'll have a play with typed arrays soon. It seems like I should just replace some structures wholesale with float64 arrays. It'll be a bit nasty writing contacts[i * C_LENGTH + C_ROT_X], but if I'm avoiding heap allocations, it'll be worth it. Since you can't guarantee that a number in JavaScript will remain constant, I imagine I'll want a compilation step which replaces all the constants with literals.

It seems like replacing the float value in a field with another float value shouldn't require an allocation. I would expect it to reuse the box of the previous field value..?

Thanks for the tip about inlining. I manually inlined a couple of function calls earlier, but stopped when they stopped giving me performance gains, which makes sense considering applyImpulse was deoptimized. Once I've done everything I can think of, I'll take a good hard read through the source. Considering that I'm spending 35% of my time in that one function, it's a pretty obvious place for optimization.

Cheers
Joseph

On Fri, Dec 30, 2011 at 7:33 AM, Vyacheslav Egorov <vego...@chromium.org> wrote:
> If you run with --print-code --code-comments you will see generated code (v8 should be built with objectprint=on disassembler=on) and you'll have to locate bailout in the code and figure out why it happens.
>
> If it happens only once then the reason is probably that the function was optimized before it got correct type feedback.
>
> I took a very quick look through 2nd version of code that V8 generates for Arbiter.applyImpulse, without trying to understand what it does, just by looking for inefficiencies. I don't see anything obvious but there are two things:
>
> 1) V8 seems to exhaust inlining budget when trying to inline things into applyImpulse. It leaves one call in the loop not inlined, which prevents proper LICM and probably causes unnecessary boxing. If I relax inlining budget by --nolimit-inlining I get 10% boost on the benchmark.
>
> 2) There are fields mutated in the loop that contain floating point values. This currently requires boxing (and boxing requires heap allocation, heap allocation puts pressure on GC etc). I wonder if you can put typed arrays (e.g. Float64Array) to work here.
>
> --
> Vyacheslav Egorov
>
> On Thu, Dec 29, 2011 at 4:18 AM, Joseph Gentle <jose...@gmail.com> wrote:
>>
>> Wow, that's awesome information. That would explain why the function in question is slow, and why inlining a couple of the function calls it makes decreases overall speed.
>>
>> How do I read the trace I get back? I'm getting this:
>>
>> **** DEOPT: Arbiter.applyImpulse at bailout #49, address 0x0, frame size 264
>> [deoptimizing: begin 0x1b70ac6a67f1 Arbiter.applyImpulse @49]
>> translating Arbiter.applyImpulse => node=432, height=216
>> 0x7fff6f711630: [top + 248] <- 0x3ebe7f33eb9 ; [esp + 296] 0x3ebe7f33eb9 <JS Object>
>> 0x7fff6f711628: [top + 240] <- 0x2457afa6b4ad ; caller's pc
>> 0x7fff6f711620: [top + 232] <- 0x7fff6f7116c0 ; caller's fp
>> ....
>>
>> I assume address 0x0 means something the function is doing is hitting a null object. Does bailout #49 mean anything? The function is (later) repeatedly optimized and deoptimized again with bailout #8. How do I track these down?
>>
>> -J
>>
>> On Monday, 26 December 2011 23:56:31 UTC+11, Vyacheslav Egorov wrote:
>>>
>>> This is a multiplication stub that is usually called from non-optimized code (or optimized code that could not be appropriately specialized). The non-optimizing compiler does not try to infer an appropriate representation for local variables, so floating point numbers always get boxed.
>>>
>>> If this stub is high on the profile then it usually means that the optimizing compiler either failed to optimize a hot function which does a lot of multiplications or it failed to infer an optimal representation for some reason.
>>>
>>> Bottom up profile should show which functions invoke the stub.
>>> Then you should inspect --trace-opt --trace-bailout --trace-deopt output to figure out what the optimizer does with those functions.
>>>
>>> --
>>> Vyacheslav Egorov
>>>
>>> On Mon, Dec 26, 2011 at 7:00 AM, Joseph Gentle <jos...@gmail.com> wrote:
>>>>
>>>> What does it mean when I see BinaryOpStub_MUL_Alloc_HeapNumbers in my profile? Does that mean the compiler is putting local number variables on the heap? Why would it do that?
>>>>
>>>> -J
>>>>
>>>> --
>>>> v8-users mailing list
>>>> v8-u...@googlegroups.com
>>>> http://groups.google.com/group/v8-users
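For readers following the typed-array suggestion in this thread, here is a minimal sketch of the flat layout that an expression like contacts[i * C_LENGTH + C_ROT_X] implies. Only `C_LENGTH` and `C_ROT_X` appear in the thread; the other field offsets (`C_ROT_Y`, `C_IMPULSE`), the slot count of 3, and the helper functions are assumptions made for illustration. The sketch uses modern `const` declarations for the offsets; it does not demonstrate whether an engine treats them like literals, which is the concern raised above.

```javascript
// Hypothetical per-contact field offsets. Only C_ROT_X and C_LENGTH are
// named in the thread; the rest of the layout is invented for this sketch.
const C_ROT_X = 0;
const C_ROT_Y = 1;
const C_IMPULSE = 2;
const C_LENGTH = 3; // number of float64 slots used by one contact

// Allocate one flat Float64Array holding `count` contacts back to back.
function createContacts(count) {
  return new Float64Array(count * C_LENGTH);
}

// Write all fields of contact `i` in place.
function setContact(contacts, i, rotX, rotY, impulse) {
  const base = i * C_LENGTH;
  contacts[base + C_ROT_X] = rotX;
  contacts[base + C_ROT_Y] = rotY;
  contacts[base + C_IMPULSE] = impulse;
}

// Mutate a single floating point field of contact `i` in place; the value
// is stored directly in the array's backing buffer.
function accumulateImpulse(contacts, i, delta) {
  contacts[i * C_LENGTH + C_IMPULSE] += delta;
}
```

A caller would allocate the array once, for example `const contacts = createContacts(64)`, and then update fields in the inner loop with `accumulateImpulse(contacts, i, delta)` rather than writing to properties of per-contact objects.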