I wrote a document and implemented a patch of a new reference counting algorithm. The reference counting algorithm guarantees that "If a Node X has 1~ reference count, then all the Nodes in the same tree are kept alive". No performance regression. No additional byte in each Node object.
A design document: https://docs.google.com/a/chromium.org/document/d/1uYHpq7u5Sslj54UgzXjA7pYR53XjidpBcrCa-neOGQs/edit?pli=1 A patch: https://bugs.webkit.org/show_bug.cgi?id=88834 Comments are appreciated! (You can add a comment on the document side by side, or post a comment to bug 88834.) Thanks! On Tue, Jun 12, 2012 at 4:07 PM, Filip Pizlo <fpi...@apple.com> wrote: > > On Jun 11, 2012, at 11:35 PM, Ojan Vafai <o...@chromium.org> wrote: > > On Mon, Jun 11, 2012 at 9:32 PM, Filip Pizlo <fpi...@apple.com> wrote: >> >> I think that a lot of the performance penalty can be alleviated by: >> >> 1) Moving rarely executed paths into non-inlineable methods. >> >> 2) Marking the various refing methods ALWAYS_INLINE, after you've moved as >> much as possible out of line. >> >> 3) Using LIKELY and UNLIKELY where appropriate. > > > We should only add these when we can point to a measurable benefit IMO > (maybe that's what you meant by "appropriate"?). My experience is that > marking codepaths LIKELY/UNLIKELY very rarely has a measurable performance > improvement with modern compilers + x86 processors. I can't speak for arm > since I have less direct experience optimizing for arm. > > > It's more for the benefit of the compiler than the hardware. It can, for > example, improve register allocation if the compiler knows that in case of > interference between variables used predominantly on cold paths and > variables used predominantly on hot paths, it should spill the cold ones. > The following is a good example: > > x = some expression; > if (--thingy) doStuff(); // not inlineable > use x; > > The function call will interfere with all caller-save registers. Which is > most registers. The compiler can do one of three things: (1) spill x, (2) > put x in callee-save register, of which there is a limited supply, or (3) > put x in a caller save but insert save/restore code around the doStuff() > call. Last I checked, gcc would prefer (2) or (1) if it ran out of registers > unless it knew that doStuff() was called rarely. Hence the utility of using > UNLIKEY in this case. If you know that it is unlikely then you want the > compiler to go with option (3), which greatly relieves register pressure at > the cost of making the rare call slightly slower. > > You're right that it's not a huge win or even a measurable win in many > cases. The conditions have to be just right - for one it has to be a case > where the compiler wasn't already smart enough to do the right thing. And > current compilers are quite smart indeed so the benefits of these macros > will converge to zero over time. > > In my experience it's a benefit in inline functions that get called from > many places, since after a lot of inlining the compiler can use all the help > it can get in trying to unravel which parts of the code matter and which > don't. I've seen it result in a 5% progression in the past, though > admittedly it was in runtime functions that get inlined *everywhere* like > allocation and GC write barriers. > > In this particular case, I have a hunch that these macros might make a > difference. It's worth a shot since adding them doesn't require much > effort. > > >> The reason I suggest these things is that you're adding a lot of code in a >> bunch of methods, but a lot of that logic looks like it will execute >> infrequently. That reduces inlining opportunities. And in places where the >> methods are still inlined, you're adding code bloat that reduces icache >> locality and may blow out branch prediction caches. >> >> I'm also not sure that your benchmark is conclusive. What does the perf >> test have in common with things we know we care about, like say PLT? >> >> -Filip >> >> On Jun 11, 2012, at 8:45 PM, Kentaro Hara <hara...@chromium.org> wrote: >> >> > Thanks ggaren! >> > >> > I filed a bug here: https://bugs.webkit.org/show_bug.cgi?id=88834 >> > >> > >> >> I believe you could achieve (a) (i.e., preserve all reachable nodes >> >> without help from the JavaScript garbage collector) with these semantics >> >> in >> >> the C++ DOM: >> >> >> >> ===== Design ===== >> >> >> >> ref(): >> >> >> >> ++this->refCount >> >> ++root()->refCount >> >> >> >> deref(): >> >> >> >> --this->refCount >> >> --root()->refCount >> > >> > Actually I've tried the approach yesterday but confirmed 25.9% >> > performance regression, even if I have TreeShared hold a pointer to >> > the root. Please look at the WIP patch and the performance test in >> > https://bugs.webkit.org/show_bug.cgi?id=88834. >> > >> > What I've learned is that we must not insert any line to ref() and >> > deref(), which are performance sensitive:) >> > >> > I am now trying another approach that hacks Node destruction. Let's >> > discuss the implementation details in the bug. >> > >> > Thanks! >> > >> > >> > On Tue, Jun 12, 2012 at 12:18 PM, Geoffrey Garen <gga...@apple.com> >> > wrote: >> >> >> >> IMHO, (a) would be the best semantics. >> >> >> >> >> >> I agree, and I think that the specification actually does require this. >> >> >> >> The real issue here is how to fix this bug efficiently. I believe that >> >> Geoff Garen has been contemplating this and also has a proposal for how to >> >> do it. >> >> >> >> >> >> I believe you could achieve (a) (i.e., preserve all reachable nodes >> >> without help from the JavaScript garbage collector) with these semantics >> >> in >> >> the C++ DOM: >> >> >> >> ===== Design ===== >> >> >> >> ref(): >> >> >> >> ++this->refCount >> >> ++root()->refCount >> >> >> >> deref(): >> >> >> >> --this->refCount >> >> --root()->refCount >> >> >> >> if !root()->refCount: >> >> assert(!this->refCount) >> >> for each descendent of root(): >> >> delete descendent >> >> delete root() >> >> >> >> Remove 'this' from tree: >> >> >> >> assert(this->refCount) // clients must put a node into a RefPtr to >> >> remove it from a tree >> >> >> >> root()->refCount -= this->refCount >> >> >> >> for each descendent of this: >> >> this->refCount += descendent->refCount >> >> >> >> Insert 'this' into tree: >> >> >> >> root()->refCount += this->refCount >> >> >> >> for each descendent of this: >> >> this->refCount -= descendent->refCount >> >> >> >> What is "root()"?: >> >> >> >> If this is in a document: >> >> root() == document() >> >> >> >> else: >> >> root() == the highest link in the parentNode chain >> >> >> >> >> >> ===== FAQ ===== >> >> >> >> Cycles? >> >> >> >> No cycles are possible because the DOM API does not allow cycles in a >> >> DOM tree. >> >> >> >> Is O(n) subtree insertion and removal OK? >> >> >> >> I believe these operations are already O(n). >> >> >> >> Is average case O(log n) access to disconnected root() OK? >> >> >> >> If n is small and/or ref/deref of disconnected subnodes are rare, then >> >> yes. >> >> >> >> Otherwise, we can optimize this case by putting a direct pointer to >> >> root() in rare data. >> >> >> >> Geoff >> > >> > >> > >> > -- >> > Kentaro Hara, Tokyo, Japan (http://haraken.info) >> > _______________________________________________ >> > webkit-dev mailing list >> > webkit-dev@lists.webkit.org >> > http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev >> _______________________________________________ >> webkit-dev mailing list >> webkit-dev@lists.webkit.org >> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev > > -- Kentaro Hara, Tokyo, Japan (http://haraken.info) _______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev