>From your design document, it sounds like this approach retains >guardRef/selfOnlyRef, and will not let a disconnected subtree keep the owner >document's children alive. Am I understanding correctly?
- Maciej On Jun 27, 2012, at 5:55 AM, Kentaro Hara <hara...@chromium.org> wrote: > I wrote a document and implemented a patch of a new reference counting > algorithm. The reference counting algorithm guarantees that "If a Node > X has 1~ reference count, then all the Nodes in the same tree are kept > alive". No performance regression. No additional byte in each Node > object. > > A design document: > https://docs.google.com/a/chromium.org/document/d/1uYHpq7u5Sslj54UgzXjA7pYR53XjidpBcrCa-neOGQs/edit?pli=1 > > A patch: https://bugs.webkit.org/show_bug.cgi?id=88834 > > Comments are appreciated! (You can add a comment on the document side > by side, or post a comment to bug 88834.) > > Thanks! > > > On Tue, Jun 12, 2012 at 4:07 PM, Filip Pizlo <fpi...@apple.com> wrote: >> >> On Jun 11, 2012, at 11:35 PM, Ojan Vafai <o...@chromium.org> wrote: >> >> On Mon, Jun 11, 2012 at 9:32 PM, Filip Pizlo <fpi...@apple.com> wrote: >>> >>> I think that a lot of the performance penalty can be alleviated by: >>> >>> 1) Moving rarely executed paths into non-inlineable methods. >>> >>> 2) Marking the various refing methods ALWAYS_INLINE, after you've moved as >>> much as possible out of line. >>> >>> 3) Using LIKELY and UNLIKELY where appropriate. >> >> >> We should only add these when we can point to a measurable benefit IMO >> (maybe that's what you meant by "appropriate"?). My experience is that >> marking codepaths LIKELY/UNLIKELY very rarely has a measurable performance >> improvement with modern compilers + x86 processors. I can't speak for arm >> since I have less direct experience optimizing for arm. >> >> >> It's more for the benefit of the compiler than the hardware. It can, for >> example, improve register allocation if the compiler knows that in case of >> interference between variables used predominantly on cold paths and >> variables used predominantly on hot paths, it should spill the cold ones. >> The following is a good example: >> >> x = some expression; >> if (--thingy) doStuff(); // not inlineable >> use x; >> >> The function call will interfere with all caller-save registers. Which is >> most registers. The compiler can do one of three things: (1) spill x, (2) >> put x in callee-save register, of which there is a limited supply, or (3) >> put x in a caller save but insert save/restore code around the doStuff() >> call. Last I checked, gcc would prefer (2) or (1) if it ran out of registers >> unless it knew that doStuff() was called rarely. Hence the utility of using >> UNLIKEY in this case. If you know that it is unlikely then you want the >> compiler to go with option (3), which greatly relieves register pressure at >> the cost of making the rare call slightly slower. >> >> You're right that it's not a huge win or even a measurable win in many >> cases. The conditions have to be just right - for one it has to be a case >> where the compiler wasn't already smart enough to do the right thing. And >> current compilers are quite smart indeed so the benefits of these macros >> will converge to zero over time. >> >> In my experience it's a benefit in inline functions that get called from >> many places, since after a lot of inlining the compiler can use all the help >> it can get in trying to unravel which parts of the code matter and which >> don't. I've seen it result in a 5% progression in the past, though >> admittedly it was in runtime functions that get inlined *everywhere* like >> allocation and GC write barriers. >> >> In this particular case, I have a hunch that these macros might make a >> difference. It's worth a shot since adding them doesn't require much >> effort. >> >> >>> The reason I suggest these things is that you're adding a lot of code in a >>> bunch of methods, but a lot of that logic looks like it will execute >>> infrequently. That reduces inlining opportunities. And in places where the >>> methods are still inlined, you're adding code bloat that reduces icache >>> locality and may blow out branch prediction caches. >>> >>> I'm also not sure that your benchmark is conclusive. What does the perf >>> test have in common with things we know we care about, like say PLT? >>> >>> -Filip >>> >>> On Jun 11, 2012, at 8:45 PM, Kentaro Hara <hara...@chromium.org> wrote: >>> >>>> Thanks ggaren! >>>> >>>> I filed a bug here: https://bugs.webkit.org/show_bug.cgi?id=88834 >>>> >>>> >>>>> I believe you could achieve (a) (i.e., preserve all reachable nodes >>>>> without help from the JavaScript garbage collector) with these semantics >>>>> in >>>>> the C++ DOM: >>>>> >>>>> ===== Design ===== >>>>> >>>>> ref(): >>>>> >>>>> ++this->refCount >>>>> ++root()->refCount >>>>> >>>>> deref(): >>>>> >>>>> --this->refCount >>>>> --root()->refCount >>>> >>>> Actually I've tried the approach yesterday but confirmed 25.9% >>>> performance regression, even if I have TreeShared hold a pointer to >>>> the root. Please look at the WIP patch and the performance test in >>>> https://bugs.webkit.org/show_bug.cgi?id=88834. >>>> >>>> What I've learned is that we must not insert any line to ref() and >>>> deref(), which are performance sensitive:) >>>> >>>> I am now trying another approach that hacks Node destruction. Let's >>>> discuss the implementation details in the bug. >>>> >>>> Thanks! >>>> >>>> >>>> On Tue, Jun 12, 2012 at 12:18 PM, Geoffrey Garen <gga...@apple.com> >>>> wrote: >>>>> >>>>> IMHO, (a) would be the best semantics. >>>>> >>>>> >>>>> I agree, and I think that the specification actually does require this. >>>>> >>>>> The real issue here is how to fix this bug efficiently. I believe that >>>>> Geoff Garen has been contemplating this and also has a proposal for how to >>>>> do it. >>>>> >>>>> >>>>> I believe you could achieve (a) (i.e., preserve all reachable nodes >>>>> without help from the JavaScript garbage collector) with these semantics >>>>> in >>>>> the C++ DOM: >>>>> >>>>> ===== Design ===== >>>>> >>>>> ref(): >>>>> >>>>> ++this->refCount >>>>> ++root()->refCount >>>>> >>>>> deref(): >>>>> >>>>> --this->refCount >>>>> --root()->refCount >>>>> >>>>> if !root()->refCount: >>>>> assert(!this->refCount) >>>>> for each descendent of root(): >>>>> delete descendent >>>>> delete root() >>>>> >>>>> Remove 'this' from tree: >>>>> >>>>> assert(this->refCount) // clients must put a node into a RefPtr to >>>>> remove it from a tree >>>>> >>>>> root()->refCount -= this->refCount >>>>> >>>>> for each descendent of this: >>>>> this->refCount += descendent->refCount >>>>> >>>>> Insert 'this' into tree: >>>>> >>>>> root()->refCount += this->refCount >>>>> >>>>> for each descendent of this: >>>>> this->refCount -= descendent->refCount >>>>> >>>>> What is "root()"?: >>>>> >>>>> If this is in a document: >>>>> root() == document() >>>>> >>>>> else: >>>>> root() == the highest link in the parentNode chain >>>>> >>>>> >>>>> ===== FAQ ===== >>>>> >>>>> Cycles? >>>>> >>>>> No cycles are possible because the DOM API does not allow cycles in a >>>>> DOM tree. >>>>> >>>>> Is O(n) subtree insertion and removal OK? >>>>> >>>>> I believe these operations are already O(n). >>>>> >>>>> Is average case O(log n) access to disconnected root() OK? >>>>> >>>>> If n is small and/or ref/deref of disconnected subnodes are rare, then >>>>> yes. >>>>> >>>>> Otherwise, we can optimize this case by putting a direct pointer to >>>>> root() in rare data. >>>>> >>>>> Geoff >>>> >>>> >>>> >>>> -- >>>> Kentaro Hara, Tokyo, Japan (http://haraken.info) >>>> _______________________________________________ >>>> webkit-dev mailing list >>>> webkit-dev@lists.webkit.org >>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev >>> _______________________________________________ >>> webkit-dev mailing list >>> webkit-dev@lists.webkit.org >>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev >> >> > > > > -- > Kentaro Hara, Tokyo, Japan (http://haraken.info) > _______________________________________________ > webkit-dev mailing list > webkit-dev@lists.webkit.org > http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev _______________________________________________ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev