> From your design document, it sounds like this approach retains
> guardRef/selfOnlyRef, and will not let a disconnected subtree keep the owner
> document's children alive. Am I understanding correctly?

Yes!

- The approach retains guardRef/selfOnlyRef.

- Assume that someone refers to a document X as an owner document. If all of
  X's children are 0-ref nodes, then all of X's children are reclaimed and
  only X is kept alive. Otherwise (i.e. if any of X's children is an n-ref
  node), X and all of X's children are kept alive.
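To make the rule concrete, here is a toy C++ model of the keep-alive semantics
stated above. It is illustrative only, not the code in the patch; ToyNode,
ToyDocument and canReclaimChildren are made-up names, and "children" stands in
for every node whose owner document is X. See the design document for the real
algorithm.

    #include <vector>

    struct ToyNode {
        int refCount = 0;                 // external references only (JS wrappers, RefPtrs, ...)
    };

    struct ToyDocument {
        std::vector<ToyNode*> children;   // every node owned by this document

        // Once only the document itself is still referenced (e.g. via
        // ownerDocument), the children may be reclaimed only if none of them
        // is externally referenced.
        bool canReclaimChildren() const
        {
            for (const ToyNode* child : children) {
                if (child->refCount > 0)
                    return false;         // an n-ref child keeps X and all of X's children alive
            }
            return true;                  // all children are 0-ref: reclaim them, keep only X
        }
    };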
On Thu, Jun 28, 2012 at 2:39 AM, Maciej Stachowiak <m...@apple.com> wrote:
>
> From your design document, it sounds like this approach retains
> guardRef/selfOnlyRef, and will not let a disconnected subtree keep the owner
> document's children alive. Am I understanding correctly?
>
> - Maciej
>
> On Jun 27, 2012, at 5:55 AM, Kentaro Hara <hara...@chromium.org> wrote:
>
>> I wrote a design document and implemented a patch for a new reference
>> counting algorithm. The algorithm guarantees that "if a Node X has a
>> reference count of 1 or more, then all the Nodes in the same tree are
>> kept alive". No performance regression. No additional bytes in each Node
>> object.
>>
>> A design document:
>> https://docs.google.com/a/chromium.org/document/d/1uYHpq7u5Sslj54UgzXjA7pYR53XjidpBcrCa-neOGQs/edit?pli=1
>>
>> A patch: https://bugs.webkit.org/show_bug.cgi?id=88834
>>
>> Comments are appreciated! (You can add a comment on the document side by
>> side, or post a comment to bug 88834.)
>>
>> Thanks!
>>
>>
>> On Tue, Jun 12, 2012 at 4:07 PM, Filip Pizlo <fpi...@apple.com> wrote:
>>>
>>> On Jun 11, 2012, at 11:35 PM, Ojan Vafai <o...@chromium.org> wrote:
>>>
>>> On Mon, Jun 11, 2012 at 9:32 PM, Filip Pizlo <fpi...@apple.com> wrote:
>>>>
>>>> I think that a lot of the performance penalty can be alleviated by:
>>>>
>>>> 1) Moving rarely executed paths into non-inlineable methods.
>>>>
>>>> 2) Marking the various refing methods ALWAYS_INLINE, after you've moved
>>>> as much as possible out of line.
>>>>
>>>> 3) Using LIKELY and UNLIKELY where appropriate.
>>>
>>> We should only add these when we can point to a measurable benefit IMO
>>> (maybe that's what you meant by "appropriate"?). My experience is that
>>> marking codepaths LIKELY/UNLIKELY very rarely has a measurable
>>> performance improvement with modern compilers + x86 processors. I can't
>>> speak for ARM since I have less direct experience optimizing for ARM.
>>>
>>> It's more for the benefit of the compiler than the hardware. It can, for
>>> example, improve register allocation if the compiler knows that, in case
>>> of interference between variables used predominantly on cold paths and
>>> variables used predominantly on hot paths, it should spill the cold
>>> ones. The following is a good example:
>>>
>>> x = some expression;
>>> if (--thingy) doStuff(); // not inlineable
>>> use x;
>>>
>>> The function call will interfere with all caller-save registers, which
>>> is most registers. The compiler can do one of three things: (1) spill x,
>>> (2) put x in a callee-save register, of which there is a limited supply,
>>> or (3) put x in a caller-save register but insert save/restore code
>>> around the doStuff() call. Last I checked, gcc would prefer (2), or (1)
>>> if it ran out of registers, unless it knew that doStuff() was called
>>> rarely. Hence the utility of using UNLIKELY in this case. If you know
>>> that the call is unlikely, then you want the compiler to go with option
>>> (3), which greatly relieves register pressure at the cost of making the
>>> rare call slightly slower.
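For concreteness, here is roughly what suggestions 1-3 above look like when
applied to a derefing method. This is a sketch, not code from the patch;
RefCountedThing and removedLastRef are placeholder names, and the macros are
redefined locally only so the snippet stands alone (WTF already provides
ALWAYS_INLINE and LIKELY/UNLIKELY).

    #if defined(__GNUC__) || defined(__clang__)
    #define ALWAYS_INLINE inline __attribute__((always_inline))
    #define NEVER_INLINE __attribute__((noinline))
    #define UNLIKELY(x) __builtin_expect(!!(x), 0)
    #else
    #define ALWAYS_INLINE inline
    #define NEVER_INLINE
    #define UNLIKELY(x) (x)
    #endif

    class RefCountedThing {
    public:
        ALWAYS_INLINE void deref()
        {
            // Hot path: one decrement plus a branch the compiler is told is
            // rarely taken, so values live across a deref() call site are less
            // likely to be spilled just to protect the cold call.
            if (UNLIKELY(!--m_refCount))
                removedLastRef();
        }

    private:
        // Cold path: deliberately out of line so it does not bloat the many
        // inlined copies of deref().
        NEVER_INLINE void removedLastRef() { delete this; }

        int m_refCount { 1 };
    };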
>>> You're right that it's not a huge win or even a measurable win in many
>>> cases. The conditions have to be just right - for one, it has to be a
>>> case where the compiler wasn't already smart enough to do the right
>>> thing. And current compilers are quite smart indeed, so the benefits of
>>> these macros will converge to zero over time.
>>>
>>> In my experience it's a benefit in inline functions that get called from
>>> many places, since after a lot of inlining the compiler can use all the
>>> help it can get in trying to unravel which parts of the code matter and
>>> which don't. I've seen it result in a 5% progression in the past, though
>>> admittedly it was in runtime functions that get inlined *everywhere*,
>>> like allocation and GC write barriers.
>>>
>>> In this particular case, I have a hunch that these macros might make a
>>> difference. It's worth a shot since adding them doesn't require much
>>> effort.
>>>
>>>> The reason I suggest these things is that you're adding a lot of code
>>>> in a bunch of methods, but a lot of that logic looks like it will
>>>> execute infrequently. That reduces inlining opportunities. And in
>>>> places where the methods are still inlined, you're adding code bloat
>>>> that reduces icache locality and may blow out branch prediction caches.
>>>>
>>>> I'm also not sure that your benchmark is conclusive. What does the perf
>>>> test have in common with things we know we care about, like say PLT?
>>>>
>>>> -Filip
>>>>
>>>> On Jun 11, 2012, at 8:45 PM, Kentaro Hara <hara...@chromium.org> wrote:
>>>>
>>>>> Thanks ggaren!
>>>>>
>>>>> I filed a bug here: https://bugs.webkit.org/show_bug.cgi?id=88834
>>>>>
>>>>>> I believe you could achieve (a) (i.e., preserve all reachable nodes
>>>>>> without help from the JavaScript garbage collector) with these
>>>>>> semantics in the C++ DOM:
>>>>>>
>>>>>> ===== Design =====
>>>>>>
>>>>>> ref():
>>>>>>
>>>>>>     ++this->refCount
>>>>>>     ++root()->refCount
>>>>>>
>>>>>> deref():
>>>>>>
>>>>>>     --this->refCount
>>>>>>     --root()->refCount
>>>>>
>>>>> Actually, I tried that approach yesterday but confirmed a 25.9%
>>>>> performance regression, even if I have TreeShared hold a pointer to
>>>>> the root. Please look at the WIP patch and the performance test in
>>>>> https://bugs.webkit.org/show_bug.cgi?id=88834.
>>>>>
>>>>> What I've learned is that we must not insert any line into ref() and
>>>>> deref(), which are performance sensitive :)
>>>>>
>>>>> I am now trying another approach that hacks Node destruction. Let's
>>>>> discuss the implementation details in the bug.
>>>>>
>>>>> Thanks!
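For reference, here is a sketch of what the quoted ref()/deref() scheme costs
on the hot path. It is illustrative only, not Kentaro's WIP patch; NaiveNode is
a made-up name, teardown is elided, and edge cases such as this == root() are
glossed over. The point is the extra work added to every ref()/deref().

    class NaiveNode {
    public:
        void ref()
        {
            ++m_refCount;
            ++root()->m_refCount;   // second read-modify-write, usually on another cache line
        }

        void deref()
        {
            --m_refCount;
            if (!--root()->m_refCount) {
                // Per the quoted pseudocode, the whole tree would be torn down
                // here (elided in this sketch).
            }
        }

    private:
        NaiveNode* root()
        {
            // Without a cached root pointer this walk is O(tree depth) on every
            // ref()/deref(); with a cached pointer it is still an extra load and
            // an extra dependent store on a very hot path.
            NaiveNode* node = this;
            while (node->m_parent)
                node = node->m_parent;
            return node;
        }

        NaiveNode* m_parent { nullptr };
        int m_refCount { 0 };
    };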
>>>>> On Tue, Jun 12, 2012 at 12:18 PM, Geoffrey Garen <gga...@apple.com>
>>>>> wrote:
>>>>>>
>>>>>> IMHO, (a) would be the best semantics.
>>>>>>
>>>>>> I agree, and I think that the specification actually does require
>>>>>> this.
>>>>>>
>>>>>> The real issue here is how to fix this bug efficiently. I believe
>>>>>> that Geoff Garen has been contemplating this and also has a proposal
>>>>>> for how to do it.
>>>>>>
>>>>>> I believe you could achieve (a) (i.e., preserve all reachable nodes
>>>>>> without help from the JavaScript garbage collector) with these
>>>>>> semantics in the C++ DOM:
>>>>>>
>>>>>> ===== Design =====
>>>>>>
>>>>>> ref():
>>>>>>
>>>>>>     ++this->refCount
>>>>>>     ++root()->refCount
>>>>>>
>>>>>> deref():
>>>>>>
>>>>>>     --this->refCount
>>>>>>     --root()->refCount
>>>>>>
>>>>>>     if !root()->refCount:
>>>>>>         assert(!this->refCount)
>>>>>>         for each descendent of root():
>>>>>>             delete descendent
>>>>>>         delete root()
>>>>>>
>>>>>> Remove 'this' from tree:
>>>>>>
>>>>>>     assert(this->refCount) // clients must put a node into a RefPtr
>>>>>>                            // to remove it from a tree
>>>>>>
>>>>>>     root()->refCount -= this->refCount
>>>>>>
>>>>>>     for each descendent of this:
>>>>>>         this->refCount += descendent->refCount
>>>>>>
>>>>>> Insert 'this' into tree:
>>>>>>
>>>>>>     root()->refCount += this->refCount
>>>>>>
>>>>>>     for each descendent of this:
>>>>>>         this->refCount -= descendent->refCount
>>>>>>
>>>>>> What is "root()"?:
>>>>>>
>>>>>>     If this is in a document:
>>>>>>         root() == document()
>>>>>>     else:
>>>>>>         root() == the highest link in the parentNode chain
>>>>>>
>>>>>> ===== FAQ =====
>>>>>>
>>>>>> Cycles?
>>>>>>
>>>>>>     No cycles are possible because the DOM API does not allow cycles
>>>>>>     in a DOM tree.
>>>>>>
>>>>>> Is O(n) subtree insertion and removal OK?
>>>>>>
>>>>>>     I believe these operations are already O(n).
>>>>>>
>>>>>> Is average case O(log n) access to disconnected root() OK?
>>>>>>
>>>>>>     If n is small and/or ref/deref of disconnected subnodes are rare,
>>>>>>     then yes.
>>>>>>
>>>>>>     Otherwise, we can optimize this case by putting a direct pointer
>>>>>>     to root() in rare data.
>>>>>>
>>>>>> Geoff
>>>>>
>>>>> --
>>>>> Kentaro Hara, Tokyo, Japan (http://haraken.info)
>>
>> --
>> Kentaro Hara, Tokyo, Japan (http://haraken.info)

--
Kentaro Hara, Tokyo, Japan (http://haraken.info)
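As a discussion aid, here is one way the remove/insert bookkeeping from the
pseudocode quoted above could look in C++. It is a hedged sketch, not WebKit's
Node code: SketchNode, the member names, and the forEachDescendant() helper are
made up; linking and unlinking of child/sibling pointers is elided; and the
removal step is written as the exact inverse of the insertion step (fold the
descendants' counts into 'this' first, then take the subtree total away from
the old root).

    #include <cassert>

    class SketchNode {
    public:
        // Detach 'this' and its subtree: references held on nodes inside the
        // subtree stop pinning the old root and start pinning 'this', which
        // becomes the root of the detached subtree.
        void removeFromTree()
        {
            assert(m_refCount);                    // caller must hold a reference (a RefPtr)

            int subtreeCount = m_refCount;
            forEachDescendant(this, [&](SketchNode* n) { subtreeCount += n->m_refCount; });

            root()->m_refCount -= subtreeCount;    // old root no longer keeps this subtree alive
            m_refCount = subtreeCount;             // 'this' now carries the whole subtree's count
            m_parent = nullptr;                    // sibling/child unlinking elided
        }

        // Attach 'this' and its subtree under newParent: the subtree total
        // moves to the new root, and 'this' goes back to carrying only its own
        // references.
        void insertIntoTree(SketchNode* newParent)
        {
            m_parent = newParent;                  // child-list linking elided
            root()->m_refCount += m_refCount;
            forEachDescendant(this, [&](SketchNode* n) { m_refCount -= n->m_refCount; });
        }

    private:
        SketchNode* root()
        {
            // The document if attached, otherwise the top of the parentNode chain.
            SketchNode* node = this;
            while (node->m_parent)
                node = node->m_parent;
            return node;
        }

        template<typename Callback>
        static void forEachDescendant(SketchNode* node, Callback callback)
        {
            for (SketchNode* child = node->m_firstChild; child; child = child->m_nextSibling) {
                callback(child);
                forEachDescendant(child, callback);
            }
        }

        SketchNode* m_parent { nullptr };
        SketchNode* m_firstChild { nullptr };
        SketchNode* m_nextSibling { nullptr };
        int m_refCount { 0 };
    };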