Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Mon, 1 Jul 2013, Ian Hickson wrote: > > One option would be to remove from the stack of open elements any > elements that we are skipping when we bail out of the AAA. > > Can anyone see a problem with doing that? I think that this solves the issue and clarifies the behaviour of the parser. > > The limits are really intended to reduce the memory consumption of the > algorithm, not its running time. Elements are expensive, and this > algorithm can create exponentially more elements than tags in the markup, > when it's not limited. > Although I agree with the revision above, I would like to clarify the behaviour of the AAA algorithm. I don't think that the AAA algorithm can create exponentially more elements even if it's not limited. Let D be the depth of the stack of open elements and S the size of the DOM tree that has been constructed so far. Then after application of the algorithm: * The size of the DOM tree is at most S + D. * The depth of the stack of open elements is at most D. Even if the algorithm is applied n times, the size of the DOM tree is bounded by S + nD. (not S * 2^n) In total, I think that the number of elements in the DOM tree is in O(m^2) where m is the number of tags in the source HTML. Yasuhiko Minamide
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
>> >> "xyz" is inserted as a child of and the order between "abc" and >> "xyz" is reversed in the tree. We would like to know whether this is an >> intended behaviour of the specification. > > Yeah that's definitely not intentional. > > Does anyone have any preference for how this is fixed? > The easiest fix will be just to remove the limit of the inner loop. Unfortunately, this makes the complexity of current implementations of the adoption agency algorithm in O(n^2). This is because "remove node from the stack of open elements" is in O(n) on implementations as far as I have checked. 9.5 If node is not in the list of active formatting elements, then remove node from the stack of open elements and then go back to the step labeled inner loop. * WebKit : stack of open elements is implemented as a singly-linked list. * Firefox : stack of open elements is implemented as an array. (need to move elements in the array to remove) I'm not sure whether this is ok or not. (I thought the operation remove node was implemented in O(1). For example, by using doubly linked list.) Yasuhiko
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
> The DOM you get when you hit the limits in the adoption agency > algorithm don't make a lot of intuitive sense. Unfortunately, the > limits are necessary so that implementations don't end up having to do > quadratic work. If this behavior is causing you trouble, you might I'm wondering whether the adoption agency algorithm without the limits really has a quadratic complexity (with respect to the size of the stack). Even if we do not impose a limit on the inner loop, each node in the stack of open elements is processed at most once. - The inner loop processes the nodes between the formatting element and the furthest block. - Then, the formatting element is moved below the furthest block. Then, the nodes above the furthest block will not be processed anymore. If we do not impose the limit on the outer loop, the step 4 may cause the quadratic behaviour. However, I think that it can be avoided by slightly revising the algorithm. I'm working on the automated test generation for the HTML5 parser specification and had the question when we try to understand the specification precisely. Yasuhiko Minamide
[whatwg] Question on Limits in Adaption Agency Algorithm
This is about Adaption Agency Algorithm in 12.2.5.4.7 The "in body" insertion mode. Limits of loops in the adoption agency algorithm were introduced in http://html5.org/tools/web-apps-tracker?from=5641&to=5642. However, the limit for the inner loop introduces an unexpected behaviour for the following fragment of an HTML document. abcxyz This document is parsed into the following DOM tree. "xyz" "abc" "xyz" is inserted as a child of and the order between "abc" and "xyz" is reversed in the tree. We would like to know whether this is an intended behaviour of the specification. We are aware that the similar reversal occurs in markup-in-tables. Yasuhiko Minamide