Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Mon, 1 Jul 2013, Ian Hickson wrote: One option would be to remove from the stack of open elements any elements that we are skipping when we bail out of the AAA. Can anyone see a problem with doing that? I think that this solves the issue and clarifies the behaviour of the parser. The limits are really intended to reduce the memory consumption of the algorithm, not its running time. Elements are expensive, and this algorithm can create exponentially more elements than tags in the markup, when it's not limited. Although I agree with the revision above, I would like to clarify the behaviour of the AAA algorithm. I don't think that the AAA algorithm can create exponentially more elements even if it's not limited. Let D be the depth of the stack of open elements and S the size of the DOM tree that has been constructed so far. Then after application of the algorithm: * The size of the DOM tree is at most S + D. * The depth of the stack of open elements is at most D. Even if the algorithm is applied n times, the size of the DOM tree is bounded by S + nD. (not S * 2^n) In total, I think that the number of elements in the DOM tree is in O(m^2) where m is the number of tags in the source HTML. Yasuhiko Minamide
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Fri, 2 Aug 2013, Yasuhiko Minamide wrote: On Mon, 1 Jul 2013, Ian Hickson wrote: One option would be to remove from the stack of open elements any elements that we are skipping when we bail out of the AAA. Can anyone see a problem with doing that? I think that this solves the issue and clarifies the behaviour of the parser. The limits are really intended to reduce the memory consumption of the algorithm, not its running time. Elements are expensive, and this algorithm can create exponentially more elements than tags in the markup, when it's not limited. Although I agree with the revision above, I would like to clarify the behaviour of the AAA algorithm. I don't think that the AAA algorithm can create exponentially more elements even if it's not limited. Let D be the depth of the stack of open elements and S the size of the DOM tree that has been constructed so far. Then after application of the algorithm: * The size of the DOM tree is at most S + D. * The depth of the stack of open elements is at most D. Even if the algorithm is applied n times, the size of the DOM tree is bounded by S + nD. (not S * 2^n) In total, I think that the number of elements in the DOM tree is in O(m^2) where m is the number of tags in the source HTML. Yeah, that's more accurate. I should have said quadratically, not exponentially. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Mon, 1 Jul 2013, Ian Hickson wrote: One option would be to remove from the stack of open elements any elements that we are skipping when we bail out of the AAA. Can anyone see a problem with doing that? Since nobody raised any problems with this, I've now done this. For background, this solves this case: divbisuttp/b123/p/tt/u/s456 Before this fix, the output is 456 123, with the 123 covered by s, u, tt, and b, and the 123 covered by b and i. After the fix, the output is 123 456, with the 123 covered by s, u, and b, and the 456 covered by no formatting elements. I could see an argument for leaving the elements in the list of formatting elements (so the output would be 123 456, with the 123 covered by s, u, and b, and the 456 covered by nothing, but with an i being reintroduced if you ended the whole thing with div789). Let me know if you think that's better. This being a change to the parser, it's risky; in fact it could change the styles of nodes. However, I think getting the output of the parser to be in the same order as the input is something that is far more important than getting the exact formatting styles correct. This is not just moving nodes out of a table; the old parser rules here moved nodes way off into earlier parts of the DOM with no intuitive logic. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Sat, 3 Nov 2012, Yasuhiko Minamide wrote: This is about Adaption Agency Algorithm in 12.2.5.4.7 The in body insertion mode. Limits of loops in the adoption agency algorithm were introduced in http://html5.org/tools/web-apps-tracker?from=5641to=5642. However, the limit for the inner loop introduces an unexpected behaviour for the following fragment of an HTML document. biasttdiv/babc/b/div/tt/s/axyz/i This document is parsed into the following DOM tree. b i a s tt/tt /s /a xyz /i /b a s tt div b/b abc /div /tt /s /a xyz is inserted as a child of i and the order between abc and xyz is reversed in the tree. We would like to know whether this is an intended behaviour of the specification. We are aware that the similar reversal occurs in markup-in-tables. On Sat, 3 Nov 2012, Adam Barth wrote: The DOM you get when you hit the limits in the adoption agency algorithm don't make a lot of intuitive sense. Unfortunately, the limits are necessary so that implementations don't end up having to do quadratic work. If this behavior is causing you trouble, you might want to run your content through some sort of pre-processor that re-writes these cases into valid HTML, which should get parsed in intuitive ways by user agents. On Sat, 8 Dec 2012, Ian Hickson wrote: Yeah that's definitely not intentional. Does anyone have any preference for how this is fixed? On Wed, 12 Dec 2012, James Graham wrote: On Wed, 12 Dec 2012, Ian Hickson wrote: On Wed, 12 Dec 2012, Henri Sivonen wrote: Does it need to be fixed? That is, is it breaking real sites? It reverses the order of text nodes. That's ridiculously unintuitive. Yes, I think it needs solving, even if it isn't hit by any sites. (If it's hit by sites, it seems likely that they are breaking because of it. If it isn't, then we can safely change it regardless.) Although changing it does introduce the possibility of unforeseen regressions. Not that I have a strong opinion here, really. I still think this should be fixed, but having studied it further, I'm not sure how to fix it. The problem is that the i element isn't cloned (note how the abc text doesn't end up in the i), but when we close all the other elements, we eventually get left with the i on the stack, and so the xyz text ends up appended to it, but it's still back in its original position, earlier in the tree. One option would be to remove from the stack of open elements any elements that we are skipping when we bail out of the AAA. Can anyone see a problem with doing that? On Mon, 5 Nov 2012, Yasuhiko Minamide wrote: I'm wondering whether the adoption agency algorithm without the limits really has a quadratic complexity (with respect to the size of the stack). Even if we do not impose a limit on the inner loop, each node in the stack of open elements is processed at most once. - The inner loop processes the nodes between the formatting element and the furthest block. - Then, the formatting element is moved below the furthest block. Then, the nodes above the furthest block will not be processed anymore. If we do not impose the limit on the outer loop, the step 4 may cause the quadratic behaviour. However, I think that it can be avoided by slightly revising the algorithm. I'm working on the automated test generation for the HTML5 parser specification and had the question when we try to understand the specification precisely. On Tue, 18 Dec 2012, Yasuhiko Minamide wrote: The easiest fix will be just to remove the limit of the inner loop. Unfortunately, this makes the complexity of current implementations of the adoption agency algorithm in O(n^2). This is because remove node from the stack of open elements is in O(n) on implementations as far as I have checked. 9.5 If node is not in the list of active formatting elements, then remove node from the stack of open elements and then go back to the step labeled inner loop. * WebKit : stack of open elements is implemented as a singly-linked list. * Firefox : stack of open elements is implemented as an array. (need to move elements in the array to remove) I'm not sure whether this is ok or not. (I thought the operation remove node was implemented in O(1). For example, by using doubly linked list.) The limits are really intended to reduce the memory consumption of the algorithm, not its running time. Elements are expensive, and this algorithm can create exponentially more elements than tags in the markup, when it's not limited. -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
xyz is inserted as a child of i and the order between abc and xyz is reversed in the tree. We would like to know whether this is an intended behaviour of the specification. Yeah that's definitely not intentional. Does anyone have any preference for how this is fixed? The easiest fix will be just to remove the limit of the inner loop. Unfortunately, this makes the complexity of current implementations of the adoption agency algorithm in O(n^2). This is because remove node from the stack of open elements is in O(n) on implementations as far as I have checked. 9.5 If node is not in the list of active formatting elements, then remove node from the stack of open elements and then go back to the step labeled inner loop. * WebKit : stack of open elements is implemented as a singly-linked list. * Firefox : stack of open elements is implemented as an array. (need to move elements in the array to remove) I'm not sure whether this is ok or not. (I thought the operation remove node was implemented in O(1). For example, by using doubly linked list.) Yasuhiko
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Sat, Dec 8, 2012 at 11:05 PM, Ian Hickson i...@hixie.ch wrote: the order between abc and xyz is reversed in the tree. Does anyone have any preference for how this is fixed? Does it need to be fixed? That is, is it breaking real sites? -- Henri Sivonen hsivo...@iki.fi http://hsivonen.iki.fi/
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Wed, 12 Dec 2012, Henri Sivonen wrote: On Sat, Dec 8, 2012 at 11:05 PM, Ian Hickson i...@hixie.ch wrote: the order between abc and xyz is reversed in the tree. Does anyone have any preference for how this is fixed? Does it need to be fixed? That is, is it breaking real sites? It reverses the order of text nodes. That's ridiculously unintuitive. Yes, I think it needs solving, even if it isn't hit by any sites. (If it's hit by sites, it seems likely that they are breaking because of it. If it isn't, then we can safely change it regardless.) -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Wed, 12 Dec 2012, Ian Hickson wrote: On Wed, 12 Dec 2012, Henri Sivonen wrote: On Sat, Dec 8, 2012 at 11:05 PM, Ian Hickson i...@hixie.ch wrote: the order between abc and xyz is reversed in the tree. Does anyone have any preference for how this is fixed? Does it need to be fixed? That is, is it breaking real sites? It reverses the order of text nodes. That's ridiculously unintuitive. Yes, I think it needs solving, even if it isn't hit by any sites. (If it's hit by sites, it seems likely that they are breaking because of it. If it isn't, then we can safely change it regardless.) Although changing it does introduce the possibility of unforeseen regressions. Not that I have a strong opinion here, really.
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
On Sat, 3 Nov 2012, Yasuhiko Minamide wrote: This is about Adaption Agency Algorithm in 12.2.5.4.7 The in body insertion mode. Limits of loops in the adoption agency algorithm were introduced in http://html5.org/tools/web-apps-tracker?from=5641to=5642. However, the limit for the inner loop introduces an unexpected behaviour for the following fragment of an HTML document. biasttdiv/babc/b/div/tt/s/axyz/i This document is parsed into the following DOM tree. b i a s tt/tt /s /a xyz /i /b a s tt div b/b abc /div /tt /s /a xyz is inserted as a child of i and the order between abc and xyz is reversed in the tree. We would like to know whether this is an intended behaviour of the specification. Yeah that's definitely not intentional. Does anyone have any preference for how this is fixed? -- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Question on Limits in Adaption Agency Algorithm
The DOM you get when you hit the limits in the adoption agency algorithm don't make a lot of intuitive sense. Unfortunately, the limits are necessary so that implementations don't end up having to do quadratic work. If this behavior is causing you trouble, you might I'm wondering whether the adoption agency algorithm without the limits really has a quadratic complexity (with respect to the size of the stack). Even if we do not impose a limit on the inner loop, each node in the stack of open elements is processed at most once. - The inner loop processes the nodes between the formatting element and the furthest block. - Then, the formatting element is moved below the furthest block. Then, the nodes above the furthest block will not be processed anymore. If we do not impose the limit on the outer loop, the step 4 may cause the quadratic behaviour. However, I think that it can be avoided by slightly revising the algorithm. I'm working on the automated test generation for the HTML5 parser specification and had the question when we try to understand the specification precisely. Yasuhiko Minamide