Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2013-08-02 Thread Yasuhiko Minamide

On Mon, 1 Jul 2013, Ian Hickson wrote:
 
 One option would be to remove from the stack of open elements any 
 elements that we are skipping when we bail out of the AAA.
 
 Can anyone see a problem with doing that?

I think that this solves the issue and clarifies the behaviour of
the parser.

 
 The limits are really intended to reduce the memory consumption of the 
 algorithm, not its running time. Elements are expensive, and this 
 algorithm can create exponentially more elements than tags in the markup, 
 when it's not limited.
 

Although I agree with the revision above, I would like to clarify
the behaviour of the AAA algorithm.

I don't think that the AAA algorithm can create exponentially more elements
even if it's not limited.

Let D be the depth of the stack of open elements and S the size of the DOM
tree that has been constructed so far.

Then after application of the algorithm:

* The size of the DOM tree is at most S + D. 
* The depth of the stack of open elements is at most D.

Even if the algorithm is applied n times, the size of the DOM tree is bounded
by S + nD. (not S * 2^n)

In total, I think that the number of elements in the DOM tree is in 
O(m^2) where m is the number of tags in the source HTML.

Yasuhiko Minamide







Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2013-08-02 Thread Ian Hickson
On Fri, 2 Aug 2013, Yasuhiko Minamide wrote:
 On Mon, 1 Jul 2013, Ian Hickson wrote:
  
  One option would be to remove from the stack of open elements any 
  elements that we are skipping when we bail out of the AAA.
  
  Can anyone see a problem with doing that?
 
 I think that this solves the issue and clarifies the behaviour of the 
 parser.
 
  The limits are really intended to reduce the memory consumption of the 
  algorithm, not its running time. Elements are expensive, and this 
  algorithm can create exponentially more elements than tags in the 
  markup, when it's not limited.
 
 Although I agree with the revision above, I would like to clarify the 
 behaviour of the AAA algorithm.
 
 I don't think that the AAA algorithm can create exponentially more 
 elements even if it's not limited.
 
 Let D be the depth of the stack of open elements and S the size of the 
 DOM tree that has been constructed so far.
 
 Then after application of the algorithm:
 
 * The size of the DOM tree is at most S + D. 
 * The depth of the stack of open elements is at most D.
 
 Even if the algorithm is applied n times, the size of the DOM tree is 
 bounded by S + nD. (not S * 2^n)
 
 In total, I think that the number of elements in the DOM tree is in 
 O(m^2) where m is the number of tags in the source HTML.

Yeah, that's more accurate. I should have said quadratically, not 
exponentially.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2013-07-31 Thread Ian Hickson
On Mon, 1 Jul 2013, Ian Hickson wrote:
 
 One option would be to remove from the stack of open elements any 
 elements that we are skipping when we bail out of the AAA.
 
 Can anyone see a problem with doing that?

Since nobody raised any problems with this, I've now done this.

For background, this solves this case:

   divbisuttp/b123/p/tt/u/s456

Before this fix, the output is 456 123, with the 123 covered by s, 
u, tt, and b, and the 123 covered by b and i. After the fix, 
the output is 123 456, with the 123 covered by s, u, and b, and 
the 456 covered by no formatting elements.

I could see an argument for leaving the elements in the list of formatting 
elements (so the output would be 123 456, with the 123 covered by s, 
u, and b, and the 456 covered by nothing, but with an i being 
reintroduced if you ended the whole thing with div789). Let me know if 
you think that's better.

This being a change to the parser, it's risky; in fact it could change the 
styles of nodes. However, I think getting the output of the parser to be 
in the same order as the input is something that is far more important 
than getting the exact formatting styles correct. This is not just moving 
nodes out of a table; the old parser rules here moved nodes way off into 
earlier parts of the DOM with no intuitive logic.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2013-07-01 Thread Ian Hickson
On Sat, 3 Nov 2012, Yasuhiko Minamide wrote:
 
 This is about Adaption Agency Algorithm in 12.2.5.4.7 The in body 
 insertion mode.
 
 Limits of loops in the adoption agency algorithm were introduced in 
 http://html5.org/tools/web-apps-tracker?from=5641to=5642. However, the 
 limit for the inner loop introduces an unexpected behaviour for the 
 following fragment of an HTML document.
 
 biasttdiv/babc/b/div/tt/s/axyz/i
 
 This document is parsed into the following DOM tree.
 
 b
   i
 a
   s
 tt/tt
   /s
 /a
 xyz
   /i
 /b
 a
   s
 tt
   div
 b/b
 abc
   /div
 /tt
   /s
 /a
 
 xyz is inserted as a child of i and the order between abc and xyz is 
 reversed in the tree. We would like to know whether this is an intended 
 behaviour of the specification.
 
 We are aware that the similar reversal occurs in markup-in-tables.

On Sat, 3 Nov 2012, Adam Barth wrote:
 
 The DOM you get when you hit the limits in the adoption agency algorithm 
 don't make a lot of intuitive sense.  Unfortunately, the limits are 
 necessary so that implementations don't end up having to do quadratic 
 work.  If this behavior is causing you trouble, you might want to run 
 your content through some sort of pre-processor that re-writes these 
 cases into valid HTML, which should get parsed in intuitive ways by user 
 agents.

On Sat, 8 Dec 2012, Ian Hickson wrote:
 
 Yeah that's definitely not intentional.
 
 Does anyone have any preference for how this is fixed?

On Wed, 12 Dec 2012, James Graham wrote:
 On Wed, 12 Dec 2012, Ian Hickson wrote:
  On Wed, 12 Dec 2012, Henri Sivonen wrote:
   
   Does it need to be fixed? That is, is it breaking real sites?
  
  It reverses the order of text nodes. That's ridiculously unintuitive. 
  Yes, I think it needs solving, even if it isn't hit by any sites.
  
  (If it's hit by sites, it seems likely that they are breaking because 
  of it. If it isn't, then we can safely change it regardless.)
 
 Although changing it does introduce the possibility of unforeseen 
 regressions. Not that I have a strong opinion here, really.

I still think this should be fixed, but having studied it further, I'm not 
sure how to fix it. The problem is that the i element isn't cloned (note 
how the abc text doesn't end up in the i), but when we close all the 
other elements, we eventually get left with the i on the stack, and so 
the xyz text ends up appended to it, but it's still back in its original 
position, earlier in the tree.

One option would be to remove from the stack of open elements any elements 
that we are skipping when we bail out of the AAA.

Can anyone see a problem with doing that?


On Mon, 5 Nov 2012, Yasuhiko Minamide wrote:
 
 I'm wondering whether the adoption agency algorithm without the limits 
 really has a quadratic complexity (with respect to the size of the 
 stack).
 
 Even if we do not impose a limit on the inner loop, each node in the 
 stack of open elements is processed at most once.
 
 - The inner loop processes the nodes between the formatting element
   and the furthest block.
 - Then, the formatting element is moved below the furthest block.
 
 Then, the nodes above the furthest block will not be processed anymore.
 
 If we do not impose the limit on the outer loop, the step 4 may cause
 the quadratic behaviour. However, I think that it can be avoided by
 slightly revising the algorithm.
 
 I'm working on the automated test generation for the HTML5 parser 
 specification and had the question when we try to understand the 
 specification precisely.

On Tue, 18 Dec 2012, Yasuhiko Minamide wrote:
 
 The easiest fix will be just to remove the limit of the inner loop. 
 Unfortunately, this makes the complexity of current implementations of 
 the adoption agency algorithm in O(n^2). This is because remove node 
 from the stack of open elements is in O(n) on implementations as far as 
 I have checked.
 
  9.5 If node is not in the list of active formatting elements, then remove 
  node from the stack of open elements and then go back to the step 
 labeled 
  inner loop.
 
 * WebKit : stack of open elements is implemented as a singly-linked list.
 * Firefox : stack of open elements is implemented as an array.
 (need to move elements in the array to remove)
 
 I'm not sure whether this is ok or not. 
 (I thought the operation remove node was implemented in O(1). For 
 example, by using doubly linked list.)

The limits are really intended to reduce the memory consumption of the 
algorithm, not its running time. Elements are expensive, and this 
algorithm can create exponentially more elements than tags in the markup, 
when it's not limited.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2012-12-17 Thread Yasuhiko Minamide
 
 xyz is inserted as a child of i and the order between abc and 
 xyz is reversed in the tree. We would like to know whether this is an 
 intended behaviour of the specification.
 
 Yeah that's definitely not intentional.
 
 Does anyone have any preference for how this is fixed?
 

The easiest fix will be just to remove the limit of the inner loop.
Unfortunately, this makes the complexity of current implementations of the
adoption agency algorithm in O(n^2). This is because remove node from 
the stack of open elements is in O(n) on implementations as far as I 
have checked.

 9.5 If node is not in the list of active formatting elements, then remove 
 node from the stack of open elements and then go back to the step labeled 
 inner loop.

* WebKit : stack of open elements is implemented as a singly-linked list.
* Firefox : stack of open elements is implemented as an array.
(need to move elements in the array to remove)

I'm not sure whether this is ok or not. 
(I thought the operation remove node was implemented in O(1). 
 For example, by using doubly linked list.)

Yasuhiko 



Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2012-12-12 Thread Henri Sivonen
On Sat, Dec 8, 2012 at 11:05 PM, Ian Hickson i...@hixie.ch wrote:
 the order between abc and
 xyz is reversed in the tree.

 Does anyone have any preference for how this is fixed?

Does it need to be fixed? That is, is it breaking real sites?

-- 
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/


Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2012-12-12 Thread Ian Hickson
On Wed, 12 Dec 2012, Henri Sivonen wrote:
 On Sat, Dec 8, 2012 at 11:05 PM, Ian Hickson i...@hixie.ch wrote:
  the order between abc and xyz is reversed in the tree.
  
  Does anyone have any preference for how this is fixed?
 
 Does it need to be fixed? That is, is it breaking real sites?

It reverses the order of text nodes. That's ridiculously unintuitive. Yes, 
I think it needs solving, even if it isn't hit by any sites.

(If it's hit by sites, it seems likely that they are breaking because of 
it. If it isn't, then we can safely change it regardless.)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2012-12-12 Thread James Graham

On Wed, 12 Dec 2012, Ian Hickson wrote:


On Wed, 12 Dec 2012, Henri Sivonen wrote:

On Sat, Dec 8, 2012 at 11:05 PM, Ian Hickson i...@hixie.ch wrote:

the order between abc and xyz is reversed in the tree.


Does anyone have any preference for how this is fixed?


Does it need to be fixed? That is, is it breaking real sites?


It reverses the order of text nodes. That's ridiculously unintuitive. Yes,
I think it needs solving, even if it isn't hit by any sites.

(If it's hit by sites, it seems likely that they are breaking because of
it. If it isn't, then we can safely change it regardless.)


Although changing it does introduce the possibility of unforeseen 
regressions. Not that I have a strong opinion here, really.


Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2012-12-08 Thread Ian Hickson
On Sat, 3 Nov 2012, Yasuhiko Minamide wrote:
 
 This is about Adaption Agency Algorithm in 12.2.5.4.7 The in body 
 insertion mode.
 
 Limits of loops in the adoption agency algorithm were introduced in 
 http://html5.org/tools/web-apps-tracker?from=5641to=5642. However, the 
 limit for the inner loop introduces an unexpected behaviour for the 
 following fragment of an HTML document.
 
 biasttdiv/babc/b/div/tt/s/axyz/i
 
 This document is parsed into the following DOM tree.
 
 b
   i
 a
   s
 tt/tt
   /s
 /a
 xyz
   /i
 /b
 a
   s
 tt
   div
 b/b
 abc
   /div
 /tt
   /s
 /a
 
 xyz is inserted as a child of i and the order between abc and 
 xyz is reversed in the tree. We would like to know whether this is an 
 intended behaviour of the specification.

Yeah that's definitely not intentional.

Does anyone have any preference for how this is fixed?

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2012-11-04 Thread Yasuhiko Minamide

 The DOM you get when you hit the limits in the adoption agency
 algorithm don't make a lot of intuitive sense.  Unfortunately, the
 limits are necessary so that implementations don't end up having to do
 quadratic work.  If this behavior is causing you trouble, you might

I'm wondering whether the adoption agency algorithm without the limits
really has a quadratic complexity (with respect to the size of the stack). 

Even if we do not impose a limit on the inner loop, each node in the
stack of open elements is processed at most once.

- The inner loop processes the nodes between the formatting element
  and the furthest block.
- Then, the formatting element is moved below the furthest block.

Then, the nodes above the furthest block will not be processed anymore.

If we do not impose the limit on the outer loop, the step 4 may cause
the quadratic behaviour. However, I think that it can be avoided by
slightly revising the algorithm.

I'm working on the automated test generation for the HTML5 parser 
specification and had the question when we try to understand the 
specification precisely.

Yasuhiko Minamide