Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2013-08-01 Thread Yasuhiko Minamide

On Mon, 1 Jul 2013, Ian Hickson wrote:
> 
> One option would be to remove from the stack of open elements any 
> elements that we are skipping when we bail out of the AAA.
> 
> Can anyone see a problem with doing that?

I think that this solves the issue and clarifies the behaviour of
the parser.

> 
> The limits are really intended to reduce the memory consumption of the 
> algorithm, not its running time. Elements are expensive, and this 
> algorithm can create exponentially more elements than tags in the markup, 
> when it's not limited.
> 

Although I agree with the revision above, I would like to clarify
the behaviour of the AAA algorithm.

I don't think that the AAA algorithm can create exponentially more elements
even if it's not limited.

Let D be the depth of the stack of open elements and S the size of the DOM
tree that has been constructed so far.

Then after application of the algorithm:

* The size of the DOM tree is at most S + D. 
* The depth of the stack of open elements is at most D.

Even if the algorithm is applied n times, the size of the DOM tree is bounded
by S + nD. (not S * 2^n)

In total, I think that the number of elements in the DOM tree is in 
O(m^2) where m is the number of tags in the source HTML.

Yasuhiko Minamide







Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2012-12-17 Thread Yasuhiko Minamide
>> 
>> "xyz" is inserted as a child of  and the order between "abc" and 
>> "xyz" is reversed in the tree. We would like to know whether this is an 
>> intended behaviour of the specification.
> 
> Yeah that's definitely not intentional.
> 
> Does anyone have any preference for how this is fixed?
> 

The easiest fix will be just to remove the limit of the inner loop.
Unfortunately, this makes the complexity of current implementations of the
adoption agency algorithm in O(n^2). This is because "remove node from 
the stack of open elements" is in O(n) on implementations as far as I 
have checked.

 9.5 If node is not in the list of active formatting elements, then remove 
 node from the stack of open elements and then go back to the step labeled 
 inner loop.

* WebKit : stack of open elements is implemented as a singly-linked list.
* Firefox : stack of open elements is implemented as an array.
(need to move elements in the array to remove)

I'm not sure whether this is ok or not. 
(I thought the operation remove node was implemented in O(1). 
 For example, by using doubly linked list.)

Yasuhiko 



Re: [whatwg] Question on Limits in Adaption Agency Algorithm

2012-11-04 Thread Yasuhiko Minamide

> The DOM you get when you hit the limits in the adoption agency
> algorithm don't make a lot of intuitive sense.  Unfortunately, the
> limits are necessary so that implementations don't end up having to do
> quadratic work.  If this behavior is causing you trouble, you might

I'm wondering whether the adoption agency algorithm without the limits
really has a quadratic complexity (with respect to the size of the stack). 

Even if we do not impose a limit on the inner loop, each node in the
stack of open elements is processed at most once.

- The inner loop processes the nodes between the formatting element
  and the furthest block.
- Then, the formatting element is moved below the furthest block.

Then, the nodes above the furthest block will not be processed anymore.

If we do not impose the limit on the outer loop, the step 4 may cause
the quadratic behaviour. However, I think that it can be avoided by
slightly revising the algorithm.

I'm working on the automated test generation for the HTML5 parser 
specification and had the question when we try to understand the 
specification precisely.

Yasuhiko Minamide


[whatwg] Question on Limits in Adaption Agency Algorithm

2012-11-02 Thread Yasuhiko Minamide

This is about Adaption Agency Algorithm in 12.2.5.4.7 The "in body" insertion 
mode.

Limits of loops in the adoption agency algorithm were introduced
in http://html5.org/tools/web-apps-tracker?from=5641&to=5642.
However, the limit for the inner loop introduces an unexpected behaviour for
the following fragment of an HTML document.

abcxyz

This document is parsed into the following DOM tree. 


  

  

  

"xyz"
  


  

  

"abc"
  

  


"xyz" is inserted as a child of  and the order between "abc" and "xyz" is 
reversed in the tree. We would like to know whether this is an intended 
behaviour of the specification.

We are aware that the similar reversal occurs in markup-in-tables.

Yasuhiko Minamide