[XXE] Large file loads very slowly in 4.1.0

Hussein Shafie Tue, 04 Nov 2008 19:04:58 +0100

Trevor Nash wrote:
> 
> Hussein Shafie wrote:
>> The performance problem comes from the "id()" XPath function. There is
>> simply no way to make this kind of rule fast in the context of an XML
>> editor (while this is not a problem for a browser or for an XSLT
>> processor).
> 
> I'm not sure why you say that.  XSLT processors and the like will build
> an index on reading the document - if they are really smart they delay
> this until the first id() function is executed.  I imagine you know this
> already.
>


XMLmind XML Editor does exactly that too: it lazily computes an
id->Element map. The slow performance comes from the fact that, when
id() is evaluated on behalf of the CSS engine, the id->Element map is
discarded just after it has been computed, so the next id() call has to
rebuild it.
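To illustrate why forgetting the map matters, here is a minimal Python sketch using xml.etree.ElementTree as a stand-in, not XXE's XT-based XPath engine. The class LazyIdIndex and its forget() method are hypothetical, and the sketch assumes the ID attribute is literally named "id", whereas id() gets attribute types from the DTD or schema. Each full build walks the whole tree, so discarding the map after every lookup turns one traversal into one traversal per lookup.

```python
import xml.etree.ElementTree as ET


class LazyIdIndex:
    """Hypothetical lazily built id -> Element map (illustration only)."""

    def __init__(self, root):
        self.root = root
        self._map = None   # not computed until the first lookup
        self.builds = 0    # number of full-tree traversals performed

    def lookup(self, id_value):
        if self._map is None:
            # One full traversal builds the whole map.
            self._map = {}
            for elem in self.root.iter():
                key = elem.get("id")
                if key is not None and key not in self._map:
                    self._map[key] = elem   # keep the first one in document order
            self.builds += 1
        return self._map.get(id_value)

    def forget(self):
        # Discarding the map means the next lookup pays for a full traversal again.
        self._map = None


doc = ET.fromstring(
    "<doc>" + "".join(f'<sec id="s{i}"/>' for i in range(1000)) + "</doc>"
)

kept = LazyIdIndex(doc)
for i in range(1000):
    kept.lookup(f"s{i}")
print(kept.builds)        # 1

forgotten = LazyIdIndex(doc)
for i in range(1000):
    forgotten.lookup(f"s{i}")
    forgotten.forget()    # the map is dropped right after each use
print(forgotten.builds)   # 1000
```

With 1000 id() lookups over a 1000-element document, the retained map costs one traversal; the forgotten map costs a thousand.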

In fact, the XPath 1.0 implementation of XMLmind XML Editor (which is
very fast and very reliable because it comes from James Clark's XT) was
designed primarily for macro-commands, configuration files, etc., rather
than for use by the CSS engine.




> The difference in an editing environment is that you have to maintain
> the index as the document changes.  This is certainly technically
> feasible, though the cost is dependent on the architecture of the editor
> and its data store, and it may not be worth the effort for the number of
> times it is used.
> 

Yes, this is feasible, but clearly not worth the effort in the context
of XMLmind XML Editor's CSS engine.
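To make the trade-off Trevor describes concrete, here is a rough Python sketch of an id index that stays valid while the document is edited. It assumes every change goes through a few hypothetical methods (append_child, remove_child, set_id) that update the map as they modify the tree; xml.etree.ElementTree has no mutation events, so none of this reflects XXE's actual document model. It also assumes unique ids and an attribute literally named "id". A real editor would additionally have to cover undo, every other kind of edit, and schema-defined ID types, which is where the cost Trevor mentions comes from.

```python
import xml.etree.ElementTree as ET


class MaintainedIdIndex:
    """Hypothetical id -> Element index kept in sync with edits.

    Assumes ids are unique and that the ID attribute is literally named "id".
    """

    def __init__(self, root):
        self.by_id = {}
        self._add_subtree(root)       # one full traversal, paid once

    def _add_subtree(self, elem):
        for e in elem.iter():
            key = e.get("id")
            if key is not None:
                self.by_id[key] = e

    def _remove_subtree(self, elem):
        for e in elem.iter():
            key = e.get("id")
            if key is not None and self.by_id.get(key) is e:
                del self.by_id[key]

    # Every edit must go through these methods, or the index goes stale.
    def append_child(self, parent, child):
        parent.append(child)
        self._add_subtree(child)      # cost proportional to the inserted subtree

    def remove_child(self, parent, child):
        parent.remove(child)
        self._remove_subtree(child)   # cost proportional to the removed subtree

    def set_id(self, elem, new_id):
        old = elem.get("id")
        if old is not None and self.by_id.get(old) is elem:
            del self.by_id[old]
        elem.set("id", new_id)
        self.by_id[new_id] = elem

    def lookup(self, id_value):
        return self.by_id.get(id_value)   # no traversal at lookup time


doc = ET.fromstring('<doc><sec id="a"/></doc>')
index = MaintainedIdIndex(doc)

new_sec = ET.Element("sec", id="b")
index.append_child(doc, new_sec)
print(index.lookup("b") is new_sec)                       # True

index.set_id(new_sec, "c")
print(index.lookup("b"), index.lookup("c") is new_sec)    # None True

index.remove_child(doc, new_sec)
print(index.lookup("c"))                                  # None
```

Lookups become dictionary hits, but every editing path has to remember to update the index, which is the maintenance burden weighed above.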



> BTW your workaround is unlikely to be faster because it is still
> examining every node in the tree, but matching the element name instead
> of looking at the id value - then when it matches it looks at the id
> value as well.  You only get an advantage if you know something like
> what level the element is in the tree, e.g. if you are looking for a top
> level section then /*/*[@id=...] might work.  The killer is the // at the
> beginning of the XPath expression.
> 

Yes, anything more specific than //xxx (like /*/*[@id=...]) would be faster.
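One rough way to see why the leading // hurts is to count how many elements each strategy examines. This Python sketch uses xml.etree.ElementTree and two hypothetical helpers: find_descendant mimics //section[@id=...] by walking every element, while find_top_level mimics /*/section[@id=...] by looking only at the children of the document element. The document shape (100 sections of 50 paragraphs each) is made up for illustration, and neither helper reflects how XXE's engine evaluates paths.

```python
import xml.etree.ElementTree as ET


def find_descendant(root, tag, id_value):
    """Like //tag[@id=...]: examines every element. Returns (match, visited)."""
    visited = 0
    found = None
    for elem in root.iter():          # includes root itself
        visited += 1
        if found is None and elem.tag == tag and elem.get("id") == id_value:
            found = elem
    return found, visited


def find_top_level(root, tag, id_value):
    """Like /*/tag[@id=...]: examines only the document element's children."""
    visited = 0
    found = None
    for elem in root:                 # direct children only
        visited += 1
        if found is None and elem.tag == tag and elem.get("id") == id_value:
            found = elem
    return found, visited


sections = "".join(
    f'<section id="s{i}">' + "<para/>" * 50 + "</section>" for i in range(100)
)
doc = ET.fromstring(f"<doc>{sections}</doc>")

m1, n1 = find_descendant(doc, "section", "s42")
m2, n2 = find_top_level(doc, "section", "s42")
print(n1, n2)   # 5101 100
```

Both helpers return the same element, but find_descendant still visits every paragraph even though it filters on the tag name, which is Trevor's point about the workaround: only restricting the path itself shrinks the walk.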



> Essentially id('x') is the same as //*[@id='x']
> 

* id() queries the schema to find the type of each attribute of an
element. This makes it potentially slower than //*[@id='x'].

* //*[@id='x'] could be faster if it did not have to traverse the whole
document tree. How can //*[@id='x'] be made to stop its traversal after
it finds the first matching element? //*[@id='x'][1]? I'm not sure.
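On the [1] question: in XPath 1.0, //*[@id='x'][1] applies [1] to the child step, so it selects each matching element that is the first such child of its parent, not the first match in the document; the document-order first match is written (//*[@id='x'])[1]. Whether an engine actually stops walking the tree once it has that node is up to the implementation. The Python sketch below uses xml.etree.ElementTree as a stand-in and assumes the ID attribute is literally named "id" (id() instead takes attribute types from the DTD or schema). The hypothetical first_with_id returns as soon as it meets a match, while all_with_id collects every match, as //*[@id='x'] must.

```python
import xml.etree.ElementTree as ET


def first_with_id(root, id_value):
    """First element in document order whose id matches, plus elements visited."""
    visited = 0
    for elem in root.iter():       # iter() walks the tree lazily, in document order
        visited += 1
        if elem.get("id") == id_value:
            return elem, visited   # stop here: the rest of the tree is never walked
    return None, visited


def all_with_id(root, id_value):
    """Every matching element (what //*[@id='x'] computes), plus elements visited."""
    visited = 0
    matches = []
    for elem in root.iter():
        visited += 1
        if elem.get("id") == id_value:
            matches.append(elem)
    return matches, visited


doc = ET.fromstring('<doc><a id="x"/>' + "<b/>" * 1000 + "</doc>")

first, n_first = first_with_id(doc, "x")
matches, n_all = all_with_id(doc, "x")
print(n_first, n_all)   # 2 1002
```

Early termination only pays off when the match sits near the start of the document; in the worst case both walks visit every element.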
