Hi all,

As we started to measure the performance of the parser on small
documents, we saw that Xerces spends a LOT of time resetting the
pipeline: before each parse, the configuration calls the
org.apache.xerces.xni.parsers.XMLComponent reset(..) method on each
component in the pipeline. During the reset call the components not only
reset local settings but also query the features/properties that apply
to the component. Normally, users set features and properties before the
first parse, and then tell the parser to parse a bunch of documents. So
in the general case, Xerces spends extra time querying features and
properties that have not changed since the previous parse.

To verify how much extra time we spend, I've changed the code to let the
configuration decide when the XMLComponentManager needs to be passed
to the components in reset(). If no features or properties have changed,
the configuration passes "null" for the XMLComponentManager when
resetting the components, so each component only resets its local
settings and does not attempt to query any settings.
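
For illustration only, here is a minimal sketch of the idea. It is not
the actual patch: ComponentManager, Component, CountingComponent and
Configuration are hypothetical stand-ins for the XNI types, reduced to a
single feature lookup so the example runs without Xerces on the
classpath.

```java
import java.util.HashMap;
import java.util.Map;

class ResetSketch {
    /** Hypothetical stand-in for XMLComponentManager. */
    interface ComponentManager {
        boolean getFeature(String featureId);
    }

    /** Hypothetical stand-in for XMLComponent. */
    interface Component {
        void reset(ComponentManager manager);
    }

    /** Component that caches one feature and counts how often it queries. */
    static class CountingComponent implements Component {
        boolean validation;
        int resets;
        int queries;

        public void reset(ComponentManager manager) {
            resets++;                // local state is reset on every parse
            if (manager != null) {   // null means "settings unchanged"
                validation = manager.getFeature(
                        "http://xml.org/sax/features/validation");
                queries++;
            }
        }
    }

    /** Configuration that remembers whether any setting has changed. */
    static class Configuration implements ComponentManager {
        private final Map<String, Boolean> features = new HashMap<>();
        private final Component[] components;
        private boolean settingsChanged = true; // full reset on first parse

        Configuration(Component... components) {
            this.components = components;
        }

        void setFeature(String featureId, boolean value) {
            features.put(featureId, value);
            settingsChanged = true;
        }

        public boolean getFeature(String featureId) {
            return features.getOrDefault(featureId, false);
        }

        void parse() {
            // Pass the manager only if something changed since last parse.
            ComponentManager manager = settingsChanged ? this : null;
            for (Component c : components) {
                c.reset(manager);
            }
            settingsChanged = false;
            // ... the actual parsing would happen here ...
        }
    }

    public static void main(String[] args) {
        CountingComponent scanner = new CountingComponent();
        Configuration config = new Configuration(scanner);
        config.setFeature("http://xml.org/sax/features/validation", true);
        for (int i = 0; i < 3; i++) {
            config.parse();
        }
        System.out.println(scanner.resets + " resets, "
                + scanner.queries + " query pass(es)");
    }
}
```

In this sketch the component is reset on every parse but queries its
settings only after a setFeature call, which is the behaviour the change
is meant to achieve.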

By making this change I've seen up to 20% performance improvement (1-2k
documents), so I think it would be great if we made this change. 

The only big question I have is whether anyone thinks this is an XNI
change. The docs state the following:

/**
  * Resets the component. The component can query the component manager
  * about any features and properties that affect the operation of the
  * component.
  */
public void reset(XMLComponentManager componentManager) 
        throws XMLConfigurationException;

So the docs do not state explicitly that "null" is allowed. But there
are other places in XNI where "null" is not explicitly documented as
allowed and can still be used (e.g. when setting a document handler).

So what do you think?

There are other possible solutions; however, I think letting the
configuration control whether properties or features need to be
queried by the component is the cleanest approach.

Just in case you are wondering what the other solutions are:
1) each component could implement setFeature and setProperty and
receive all of its features and properties through those calls, so that
no features or properties are queried during XMLComponent.reset().
I am not sure if XNI components were designed to be reset in such a way,
and I suspect this approach might be a bit slower.

2) introduce a new internal feature, e.g.
"/internal/settings-unchanged", that
each component can query before querying the rest of its features and
properties. If this new feature is set to true, the component won't
query any other features. Again, this is a bit slower, plus we would be
introducing yet another feature to Xerces.
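
As a rough sketch of what option 2 would look like inside a component
(again with hypothetical stand-in types rather than the real XNI
interfaces, and with the feature name taken as written above):

```java
import java.util.HashMap;
import java.util.Map;

class SettingsFlagSketch {
    // Feature name as proposed above; hypothetical, not an existing feature.
    static final String UNCHANGED = "/internal/settings-unchanged";
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Hypothetical stand-in for XMLComponentManager. */
    interface ComponentManager {
        boolean getFeature(String featureId);
    }

    /** Manager that counts lookups so the per-parse cost is visible. */
    static class CountingManager implements ComponentManager {
        final Map<String, Boolean> features = new HashMap<>();
        int lookups;

        public boolean getFeature(String featureId) {
            lookups++;
            return features.getOrDefault(featureId, false);
        }
    }

    /** Component that checks the flag before querying anything else. */
    static class Component {
        boolean validation;

        void reset(ComponentManager manager) {
            // ... reset local state here ...
            if (manager.getFeature(UNCHANGED)) {
                return; // settings unchanged: skip all other queries
            }
            validation = manager.getFeature(VALIDATION);
        }
    }

    public static void main(String[] args) {
        CountingManager manager = new CountingManager();
        manager.features.put(VALIDATION, true);
        Component component = new Component();

        component.reset(manager);              // flag false: two lookups
        manager.features.put(UNCHANGED, true); // configuration sets the flag
        component.reset(manager);              // flag true: one lookup

        System.out.println(manager.lookups + " lookups over two resets");
    }
}
```

Even when nothing has changed, every component still pays for one
lookup per parse here, which is why I expect this option to be a bit
slower than passing "null".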

Thank you,
-- 
Elena Litani / IBM Toronto
