Bob Foster
Huw Roberts wrote:
I find that the pre-caching of the grammars takes a very long time (but then I do have something like 200 grammars totalling about 300k of text). The first real parse also takes a long time, but is about 5 times faster than the grammar caching. Subsequent parses are about 5 times faster again. If I clear the cache and discard the XMLReader and then start over, the pre-caching is faster than the first time, and there's not a lot of difference between the first and subsequent parses. I put this down to the hotspot compiler personally, and it doesn't particularly worry me since, once everything is warm, it runs like shit off a shovel.
To summarise (timings in milliseconds):
First schema cache: 1760
First parse: 301
Second parse: 50
Third parse: 40
Fourth parse: 60

Discard everything and start again (but in the same VM):
First schema cache: 371
First parse: 70
Second parse: 40
Third parse: 40
Fourth parse: 50
When running it in a profiler, the profiler records no difference between first and subsequent parses.
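One way to check the warm-up explanation is to time repeated parses of the same document inside one JVM. This is a minimal sketch, not the code Huw used: the class name ParseTimer is hypothetical, and it uses the JDK's built-in JAXP SAX parser rather than the Xerces grammar-caching configuration from this thread. Absolute numbers will depend on the JVM and hardware, so only the first-versus-later comparison is meaningful.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTimer {
    /** Parses the same document {@code runs} times with one reused SAXParser
     *  and returns the elapsed time of each parse in nanoseconds. */
    public static long[] timeParses(String xml, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Parser construction happens here, so it is kept out of the per-parse timings.
        SAXParser parser = factory.newSAXParser();
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            // A fresh InputSource is needed for every parse; the handler does nothing.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            elapsed[i] = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        long[] times = timeParses("<CSLSystem/>", 5);
        for (int i = 0; i < times.length; i++) {
            System.out.printf("parse %d: %.2f ms%n", i + 1, times[i] / 1e6);
        }
    }
}
```

If the first figure is consistently the largest even for a trivial document like `<CSLSystem/>`, that points at class loading and JIT warm-up rather than at the grammar cache.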
Cheers, Huw
-----Original Message-----
From: Justin Robinson [mailto:[EMAIL PROTECTED]
Sent: 11 January 2005 22:22
To: [EMAIL PROTECTED]
Subject: Re: Problem caching grammars
But Huw,
Do you find that, if you parse the same document several times, the first parse is always significantly slower than the rest, or is it just me?!
Justin
----- Original Message -----
From: "Huw Roberts" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, January 11, 2005 11:05 AM
Subject: RE: Problem caching grammars
Out of interest, I've also been trying to cache grammars recently (posted on 2004/12/23). In the end I used the following code:
    System.setProperty("org.apache.xerces.xni.parser.XMLParserConfiguration",
            "org.apache.xerces.parsers.XMLGrammarCachingConfiguration");
    XMLGrammarCachingConfiguration xmlComponentManager = new XMLGrammarCachingConfiguration();
    xmlComponentManager.clearGrammarPool(); // this will clear the grammar pool of the (about to be created) parser.
    try {
        mXmlReader = XMLReaderFactory.createXMLReader(XmlElementUtil.SAX_PARSER_CLASS);
        mXmlReader.setFeature(XmlElementUtil.SCHEMA_FEATURE, true);
        mXmlReader.setFeature("http://xml.org/sax/features/validation", true);
        mXmlReader.setProperty(XmlElementUtil.SCHEMA_LOCATION, uriSchemaLocation);
        EntityResolver resolver = new XmlElementUtil.JavaResourceEntityResolver(mClassLoader);
        mXmlReader.setEntityResolver(resolver);
        InputSource inputSource = new InputSource(new StringReader("<CSLSystem/>"));
        mXmlReader.setErrorHandler(new DefaultHandler());
        mXmlReader.parse(inputSource);
    } catch (Exception ex) {
        sCategory.error("Unexpected exception thrown during schema load", ex);
    }

After this code has been executed the cache is populated.
Note that I only have a single no-namespace schema, so it doesn't matter that my initial parse has minimal content. Also, I could probably discard the XMLReader (mXmlReader) rather than re-using it; I don't think it would make any difference to the cache. Finally, note that if an exception is thrown during the first parse, the cache (GrammarPool) may not be populated.
-----Original Message-----
From: Justin Robinson [mailto:[EMAIL PROTECTED]
Sent: 08 January 2005 18:50
To: [EMAIL PROTECTED]
Subject: Re: Problem caching grammars
A quick breakpoint shows the validator attempts to retrieve only the grammar that I've put in. So the caching doesn't seem to be the problem. I'll look again and see if I can find a hold-up in configurePipeline().
----- Original Message -----
From: "Bob Foster" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, January 08, 2005 5:54 PM
Subject: Re: Problem caching grammars
Justin Robinson wrote:
Thanks Chris,
I'll take a look there. Do you know if Xerces actually tries to access namespaces/schemas from off the net?
Not a namespace; a namespace is just a string and can't be 'accessed'. But it will get a schema off the net if it has a location on the net. Wouldn't work otherwise.
I'm wondering if, initially, my pool is missing a grammar and on the first parse it's actually caching the grammar I've missed.
Sounds likely. You can set a breakpoint in retrieveGrammar() to see if you get called for a schema you aren't expecting.
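As an alternative to a breakpoint, a logging resolver can list which schema documents actually get requested. This sketch assumes the standard JAXP javax.xml.validation API rather than the Xerces XMLGrammarPool Justin is using, so it does not hook retrieveGrammar() itself; SchemaFetchLogger and requestedDocuments are hypothetical names. Returning null from the resolver asks the factory to fetch the location in its normal way, so the logger only observes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaFetchLogger {
    /** Compiles xsd and returns every external schema document the factory asked for. */
    public static List<String> requestedDocuments(String xsd) {
        List<String> requested = new ArrayList<>();
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // LSResourceResolver has a single method, so a lambda can implement it.
        sf.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            requested.add("namespace=" + namespaceURI + ", location=" + systemId);
            return null; // null: let the factory resolve the location itself
        });
        try {
            sf.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // An unreadable location may surface here; the request was already logged above.
            System.err.println("schema compilation failed: " + e.getMessage());
        }
        return requested;
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:include schemaLocation=\"missing-does-not-exist.xsd\"/>"
                + "</xs:schema>";
        requestedDocuments(xsd).forEach(System.out::println);
    }
}
```

Any location in the output that you did not expect to be loaded is a candidate for the grammar that was missing from the pool.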
Bob Foster
Justin
----- Original Message -----
From: "Christopher Ebert" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 06, 2005 8:41 PM
Subject: RE: Problem caching grammars
Just a guess here, but since it's the first parse, I'd suspect configurePipeline() or some other initialization step. You might look at the configuration mechanism and see if there's a way to streamline it; this would probably mean looking at the way the configuration is determined and setting parameters for the first pathway that's checked so it gets the configuration you want.
HTH
Chris
-----Original Message-----
From: Justin Robinson [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 06, 2005 12:34
To: [EMAIL PROTECTED]
Subject: Re: Problem caching grammars
Any thoughts on this?
----- Original Message -----
From: "Justin Robinson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 02, 2005 2:34 PM
Subject: Problem caching grammars
Hi there....
I have managed to preparse my XML Schema and have put it in a grammar pool, according to the active caching approach described at http://xml.apache.org/xerces2-j/faq-grammars.html#faq-1
I'm expecting the time taken to set up my SAX parser to increase, which it does, so that's fine. I'm also expecting the time taken on the first parse call to decrease.
This is where my problem is. The first parse still takes an average of about 7 times longer than subsequent parses.
What else must I do to bring down the time taken for the first parse??
I tried to look at the source code, but I'm having trouble locating where the time might be taken up (still learning how to debug). The path of execution goes through these classes:
1. AbstractSAXParser
2. XMLParser
3. XML11Configuration (methods parse() and configurePipeline())
Any ideas?
Here's how I set up the grammar pool:
    private XMLGrammarPool getGrammarPool() throws IOException {
        // create grammar preparser
        XMLGrammarPreparser preparser = new XMLGrammarPreparser();

        // register a specialized default pre-parser
        preparser.registerPreparser(XMLGrammarDescription.XML_SCHEMA, null);

        // create grammar pool
        XMLGrammarPool grammarPool = new XMLGrammarPoolImpl();

        // set the grammar pool on the grammar preparser
        // so that all the compiled grammars are automatically
        // placed in the grammar pool
        preparser.setProperty("http://apache.org/xml/properties/internal/grammar-pool", grammarPool);

        // parse grammar(s). They are automatically added to the pool, because of the above
        // property that has been set.
        preparser.setFeature("http://xml.org/sax/features/namespaces", true);
        preparser.setFeature("http://xml.org/sax/features/validation", true);
        preparser.setFeature("http://apache.org/xml/features/validation/schema", true);
        preparser.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true);
        Grammar g = preparser.preparseGrammar(
                XMLGrammarDescription.XML_SCHEMA,
                new XMLInputSource(null,
                        "c:\\jdev\\workspace\\UncleJustWiki\\xmlschemas\\DraftRevisionSchema.xsd",
                        null));

        // lock grammar pool. Don't add any more grammars
        grammarPool.lockPool();
        return grammarPool;
    }
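For comparison, the standard JAXP API (Java 5 and later) offers a similar compile-once pattern without touching Xerces internals: SchemaFactory.newSchema() compiles the grammar up front, and the resulting Schema object is documented as immutable and thread-safe, so it plays roughly the role of a locked pool. This is a swapped-in technique, not the XMLGrammarPreparser approach from the FAQ; CachedSchemaParser is a hypothetical name, and the schema is passed in as a string only to keep the sketch self-contained.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class CachedSchemaParser {
    private final SAXParserFactory factory;

    /** Compiles the schema once, up front (the analogue of preparsing into a pool). */
    public CachedSchemaParser(String xsd) throws SAXException {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));
        factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Validate against the precompiled Schema. setValidating(true) is deliberately
        // not called: it would switch on DTD validation as well.
        factory.setSchema(schema);
    }

    /** Parses xml, throwing SAXException on the first validation error. */
    public void parse(String xml) throws Exception {
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // DefaultHandler silently ignores validation errors unless rethrown
            }
        });
    }
}
```

The schema is compiled once at startup and parse() is called per document. A throwaway parse of a minimal valid document right after construction (as Huw does with `<CSLSystem/>`) would additionally move class loading and early JIT work out of the first real parse.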
Regards, Justin