If you're running on Windows (or even if you're not), the second "first" schema cache is faster because all directories have been resolved and most of the files are probably in memory. I see this effect whenever I process a number of files more than once.

Bob Foster

Huw Roberts wrote:
I find that the pre-caching of the grammars takes a very long time (but then
I do have something like 200 grammars totalling about 300k of text).  The
first real parse also takes a long time, but is about 5 times faster than
the grammar caching.  Subsequent parses are about 5 times faster again.  If
I clear the cache and discard the XMLReader and then start over then the
pre-caching is faster than the first time, and there's not a lot of
difference between the first and subsequent parses.  I put this down to the
hotspot compiler personally, and it doesn't particularly worry me since,
once everything is warm, it runs like shit off a shovel.

To summarise (timings in milliseconds):
First schema cache:  1760
First parse:          301
Second parse:          50
Third parse:           40
Fourth parse:          60

Discard everything and start again (but in the same VM):
First schema cache:   371
First parse:           70
Second parse:          40
Third parse:           40
Fourth parse:          50

When running it in a profiler, the profiler records no difference between
first and subsequent parses.
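
A rough way to reproduce this kind of measurement, as a sketch only: the harness below uses the JDK's built-in JAXP SAX parser rather than a standalone Xerces install, and the class name ParseTimer and the tiny test document are made up for illustration. It parses the same in-memory document several times in one VM and reports each elapsed time, so any first-parse cost shows up separately from the warm parses.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTimer {

    /** Parses the same document `runs` times and returns each elapsed time in milliseconds. */
    public static List<Double> timeParses(String xml, int runs) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        List<Double> millis = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            // A fresh reader each time; the parser instance itself is reused.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            millis.add((System.nanoTime() - start) / 1_000_000.0);
        }
        return millis;
    }

    public static void main(String[] args) throws Exception {
        List<Double> timings = timeParses("<CSLSystem/>", 5);
        for (int i = 0; i < timings.size(); i++) {
            System.out.printf("Parse %d: %.2f ms%n", i + 1, timings.get(i));
        }
    }
}
```

Running it twice in the same VM (for example by calling timeParses() again from main) gives the "discard everything and start again" comparison. The absolute numbers depend on the machine and the JVM, so only the ratios are worth comparing.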

Cheers,
Huw


-----Original Message-----
From: Justin Robinson [mailto:[EMAIL PROTECTED]
Sent: 11 January 2005 22:22
To: [EMAIL PROTECTED]
Subject: Re: Problem caching grammars


But Huw,

Do you find that, if you parse the same document several times, the first
parse is always significantly slower than the rest, or is it just me?!

Justin

----- Original Message -----
From: "Huw Roberts" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, January 11, 2005 11:05 AM
Subject: RE: Problem caching grammars



Out of interest, I've also been trying to cache grammars recently (posted on
2004/12/23).
In the end I used the following code:

    System.setProperty("org.apache.xerces.xni.parser.XMLParserConfiguration",
            "org.apache.xerces.parsers.XMLGrammarCachingConfiguration");

    XMLGrammarCachingConfiguration xmlComponentManager =
            new XMLGrammarCachingConfiguration();
    // this will clear the grammar pool of the (about to be created) parser.
    xmlComponentManager.clearGrammarPool();
    try {
        mXmlReader = XMLReaderFactory.createXMLReader(XmlElementUtil.SAX_PARSER_CLASS);
        mXmlReader.setFeature(XmlElementUtil.SCHEMA_FEATURE, true);
        mXmlReader.setFeature("http://xml.org/sax/features/validation", true);
        mXmlReader.setProperty(XmlElementUtil.SCHEMA_LOCATION, uriSchemaLocation);
        EntityResolver resolver =
                new XmlElementUtil.JavaResourceEntityResolver(mClassLoader);
        mXmlReader.setEntityResolver(resolver);
        InputSource inputSource = new InputSource(new StringReader("<CSLSystem/>"));
        mXmlReader.setErrorHandler(new DefaultHandler());
        mXmlReader.parse(inputSource);
    }
    catch (Exception ex) {
        sCategory.error("Unexpected exception thrown during schema load", ex);
    }
After this code has been executed the cache is populated.

Note that I only have a single no-namespace-schema so it doesn't matter that
my initial parse has minimal content.  Also I could probably discard the
XMLReader (mXmlReader) rather than re-using it; I don't think it will make
any difference to the cache.  Finally note that if you throw an exception
during the first parse, then the cache (GrammarPool) may not be populated.
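
For comparison, the same "compile once, reuse across parses" idea can be sketched with the standard JAXP validation API instead of the Xerces grammar pool. This is a different technique from the code above, not a drop-in replacement: a javax.xml.validation.Schema holds the compiled schema and a new Validator is created per document, which mirrors the point that the reader can be thrown away while the cached grammar stays. The class name CompiledSchemaCache and the inline one-element schema are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class CompiledSchemaCache {

    // Hypothetical no-namespace schema declaring a single root element.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='CSLSystem'/>"
        + "</xs:schema>";

    // Compiled once; Schema objects are immutable and can be shared.
    private final Schema schema;

    public CompiledSchemaCache(String xsd) throws SAXException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        this.schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    /** Validates one document against the cached schema. */
    public boolean isValid(String xml) throws IOException {
        // Validators are cheap, per-use objects; the schema is not recompiled here.
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors surface as exceptions.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        CompiledSchemaCache cache = new CompiledSchemaCache(XSD);
        System.out.println(cache.isValid("<CSLSystem/>"));
        System.out.println(cache.isValid("<SomethingElse/>"));
    }
}
```

If schema compilation fails, the constructor throws and no cache object exists, which is the JAXP counterpart of the caveat about an exception leaving the pool unpopulated.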


-----Original Message-----
From: Justin Robinson [mailto:[EMAIL PROTECTED]
Sent: 08 January 2005 18:50
To: [EMAIL PROTECTED]
Subject: Re: Problem caching grammars


A quick breakpoint shows the validator attempts to retrieve only the grammar
that I've put in. So the caching doesn't seem to be the problem. I'll look
again and see if I can find a hold-up in configurePipeline().


----- Original Message -----
From: "Bob Foster" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, January 08, 2005 5:54 PM
Subject: Re: Problem caching grammars



Justin Robinson wrote:


Thanks Chris,

I'll take a look there. Do you know if Xerces actually tries to access
namespaces/schemas from off the net?

Not a namespace; a namespace is just a string and can't be 'accessed'. But it will get a schema off the net if it has a location on the net. Wouldn't work otherwise.
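
One way to see what a parser actually asks for is to install an EntityResolver that records every system ID it is handed. The sketch below is an illustration under assumptions: it uses the JDK's built-in SAX parser and an external DTD reference, because that needs nothing beyond the standard library. Xerces consults the same resolver hook when it loads schema documents, but that path is not exercised here. The class name ResolverTrace and the example.invalid URL are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ResolverTrace {

    /** Parses `xml` and returns every system ID the parser asked the resolver for. */
    public static List<String> requestedSystemIds(String xml) throws Exception {
        List<String> requested = new ArrayList<>();

        EntityResolver recording = (publicId, systemId) -> {
            requested.add(systemId);
            // Hand back an empty entity so nothing is fetched over the network.
            return new InputSource(new StringReader(""));
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(recording);
        reader.setErrorHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return requested;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc SYSTEM \"http://example.invalid/doc.dtd\"><doc/>";
        for (String id : requestedSystemIds(xml)) {
            System.out.println("Parser requested: " + id);
        }
    }
}
```

Any location in the recorded list that is not served from a local copy is a candidate for a network round trip on the first parse.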


I'm wondering if, initially, my pool is
missing a grammar and on the first parse it's actually caching the grammar
I've missed.

Sounds likely. You can set a breakpoint in retrieveGrammar() to see if you get called for a schema you aren't expecting.

Bob Foster


Justin

----- Original Message -----
From: "Christopher Ebert" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 06, 2005 8:41 PM
Subject: RE: Problem caching grammars



Just a guess here, but since it's the first parse, I'd suspect
configurePipeline() or some other initialization step. You might look at
the configuration mechanism and see if there's a way to streamline it;
this would probably mean looking at the way the configuration is
determined and setting parameters for the first pathway that's checked
so it gets the configuration you want.

HTH

Chris

-----Original Message-----
From: Justin Robinson [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 06, 2005 12:34
To: [EMAIL PROTECTED]
Subject: Re: Problem caching grammars

Any thoughts on this?

----- Original Message -----
From: "Justin Robinson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 02, 2005 2:34 PM
Subject: Problem caching grammars




Hi there....

I have managed to preparse my XML Schema and have put it in a grammar pool,
according to the active caching approach described at
http://xml.apache.org/xerces2-j/faq-grammars.html#faq-1

I'm expecting the time taken to set up my SAX parser to increase, which it
does, so that's fine.
I'm also expecting the time taken on the first parse call to decrease.

This is where my problem is. The first parse still takes an average of about
7 times longer than subsequent parses.

What else must I do to bring down the time taken for the first parse??

I tried to look at the source code, but I'm having trouble locating where
the time might be taken up (still learning how to debug). The path of
execution goes through these classes:

1. AbstractSAXParser
2. XMLParser
3. XML11Configuration (methods parse() and configurePipeline())

Any ideas?

Here's how I set up the grammar pool:

 private XMLGrammarPool getGrammarPool() throws IOException {
     // create grammar preparser
     XMLGrammarPreparser preparser = new XMLGrammarPreparser();

     // register a specialized default pre-parser
     preparser.registerPreparser(XMLGrammarDescription.XML_SCHEMA, null);

     // create grammar pool
     XMLGrammarPool grammarPool = new XMLGrammarPoolImpl();

     // set the grammar pool on the grammar preparser
     // so that all the compiled grammars are automatically
     // placed in the grammar pool
     preparser.setProperty(
         "http://apache.org/xml/properties/internal/grammar-pool", grammarPool);

     // parse grammar(s). They are automatically added to the pool,
     // because of the above property that has been set.
     preparser.setFeature("http://xml.org/sax/features/namespaces", true);
     preparser.setFeature("http://xml.org/sax/features/validation", true);
     preparser.setFeature("http://apache.org/xml/features/validation/schema", true);
     preparser.setFeature(
         "http://apache.org/xml/features/validation/schema-full-checking", true);

     Grammar g = preparser.preparseGrammar(
         XMLGrammarDescription.XML_SCHEMA,
         new XMLInputSource(null,
             "c:\\jdev\\workspace\\UncleJustWiki\\xmlschemas\\DraftRevisionSchema.xsd",
             null));

     // lock grammar pool. Don't add any more grammars
     grammarPool.lockPool();
     return grammarPool;
 }

Regards,
Justin


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


