Re: integrating grammar preparsing with the grammar caching API

Andy Clark Thu, 04 Apr 2002 23:24:11 -0800

[EMAIL PROTECTED] wrote:
> That's also why I wouldn't mind seeing methods like lockGrammarPool and co.
> on the configuration.  People (Joe Kesselman for instance) have complained
> that even requiring users to know about configurations is bad; if we can


These methods don't make sense for the configuration interface
because it promises that *all* configurations will have grammar
caching functionality. This is clearly not the case.

The Xerces Native Interface is called such because it is native
to the *internals* of the parser. I don't think that the average 
user should be using XNI directly. The way that we implement the 
end-solution for users who just want a public way to perform 
grammar caching does not need to have direct bearing on the 
internal API. It might, in some cases, but I don't think this 
is one of those cases.

> localize the learning code to one interface as much as possible, I think
> that's a good thing.  But if you're really opposed I can settle with a set
> of parseGrammar methods on this putative configuration.

Look at it another way. Should a user expect to have grammar
caching functionality from a parser returned by a generic
factory such as JAXP? No. So they're going to have to write
code that is specific to our grammar caching mechanism unless
their application is strictly DOM-based.

Let's make a convenient grammar caching implementation for 
people to use with the parser configuration implementations
that we provide and design the internal APIs appropriately.
(Note the excessive use of the word "implementation".)

> > Currently, in the xni.parser package we have interfaces to
> > define a grammar, a grammar description, and a grammar pool.
> 
> In the xni.grammars package you mean.

Correct. My typo.

> Hope to hear from you further Andy; seems like only you and I are following
> these threads these days...

Well, I haven't had much time to come up with anything
constructive to add to the conversation, yet. So far I've just
had time to stamp my feet and yell when I don't like something
I've heard. ;) But it's percolating...

To get moving on this, let's precisely define what we need to
make this system work, even if it's only a broad stroke at
this point. Right now I think we need the following:

  1) grammar object
  2) grammar pool
  3) grammar parser
  4) grammar caching configuration

Number 1 is defined using the xni.grammars.Grammar and
XMLGrammarDescription interfaces. Number 2 is the XMLGrammarPool
in the same package. Collectively, these interfaces allow us to
abstract grammars and cache them in a simple way. However, this
still requires specific grammar types (DTD, XML Schema, etc) to
implement specific grammar implementations. No problem here --
we've already learned through experience that an uber-grammar
and validator just doesn't work (or works very poorly).
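
As a rough sketch of how items 1 and 2 fit together (the names
below are hypothetical stand-ins written for illustration, not
the actual xni.grammars signatures), the pool only needs to group
opaque grammar objects by type, while each grammar type supplies
its own implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; not the real xni.grammars signatures.
interface SketchGrammarDescription {
    String getGrammarType();   // e.g. "DTD" or "XSD"
    String getLocation();      // where the grammar was loaded from
}

interface SketchGrammar {
    SketchGrammarDescription getDescription();
}

// A pool keyed by grammar type; it never inspects grammar internals.
class SketchGrammarPool {
    private final Map<String, List<SketchGrammar>> byType = new HashMap<>();

    void cacheGrammars(String type, SketchGrammar[] grammars) {
        List<SketchGrammar> list =
            byType.computeIfAbsent(type, k -> new ArrayList<>());
        for (SketchGrammar g : grammars) {
            list.add(g);
        }
    }

    SketchGrammar[] retrieveGrammars(String type) {
        List<SketchGrammar> list =
            byType.getOrDefault(type, new ArrayList<>());
        return list.toArray(new SketchGrammar[0]);
    }
}

// One type-specific implementation; an XML Schema grammar would be
// a separate class with its own internals.
class SketchDTDGrammar implements SketchGrammar, SketchGrammarDescription {
    private final String location;

    SketchDTDGrammar(String location) { this.location = location; }

    public String getGrammarType() { return "DTD"; }
    public String getLocation() { return location; }
    public SketchGrammarDescription getDescription() { return this; }
}
```

The pool here knows nothing about DTD or XML Schema internals,
which is the property that lets it stay generic.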

Since we have different grammar implementations (with different
syntaxes) we need separate ways to load these grammars. For DTDs,
this involves having a standalone DTD scanner (which we have) and
a way to take this information and build a DTD grammar object.
This code currently lives within the DTD validator but can easily
be broken out separately. But DTD is the simple case...

For XML Schemas, the issue is more complicated but basically the
same. The big difference is that the compilation of the grammar
object involves (in some, if not most, cases) the reading of 
other XML Schema grammars which may already be loaded into the
cache that we're populating or may have to be parsed separately.
We're doing this now and the design of the XMLGrammarPool 
reflects the functionality needed to make this happen.

Right now we have a grammar caching parser configuration but
it simply has two methods called "parseGrammar" which really
are only for parsing XML Schemas at the moment. But the intent
is that it would be generic.

So the question is this: should the grammar cache be responsible 
for loading grammars of different types? Or is the application
responsible for loading the grammars and populating the cache?

In the first scenario the application code might look a little
bit like this:

  GrammarCachingConfig config = new GrammarCachingConfig();
  config.loadGrammar("DTD", "http://example.com/grammar.dtd");
  config.loadGrammar("XSD", "http://example.com/grammar.xsd");

  DOMParser parser = new DOMParser(config);
  parser.parse("http://example.com/document.xml");

But what about Relax NG or other types of grammars? This
becomes tricky because there's no point in being able to load
a grammar of a specific type if the configuration doesn't have
a validator in the pipeline that can handle that kind of 
grammar. To solve this problem...

We could separate the grammar cache from the configuration
and make the user responsible for loading grammars into it.
Then the user would have to create a parser configuration 
that has the components needed to do the validation. That 
approach might be something like this:

  GrammarCache cache = new GrammarCache();
  DTDGrammar[] dtds = {
    DTDGrammar.load("http://example.com/grammar1.dtd"),
    DTDGrammar.load("http://example.com/grammar2.dtd"),
  };
  cache.cacheGrammars("DTD", dtds);
  RelaxGrammar[] relaxes = {
    RelaxGrammar.load("http://example.com/grammar.rng"),
  };
  cache.cacheGrammars("RNG", relaxes);

  XMLParserConfiguration config = new MyConfig(cache);

  DOMParser parser = new DOMParser(config);
  parser.parse("http://example.com/document.xml");

It's more verbose but it has the added benefit of making
the cache truly generic. And the onus would be on the app
to choose an appropriate configuration for validating 
documents based on the grammar types in the cache. (And 
there's nothing preventing us from providing a convenience 
class that presents the API from the first approach but 
uses the second approach under the covers.)
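
To illustrate that last point, here is a minimal sketch of such
a convenience class, assuming a generic cache and per-type loader
callbacks. Every name in it (GenericGrammarCache,
ConvenienceGrammarCachingConfig, registerLoader) is hypothetical
and not part of Xerces; real loaders would be the DTD and XML
Schema grammar parsers discussed above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// All names here are hypothetical illustrations, not Xerces classes.

// Second approach: a generic cache that only groups opaque grammars by type.
class GenericGrammarCache {
    private final Map<String, List<Object>> grammars = new HashMap<>();

    void cacheGrammar(String type, Object grammar) {
        grammars.computeIfAbsent(type, k -> new ArrayList<>()).add(grammar);
    }

    List<Object> grammarsOfType(String type) {
        return grammars.getOrDefault(type, new ArrayList<>());
    }
}

// First approach's API, implemented on top of the generic cache.
class ConvenienceGrammarCachingConfig {
    private final GenericGrammarCache cache = new GenericGrammarCache();
    // One loader per grammar type that this configuration can also validate.
    private final Map<String, Function<String, Object>> loaders = new HashMap<>();

    void registerLoader(String type, Function<String, Object> loader) {
        loaders.put(type, loader);
    }

    void loadGrammar(String type, String uri) {
        Function<String, Object> loader = loaders.get(type);
        if (loader == null) {
            // No loader (and so no validator) for this type: reject it
            // rather than cache a grammar nothing in the pipeline can use.
            throw new IllegalArgumentException("Unsupported grammar type: " + type);
        }
        cache.cacheGrammar(type, loader.apply(uri));
    }

    GenericGrammarCache getCache() { return cache; }
}
```

The wrapper rejects grammar types it has no loader for, so the
"no validator in the pipeline" problem surfaces at load time,
while the cache underneath stays generic.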

Okay, enough rambling from me! What do other people think?

-- 
Andy Clark * [EMAIL PROTECTED]

