[Xerces2] How do we want Grammar Caching

sandygao Wed, 08 Aug 2001 11:23:47 -0700
Hi folks,

As Neil Graham mentioned, we are working on the design of Xerces2 schema
support. There will be two big changes in Xerces2. One is the redesign of
schema parsing, which is covered in detail in Neil's note. The other is
grammar caching, which also applies to DTDs and, possibly, to other kinds of
grammars, if Xerces ever supports them.

People are interested in grammar caching, and are asking questions about
it. But without knowing what people really want, we can't decide how to
provide such functionality, and can't answer any of the questions. So,
instead, allow me to ask this question first: how do you expect Xerces2 to
provide grammar caching?

Assuming we have a grammar pool in the system somewhere, and it contains a
set of grammars that are already parsed (a set of Grammar objects), let's
consider the following scenarios. I'll use schemas for the examples, and
use "grammar A" as shorthand for "a schema grammar with target namespace A".

1. When we validate an instance and see an element from namespace A, we
need to find grammar A in order to validate that element. How do we find it?
(Assume grammar A is not already known to the instance.)
a. Always look in the grammar pool first. If it's not there, parse the
grammar according to the schema location attributes.
b. Always ask the application. The application can get the grammar from the
grammar pool, or from some other place. The application can also choose to
override the schema location, and the parser will parse the grammar
accordingly.

Which approach do you prefer? Or is there any other approach that serves
your needs better?
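
To make the two options concrete, here is a minimal Java sketch. Every name
in it (Grammar, GrammarFinder, PoolFirstFinder, ApplicationHook,
AskApplicationFinder) is hypothetical and invented for illustration; none of
it is existing or planned Xerces API, and "parsing" is reduced to creating a
placeholder object.

```java
import java.util.Map;

// Hypothetical stand-in for a parsed grammar; not an actual Xerces class.
final class Grammar {
    final String targetNamespace;
    final String source; // where the grammar came from, for illustration only

    Grammar(String targetNamespace, String source) {
        this.targetNamespace = targetNamespace;
        this.source = source;
    }
}

// Hypothetical contract the validator would call for an unknown namespace.
interface GrammarFinder {
    Grammar find(String namespace, String schemaLocationHint);
}

// Option (a): always look in the pool first; otherwise parse from the hint.
final class PoolFirstFinder implements GrammarFinder {
    private final Map<String, Grammar> pool;

    PoolFirstFinder(Map<String, Grammar> pool) {
        this.pool = pool;
    }

    @Override
    public Grammar find(String namespace, String schemaLocationHint) {
        Grammar cached = pool.get(namespace);
        if (cached != null) {
            return cached;
        }
        // Placeholder for real schema parsing from the schema location.
        return new Grammar(namespace, schemaLocationHint);
    }
}

// Hypothetical application callback for option (b).
interface ApplicationHook {
    // Return a grammar from anywhere (a pool or some other place),
    // or null to let the parser load one.
    Grammar supply(String namespace);

    // Return the location to parse from; may simply return the hint.
    String chooseLocation(String namespace, String schemaLocationHint);
}

// Option (b): always ask the application.
final class AskApplicationFinder implements GrammarFinder {
    private final ApplicationHook application;

    AskApplicationFinder(ApplicationHook application) {
        this.application = application;
    }

    @Override
    public Grammar find(String namespace, String schemaLocationHint) {
        Grammar supplied = application.supply(namespace);
        if (supplied != null) {
            return supplied;
        }
        // The application may have overridden the schema location.
        String location = application.chooseLocation(namespace, schemaLocationHint);
        return new Grammar(namespace, location); // placeholder for real parsing
    }
}
```

Note that under (b) the application can reproduce (a) by consulting its own
pool inside supply(), so the real difference is who owns the lookup policy.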

2. When we are parsing grammar A and it imports grammar B, we need to find
grammar B. How do we find it? (Assume grammar B is not already known to the
instance.)
The same choices as those for (1).

This is slightly different from the one above. Some applications might
choose different approaches for the two cases. One never knows.

3. After the parser parses a grammar, how will this grammar be put into the
grammar pool?
a. The parser puts the grammar into the pool automatically.
b. The parser just returns the grammar to the application, and it's up to
the application to decide whether to put the grammar into the pool.

Again, which approach is preferred?

4. How many grammar pools should there be? And how complicated should the
grammar pool(s) be?
a. One pool for each application, and it can be as simple as a hashtable.
b. One pool for each application, and it must be thread safe. That is, the
grammar pool must be able to handle the case where two or more threads try
to get/put grammars (possibly of the same namespace) at the same time.
c. One pool for each thread, and it can be as simple as a hashtable.
d. A dynamic number of grammar pools. The application can create as many as
it wants, and tell the parser which one to use on a given occasion.
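
As a rough illustration of what (b) and (c) ask for, here is a small Java
sketch. The class names are hypothetical, grammars are represented as plain
Objects, and it relies on java.util.concurrent, which is newer than this
note, so treat it as one possible shape rather than a proposal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Option (b): one shared pool that tolerates concurrent get/put.
// Hypothetical class; grammars are plain Objects here.
final class SharedGrammarPool {
    // ConcurrentHashMap rejects null keys, so a schema without a target
    // namespace would need some sentinel key in this design.
    private final ConcurrentMap<String, Object> grammars = new ConcurrentHashMap<>();

    // If two threads offer a grammar for the same namespace, the first one
    // stored wins and both callers get that same instance back.
    Object putIfAbsent(String namespace, Object grammar) {
        Object existing = grammars.putIfAbsent(namespace, grammar);
        return existing != null ? existing : grammar;
    }

    Object get(String namespace) {
        return grammars.get(namespace);
    }
}

// Option (c): one simple, unsynchronized pool per thread.
final class PerThreadGrammarPool {
    private static final ThreadLocal<Map<String, Object>> POOL =
            ThreadLocal.withInitial(HashMap::new);

    // Each thread sees only its own map, so no locking is needed.
    static Map<String, Object> current() {
        return POOL.get();
    }
}
```

The trade-off is visible even at this size: (b) shares parsed grammars
between threads at the cost of a concurrency-aware container, while (c)
stays trivial but may parse the same grammar once per thread.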

I can come up with two extreme solutions here. Any approach in between
could be what's in Xerces2.

[1] A clean design with less flexibility

Xerces provides a grammar pool, which is shared across the application.
This grammar pool is thread-safe. The parser gets grammars from and puts
grammars into the pool automatically. This is like choosing "a a a b" for
the above four questions. This should be sufficient for many use cases, but
applications won't be able to control how the grammar pool is accessed.

[2] A flexible design

Extreme flexibility means we don't assume anything, hence we can't
implement the grammar pool ourselves (because any one implementation might
not fit some specific case). So we leave the implementation of the grammar
pool to the application, and the application can implement it in any way it
wants: one or more pools, thread-safe or not. Each time an instance
document is parsed (or a standalone grammar is parsed), a list of grammars
will be returned to (or accessed by) the application. The application can
then decide which ones to cache. This is like choosing "b b b d" for the
four questions.
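
A minimal sketch of the application side of [2], assuming the parser hands
back a list of the grammars it built during one parse. ParsedGrammar and
ApplicationGrammarCache are hypothetical names, and the "cache only selected
namespaces" rule is just one example of a policy an application might pick.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a grammar produced during a parse.
final class ParsedGrammar {
    final String targetNamespace;

    ParsedGrammar(String targetNamespace) {
        this.targetNamespace = targetNamespace;
    }
}

// Application-owned cache: the parser never touches it directly.
final class ApplicationGrammarCache {
    private final Map<String, ParsedGrammar> pool = new HashMap<>();
    private final Set<String> namespacesToCache;

    ApplicationGrammarCache(Set<String> namespacesToCache) {
        this.namespacesToCache = namespacesToCache;
    }

    // Called by the application with the grammars returned from one parse.
    // Keeps only those it has chosen to cache; returns how many were kept.
    int offer(List<ParsedGrammar> parsed) {
        int cached = 0;
        for (ParsedGrammar grammar : parsed) {
            if (namespacesToCache.contains(grammar.targetNamespace)) {
                pool.put(grammar.targetNamespace, grammar);
                cached++;
            }
        }
        return cached;
    }

    ParsedGrammar lookup(String namespace) {
        return pool.get(namespace);
    }
}
```

Because the cache is not synchronized, an application that shares it between
threads would have to add its own locking, which is exactly the kind of
decision [2] leaves to the application.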

So please consider: Is [1] enough for our needs? Do we need the flexibility
of [2]? Which point between [1] and [2] is most comfortable for us?

There are other questions about the grammar pool:
- How do we access grammars in the grammar pool? For schemas, it might be
easier: we can use the target namespace. How about DTDs and schemas without
a target namespace?
- How do we deal with conflicting grammars (for example, two schema
grammars with the same target namespace)?

But I guess we can answer them after we nail down what's really needed for
grammar caching.
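
Purely to show what those two questions involve, here is a hedged Java
sketch of a pool key and a conflict policy. Nothing here is decided:
GrammarKey, ConflictPolicy and KeyedGrammarPool are hypothetical names, and
identifying a DTD by its system ID is only an assumption made for the
example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical pool key: the kind of grammar plus an identifier.
final class GrammarKey {
    final String kind;       // e.g. "schema" or "dtd"
    final String identifier; // schema: target namespace (null if none);
                             // DTD: assumed here to be the system ID

    GrammarKey(String kind, String identifier) {
        this.kind = kind;
        this.identifier = identifier;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof GrammarKey)) {
            return false;
        }
        GrammarKey that = (GrammarKey) other;
        return kind.equals(that.kind) && Objects.equals(identifier, that.identifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(kind, identifier);
    }
}

// What to do when a second grammar arrives for an occupied key.
enum ConflictPolicy { KEEP_FIRST, REPLACE, REJECT }

final class KeyedGrammarPool {
    private final Map<GrammarKey, Object> grammars = new HashMap<>();
    private final ConflictPolicy policy;

    KeyedGrammarPool(ConflictPolicy policy) {
        this.policy = policy;
    }

    // Returns the grammar that is cached under the key after the call.
    Object put(GrammarKey key, Object grammar) {
        Object existing = grammars.get(key);
        if (existing == null || policy == ConflictPolicy.REPLACE) {
            grammars.put(key, grammar);
            return grammar;
        }
        if (policy == ConflictPolicy.REJECT) {
            throw new IllegalStateException("grammar already cached for this key");
        }
        return existing; // KEEP_FIRST
    }
}
```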

I was trying to prepare a note describing our thoughts on how we were going
to support grammar caching, along with the design and implementation details
we could think of. But I found it really difficult to say anything before we
know what is really desired. And DOM3 is trying to provide its own way to do
grammar caching, which makes things even harder.

Anyway, no decision has been made about any aspect of grammar caching. So
make a wish! :-)

Cheers,
Sandy Gao
Software Developer, IBM Canada
(1-416) 448-3255
[EMAIL PROTECTED]

