Hi,
good timing, I was thinking about it last week:
[EMAIL PROTECTED] wrote:
> People are interested in grammar caching, and are asking questions about
> it. But without knowing what people really want, we couldn't make any
> decision on how to provide such functionality, and couldn't answer any of
> the questions. So, instead, allow me to ask this question first: how do you
> expect Xerces2 to provide grammar caching?
I think that caching should be under application management, so I
suggest a grammar cache property:
interface GrammarCache {
    Object put(String URI_key, Object grammar);
    Object get(String URI_key);
}
so that, in the simplest case, an application can set the property:
parser.setProperty(".../grammar-cache", new WeakHashMap());
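A minimal sketch of how the proposed interface could be backed by an ordinary map. The interface is the one proposed above; the adapter class MapGrammarCache is a hypothetical name, not part of Xerces, and grammars are treated as opaque objects.

```java
import java.util.Map;

// Interface as proposed in this message; grammars are opaque Objects here.
interface GrammarCache {
    Object put(String uriKey, Object grammar);
    Object get(String uriKey);
}

// Hypothetical adapter (an assumption, not a Xerces class):
// lets any java.util.Map act as a GrammarCache.
final class MapGrammarCache implements GrammarCache {
    private final Map<String, Object> map;

    MapGrammarCache(Map<String, Object> map) {
        this.map = map;
    }

    @Override
    public Object put(String uriKey, Object grammar) {
        // Returns the grammar previously stored under this key, or null.
        return map.put(uriKey, grammar);
    }

    @Override
    public Object get(String uriKey) {
        // null means "not cached"; the caller then has to parse the grammar.
        return map.get(uriKey);
    }
}
```

With such an adapter the application picks the map and therefore the retention policy: a HashMap keeps everything, while a WeakHashMap drops entries whose key objects are no longer strongly referenced elsewhere.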
> Assuming we have a grammar pool in the system somewhere, and it contains a
> set of grammars that are already parsed (a set of objects of Grammar
> class), let's consider the following scenarios. I'll use schema for the
> examples, and use "grammar A" as a short for "a schema grammar of target
> namespace A".
>
> 1. When we validate an instance, and see an element from namespace A. To
> validate such element, we need to find grammar A. How do we find it?
> (Assume grammar A is not already known to that instance.)
> a. Always look in the grammar pool first. If it's not there, parse the
> grammar according to the schema location attributes.
> b. Always ask the application. The application can get the grammar from the
> grammar pool, or from some other place. The application can also choose to
> override the schema location, and the parser will parse the grammar
> accordingly.
>
> Which approach do you prefer? Or is there any other approach that serves
> your needs better?
The second. Only the application can decide what to cache during its lifecycle.
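Option (b) could look roughly like the sketch below. All names here (GrammarAnswer, GrammarResolver, GrammarLookup) are hypothetical and only illustrate the flow: the parser asks the application, and the application either hands back a grammar it already holds or names the location the parser should load from, possibly overriding the schema location hint.

```java
import java.util.function.Function;

// Hypothetical reply from the application: a ready grammar, or a location to parse.
final class GrammarAnswer {
    final Object grammar;   // non-null when the application supplies the grammar itself
    final String location;  // otherwise, the location the parser should load from

    private GrammarAnswer(Object grammar, String location) {
        this.grammar = grammar;
        this.location = location;
    }

    static GrammarAnswer cached(Object grammar) {
        return new GrammarAnswer(grammar, null);
    }

    static GrammarAnswer parseFrom(String location) {
        return new GrammarAnswer(null, location);
    }
}

// Hypothetical callback the parser would invoke for every unknown namespace.
interface GrammarResolver {
    GrammarAnswer resolve(String namespace, String hintedLocation);
}

final class GrammarLookup {
    // Stand-in for the parser side: always ask the application first,
    // and parse only when it did not supply a grammar.
    static Object find(String namespace, String hintedLocation,
                       GrammarResolver application,
                       Function<String, Object> parseGrammar) {
        GrammarAnswer answer = application.resolve(namespace, hintedLocation);
        if (answer.grammar != null) {
            return answer.grammar;
        }
        return parseGrammar.apply(answer.location);
    }
}
```

The parser never touches a pool directly in this arrangement; whether a pool exists at all stays an application detail.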
> 2. When we are parsing grammar A, and it imports grammar B, we need to find
> grammar B. How do we find it? (Assume grammar B is not already known to that
> instance.)
> The same choices as those for (1).
>
> This is slightly different from the one above. Some application might
> choose different approach for the two cases. One never knows.
>
> 3. After the parser parses a grammar, how will this grammar be put into the
> grammar pool?
> a. The parser puts the grammar into the pool automatically.
> b. The parser just returns the grammar to the application, and it's up to
> the application to decide whether to put such grammar into the pool.
>
> Again, which approach is preferred?
>
> 4. How many grammar pools should there be? And how complicated should the
> grammar pool(s) be?
> a. One pool for each application, and it can be as simple as a hashtable.
> b. One pool for each application, and it must be thread safe. That is, the
> grammar pool must be able to handle the case where two or more threads try
> to get/put grammars (possibly of the same namespace) at the same time.
> c. One pool for each thread, and it can be as simple as a hashtable.
> d. Dynamic numbers of grammar pools. The application can create as many as
> it wants, and tell the parser which one to use at a certain occasion.
>
> I can come up with two extreme solutions here. Any approach in between
> could be what's in Xerces2.
>
> [1] A clean design with less flexibility
>
> Xerces provides a Grammar pool, which is shared across the application.
> This grammar pool is thread-safe. The parser gets/puts grammars from/into
> the grammar pool automatically. It's like we choose "a a a b" for the above
> four questions. This should be sufficient for many use cases, but the
> applications won't be able to control how the grammar pool is accessed.
>
> [2] A flexible design
>
> Extreme flexibility means we don't assume anything, hence we couldn't
> implement the grammar pool (because any one implementation might not
> fulfill some specific case). So we leave the implementation of the grammar
> pool to the application, and the application can implement it in any way it
> wants: one or more pools, thread-safe or not. Each time an instance
> document is parsed (or a standalone grammar is parsed), a list of grammars
> will be returned to (or accessed by) the application. The application can
> then decide which ones to cache. This is like we choose "b b b d" for the
> four questions.
>
> So please consider: Is [1] enough for our lives? Do we need the flexibility
> of [2]? Which point between [1] and [2] is most comfortable for us?
I need the second. I want to manage what is garbage and what is not.
That is not the responsibility of the parser!
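As one illustration of the application deciding what is garbage, here is a small sketch of an application-owned pool with a least-recently-used eviction policy. The class name LruGrammarPool and the choice of LRU are assumptions made for the example; the point is only that the policy lives in application code, not in the parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical application-side pool: keeps at most maxGrammars entries and
// evicts the least recently used one. Not thread-safe; wrap or synchronize
// it if several threads share one pool.
final class LruGrammarPool {
    private final Map<String, Object> grammars;

    LruGrammarPool(final int maxGrammars) {
        // accessOrder = true makes iteration order follow recency of access.
        this.grammars = new LinkedHashMap<String, Object>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                return size() > maxGrammars;
            }
        };
    }

    Object put(String uriKey, Object grammar) {
        return grammars.put(uriKey, grammar);
    }

    Object get(String uriKey) {
        return grammars.get(uriKey);
    }

    // Explicit removal, for when the application knows a grammar is stale.
    Object evict(String uriKey) {
        return grammars.remove(uriKey);
    }
}
```

Another application might instead hold grammars through soft references or drop them at the end of a session; the parser does not need to know which.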
> There are other questions about the grammar pool:
> - How do we access grammars in the grammar pool. For schemas, it might be
> easier: we can use the target namespace. How about DTDs and schemas without
> target namespace?
As a key I propose the target namespace for schema grammars and the public ID
for DTD grammars; in general, a String URI.
> - How do we deal with conflicting grammars (for example, two schema
> grammars with the same target namespace)?
Let the application decide according to context. It can cache two grammars for
the same URI and return one of them depending on the context.
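A rough sketch of such a context-sensitive cache follows. What "context" means is left to the application; here it is assumed to be a plain string label (for example a document type or a configuration name), and ContextualGrammarCache is a hypothetical class name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache that can hold several grammars under one URI key,
// distinguished by an application-defined context label (an assumption).
final class ContextualGrammarCache {
    private final Map<String, Map<String, Object>> byUri = new HashMap<>();

    void put(String uriKey, String context, Object grammar) {
        // Create the per-URI table on first use, then store by context.
        byUri.computeIfAbsent(uriKey, k -> new HashMap<>()).put(context, grammar);
    }

    Object get(String uriKey, String context) {
        Map<String, Object> candidates = byUri.get(uriKey);
        // null means no grammar is cached for this URI in this context.
        return candidates == null ? null : candidates.get(context);
    }
}
```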
> But I guess we can answer them after we nail down what's really needed for
> grammar caching.
>
> I was trying to prepare a note to describe our thoughts about how we were
> going to support grammar caching, and some design/implementation details we could
> think of. But I found it really difficult to say anything before we know
> what is really desired. And DOM3 is trying to provide its way to do grammar
> caching, which makes things even worse.
>
> Anyway, no decision has been made about any aspect of grammar caching. So
> make a wish! :-)
Keep it simple and under full application control. :-)
Thanks
Cc.
--
Petr Kuzel, Sun Microsystems
: Forte Tools (http://www.sun.com/forte/ffj/ie/)
: XML and Jini (http://jini.netbeans.org/) modules