Re: [Xerces2] How do we want Grammar Caching

sandygao Thu, 23 Aug 2001 11:08:57 -0700
> > This is certainly a clean design, but I guess many Xerces1 users are used
> > to using EntityResolver to override grammar locations, and such change
> > would make them unhappy :-)
>
> Are they happy with the current solution? Besides, it should also be
> addressed in other APIs, such as SAX.

Maybe, maybe not. But we can't ignore the fact that there is already a lot
of code that uses EntityResolver to override grammar locations. If we can
avoid changing it, we should.

> But the application is asked for A just once, isn't it? It cannot return
> two A's for the current document.

Consider this case in Schema: A imports B and C, and C imports B. When
compiling A, the parser asks the application for B and C (to see if they
are cached). Now the application has a chance to provide two different
grammars for B: one when the parser asks for B, and one when the parser
asks for C (which imports B). So we need a mechanism to deal with this
case. This is why I have the method "grammarConflict()". Do you have any
other suggestions?
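To make the A/B/C case concrete, here is a minimal sketch, assuming a local grammar set that enforces one grammar per target namespace. SchemaGrammar, ConflictHandler and LocalGrammarSet are hypothetical names invented for illustration (not Xerces classes), and ConflictHandler only stands in for the proposed grammarConflict() method, whose real signature is not given in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a compiled schema grammar (not the Xerces class). */
final class SchemaGrammar {
    final String targetNamespace;
    SchemaGrammar(String targetNamespace) { this.targetNamespace = targetNamespace; }
}

/** Hypothetical callback standing in for the proposed grammarConflict() method. */
interface ConflictHandler {
    // Given the grammar already in the local set and a different one offered
    // for the same namespace, return the grammar the parser should keep.
    SchemaGrammar grammarConflict(SchemaGrammar existing, SchemaGrammar offered);
}

/** Local grammar set enforcing a one-grammar-per-namespace rule. */
final class LocalGrammarSet {
    private final Map<String, SchemaGrammar> byNamespace = new HashMap<>();
    private final ConflictHandler handler;

    LocalGrammarSet(ConflictHandler handler) { this.handler = handler; }

    /** Adds a grammar and returns the one actually kept for its namespace. */
    SchemaGrammar add(SchemaGrammar offered) {
        SchemaGrammar existing = byNamespace.get(offered.targetNamespace);
        // Same instance offered twice (e.g. B reached via A and via C): no conflict.
        if (existing == null || existing == offered) {
            byNamespace.put(offered.targetNamespace, offered);
            return offered;
        }
        // Two distinct grammars claim the same namespace: ask the application.
        SchemaGrammar kept = handler.grammarConflict(existing, offered);
        byNamespace.put(offered.targetNamespace, kept);
        return kept;
    }

    SchemaGrammar get(String namespace) { return byNamespace.get(namespace); }
}
```

In this sketch the application sees a conflict only when it actually hands out two different B instances; returning the same cached object on both requests never triggers the callback.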

> I think that we can get rid of getInitialGrammarSet() and
> returnFinalGrammarSet() if the whole cache looks like a Map.
> ...
> put() is implemented by the GrammarCache, i.e. by the application, so it
> can decide.

My concerns about map-like interface:

1. Map imposes a "one-grammar-per-key" rule on the application. But the
application is free to cache grammars in any way it wants.

2. Map implies a strong coupling between "get" and "put". It's easy to
expect that "get" gets the grammar I just "put" in the map. But again, it's
up to the application.

3. DOM L3 allows setting grammar(s) on document instances or parsers
*before* validation starts. This is where "getInitialGrammarSet()" is
useful.

4. Sometimes it's not easy to tell for which grammars we should call "put".
For example, A imports B, then whether to call "put" for B depends on how
we get B:
- B is parsed from a schema document;
- B is returned by the application (through "get");
- B is from the local grammar set.
So the parser would have to know where each grammar came from.
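Point 4 can be sketched as follows, assuming the parser tags every grammar in its local set with its origin and only hands back the ones it compiled itself (the candidates for a returnFinalGrammarSet()-style call). Origin, TrackedGrammarSet and its methods are hypothetical names, and grammars are represented here only by their namespace strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tag recording where a grammar in the local set came from. */
enum Origin { PARSED, FROM_APPLICATION, PRELOADED }

final class TrackedGrammarSet {
    // Namespace -> origin; LinkedHashMap keeps insertion order.
    private final Map<String, Origin> origins = new LinkedHashMap<>();

    /** Seeds the set before validation, as a getInitialGrammarSet()-style call would. */
    void preload(String namespace) { origins.putIfAbsent(namespace, Origin.PRELOADED); }

    /** Records a grammar; the first recorded origin wins. */
    void record(String namespace, Origin origin) { origins.putIfAbsent(namespace, origin); }

    /** Only grammars the parser compiled itself; the application has not seen these. */
    List<String> newlyParsed() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Origin> e : origins.entrySet()) {
            if (e.getValue() == Origin.PARSED) result.add(e.getKey());
        }
        return result;
    }
}
```

The bookkeeping is trivial, but it must exist somewhere: with a bare Map-style put(), the parser would still need this information to decide which grammars to put.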

Cheers,
Sandy Gao
Software Developer, IBM Canada
(1-416) 448-3255
[EMAIL PROTECTED]



Petr Kuzel <Petr.Kuzel@sun.com>
Sent by: Petr.Kuzel@sun.com
08/22/2001 08:37 AM
Please respond to xerces-j-dev

To: [EMAIL PROTECTED]
cc:
Subject: Re: [Xerces2] How do we want Grammar Caching



[EMAIL PROTECTED] wrote:
>
> Basically, when the parser validates a document instance, there are two
> sets of grammars. The first set is stored inside the parser (we can call it
> the local set), which contains grammars available to the current parsing
> (the current document); while the second set is under the application's
> control, used to store cached grammars (we can call it the cached set). The
> second set is not mandatory, because some applications might not be
> interested in grammar caching at all.

A good grammar caching overview.

> I don't think we should *always* ask the application for grammars. Please
> refer to Curt's message, and my reply.

Of course; I did not express myself well.

> This is certainly a clean design, but I guess many Xerces1 users are used
> to using EntityResolver to override grammar locations, and such change
> would make them unhappy :-)

Are they happy with the current solution? Besides, it should also be
addressed in other APIs, such as SAX.

> > > 3. There is still a problem for the design of schema caching. Assume
> > > grammar A (that is, a grammar with target namespace A) is known to the
> > > parser, then the parser asks the application for grammar B. The
> > > application returns grammar B, which imports a different grammar A. Now
> > > the two A's conflict (we assumed a one-grammar-per-namespace rule). To
> > > avoid such a conflict, in the GrammarResolver interface, we ask the
> > > application to provide a different grammar B in this case (method
> > > grammarConflict()).
> >
> > I assume that cached grammar B just references grammar A. So the
> > parser can use its own copy of A.
>
> Because the parser has no control over how cached grammars are stored, it's
> possible that the application returns two different A's.

But the application is asked for A just once, isn't it? It cannot return
two A's for the current document.

> > Here I doubt whether I understand it. Why doesn't the parser interact
> > with the pool directly? I prefer to call it GrammarCache.
>
> What I called grammar pool is an abstract concept. It refers to a
> collection of grammars. The application can choose whatever way to manage
> it. The interface GrammarResolver/GrammarCache, on the other hand, is used
> to communicate between the parser and the application. So the parser only
> interacts with that interface, not the physical collection of grammars.

Yes, that is why I used a Key that is under application control: it
decouples the two. It also allows a Map implementation of the cache.

> > public interface GrammarCache {
> >    // it can be implemented by a Map
> >    public Grammar get(Key key);
> >    public Grammar put(Key key, Grammar grammar);
> >    public interface Key {
> >    }
> > }
> >
> > public interface CacheKeyFactory {
> >   //GrammarCache.Key createKey(String namespace);
> >   //GrammarCache.Key createKey(String pID, String sID);
> >   GrammarCache.Key createKey(String grammarType,
> >                              String grammarKey,
> >                              String hint);
> > }
> >
> > public interface GrammarResolver {
> >    // ask the application to override import
> >    // schemaLocation.
> >    public XMLInputSource resolveGrammarLocation(String grammarType,
> >                                                 String grammarKey,
> >                                                 String hint);
> >    // ask the application to override include/redefine
> >    // schemaLocation.
> >    public XMLInputSource resolveGrammarLocation(String grammarType,
> >                                                 String hint);
> > }
>
> We still need to decide whether to have one GrammarResolver interface, or
> two interfaces GrammarCache/GrammarResolver. What do others think?
>
> Other than that, your interfaces don't look too much different from my
> GrammarResolver. But the other methods in my GrammarResolver have their
> reasons to be there. Please refer to my message replying to Curt.

Cross-posting is a really bad habit :-).

I think that we can get rid of getInitialGrammarSet() and
returnFinalGrammarSet() if the whole cache looks like a Map.

Why grammarConflict()? The cache simply must return a grammar of the proper
type for the proper namespace. Why negotiate?
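The "it can be implemented by a Map" remark can be sketched like this, assuming the GrammarCache interface quoted above and a hypothetical SimpleKey built from grammarType and grammarKey (the hint is deliberately left out of the key). Grammar here is a placeholder class, not a Xerces type. The sketch also shows the trade-off in Sandy's point 1: a Map gives exactly one grammar per key, and put() silently replaces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical placeholder for a compiled grammar. */
final class Grammar {
    final String name;
    Grammar(String name) { this.name = name; }
}

/** The interface as proposed in this thread, with parameter names added. */
interface GrammarCache {
    Grammar get(Key key);
    Grammar put(Key key, Grammar grammar);
    interface Key {}
}

/** Hypothetical key: grammar type plus namespace (or system ID). */
final class SimpleKey implements GrammarCache.Key {
    final String grammarType;
    final String grammarKey;
    SimpleKey(String grammarType, String grammarKey) {
        this.grammarType = grammarType;
        this.grammarKey = grammarKey;
    }
    // equals/hashCode are required so that equal keys find the same Map entry.
    @Override public boolean equals(Object o) {
        if (!(o instanceof SimpleKey)) return false;
        SimpleKey k = (SimpleKey) o;
        return Objects.equals(grammarType, k.grammarType) && Objects.equals(grammarKey, k.grammarKey);
    }
    @Override public int hashCode() { return Objects.hash(grammarType, grammarKey); }
}

/** One grammar per key; put() replaces and returns the previous grammar, if any. */
final class MapGrammarCache implements GrammarCache {
    private final Map<Key, Grammar> map = new HashMap<>();
    public Grammar get(Key key) { return map.get(key); }
    public Grammar put(Key key, Grammar grammar) { return map.put(key, grammar); }
}
```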

> > Time diagram from parser point of view:
> >
> >   $key = ask CacheKeyFactory for a key
> >   $grm = ask GrammarCache for Grammar under $key
> >   if $grm == null {
> >     $in = ask GrammarResolver for InputSource
> >     $grm = constructGrammar($in)
> >     put $grm into cache under $key
> >   }
>
> I basically agree. But as the first step, we should try to look for the
> grammar in the local grammar set. If the grammar is not there, we turn to

Right.

> ask the application. And for the last step, instead of putting the grammar
> into the cache, we should first store it in the local grammar set, then
> return the grammar to the application. It's up to the application to decide
> whether a grammar should be cached.

put() is implemented by the GrammarCache, i.e. by the application, so it
can decide.
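The lookup order both sides converge on (local set first, then the optional application cache, then resolve and compile, store locally, and finally let the application decide about caching) might look roughly like this. ApplicationGrammars, GrammarLookup and the compiler function are hypothetical names; the sketch offers each new grammar immediately (closer to Petr's put()), whereas Sandy's proposal would batch them for a final return call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical placeholder for a compiled grammar. */
final class Grammar {
    final String ns;
    Grammar(String ns) { this.ns = ns; }
}

/** Hypothetical application-side hooks; illustrative only, not a Xerces API. */
interface ApplicationGrammars {
    Grammar lookup(String namespace); // may return null
    void offer(Grammar grammar);      // the application decides whether to cache it
}

final class GrammarLookup {
    private final Map<String, Grammar> local = new HashMap<>(); // grammars for this document
    private final ApplicationGrammars app;                      // null: no caching at all
    private final Function<String, Grammar> compiler;           // resolve location + parse

    GrammarLookup(ApplicationGrammars app, Function<String, Grammar> compiler) {
        this.app = app;
        this.compiler = compiler;
    }

    Grammar find(String namespace) {
        // 1. The local grammar set for the current document.
        Grammar g = local.get(namespace);
        if (g != null) return g;
        // 2. The application's cached set, if the application supplied one.
        if (app != null) g = app.lookup(namespace);
        boolean compiled = false;
        if (g == null) {
            // 3. Resolve the location and compile the grammar.
            g = compiler.apply(namespace);
            compiled = true;
        }
        // 4. Always keep it locally for the rest of this document.
        local.put(namespace, g);
        // 5. Hand newly compiled grammars to the application, which decides.
        if (compiled && app != null) app.offer(g);
        return g;
    }
}
```

Passing null as the application hook models the case where the application is not interested in caching: the parser still works purely from its local set.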


  Cc.

--
<address>
<a href="mailto:[EMAIL PROTECTED]">Petr Kuzel</a>, Sun Microsystems
: <a href="http://www.sun.com/forte/ffj/ie/">Forte Tools</a>
: XML and <a href="http://jini.netbeans.org/";>Jini</a> modules</address>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
