On 26 Jan 2004, at 04:18, Wallmer, Martin wrote:
Hi Stefano,
One resource could, in theory, be stored in different stores at the same time
how would you achieve that?
uh, what do you mean? [feeling stupid]
or being indexed by different indexers at the same time.
My point is: the way you partition your content tree for storage should not necessarily be the same as the one you might want to use for indexing.
And, more importantly, you might want to have indexers overlap (thus creating not a partition of your content space but simply a coverage).
Did I make sense?
Could you please provide use cases for the different scenarios?
Sure.
I think that instead of writing an uber-index that is capable of understanding all sorts of information, I would like to build "focused indexes" and then have the ability to query them differently, for reasons of performance and optimality.
So, for example, consider a collection of XML documents. Not all of them have the same schema, but you know that, sometimes, following a particular schema helps you index them in a particular way (consider, for example, RDF content that needs to be dereferenced against a remote terminology inference service in order to unify the schema).
But, at the same time, this is XML content and you might want to run the usual XPath queries on top of it.
So these documents potentially need to be processed by different indexers, and I would like to be able to choose which one to connect to when making my query.
An example would be:
1) *.xml -> XMLIndexer
2) /news/*.xml -> TextIndexer
3) /medical/guidelines/*.xml -> RDFIndexer
so, when you save a document as
/medical/guidelines/chemiotherapy.xml
this is matched by both "*.xml" and "/medical/guidelines/*.xml" which means that both the XMLIndexer and the RDFIndexer will do something with it.
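To make the overlap concrete, here is a minimal sketch of that matching step. It assumes the patterns are plain shell-style globs in which "*" also matches "/" (otherwise "*.xml" could not match a nested path); the ROUTES table and the indexers_for function are hypothetical names used only for illustration, not part of any existing API.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: (glob pattern, indexer name), copied from the example.
ROUTES = [
    ("*.xml", "XMLIndexer"),
    ("/news/*.xml", "TextIndexer"),
    ("/medical/guidelines/*.xml", "RDFIndexer"),
]


def indexers_for(path, routes=ROUTES):
    """Return every indexer whose pattern matches path.

    All matches are kept, not just the first one, so the patterns form
    a coverage of the content space rather than a partition.
    """
    return [name for pattern, name in routes if fnmatchcase(path, pattern)]


print(indexers_for("/medical/guidelines/chemiotherapy.xml"))
# prints ['XMLIndexer', 'RDFIndexer']
```

Returning all matches instead of stopping at the first one is what lets a single document be fed to several indexers.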
Then, later, I can ask questions like "give me all documents that were authored by Stefano" by having a "where" clause like "//*[contains(dc:author,'Stefano')]" and I would ask the XMLIndexer about this, but for questions such as "give me all treatments that don't involve the use of glycerine", I would ask the RDFIndexer.
But for questions like "give me all the news that contain the words 'George' and 'Bush'" you would call the TextIndexer.
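As a rough sketch of what "choosing which index to ask" could look like from the user's side, the toy below keeps a registry of named indexes and sends a word query to the text one. The TextIndexer class here is a deliberately tiny in-memory inverted index, and the registry and method names are assumptions made up for this illustration.

```python
import re
from collections import defaultdict


class TextIndexer:
    """Toy inverted index mapping each lowercased word to the paths containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, path, text):
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(path)

    def query(self, *words):
        """Return the paths that contain all of the given words."""
        sets = [self.postings.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()


# Hypothetical registry: the administrator names the indexes, the user picks one.
indexes = {"text": TextIndexer()}
indexes["text"].index("/news/a.xml", "George Bush visited the city")
indexes["text"].index("/news/b.xml", "George went home")

print(indexes["text"].query("George", "Bush"))
# prints {'/news/a.xml'}
```

An XPath or RDF index would sit in the same registry under a different name and accept its own kind of query.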
As you can see, there are potentially two approaches here:
1) one indexer, with the differentiation made by the user in the query
2) several (potentially overlapping) indexers, with the differentiation made by the administrator (and users choose which index to use)
I tend to think that the second approach is better, also because it contains the first.
-- Stefano.
