I have an idea and was wondering what you all think of it. Here's the 
situation:

We are going to use Xindice to store XML data for scientific journal citations. 
The simplest approach is to put them all in one collection and use XPath 
queries to find what we need. However, most of the time they will be searched 
by journal name and volume.
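To make the flat design concrete, here is a rough sketch in Java (the language 
Xindice itself is written in) of what a journal/volume query over one big 
collection amounts to. It does not use Xindice or the XML:DB API; it just runs 
an XPath expression with the JDK's javax.xml.xpath package against every 
document in an in-memory list. The <citation>, <journal>, <volume> and <title> 
element names are hypothetical. Without an index, the database presumably has 
to do something similar for each document in the collection, which is the cost 
an indexer on journal name and volume is meant to avoid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FlatCitationSearch {
    // Returns the titles of citations in one flat "collection" (a list of XML
    // strings) whose <journal> and <volume> match. Element names are hypothetical.
    public static List<String> search(List<String> collection, String journal, String volume) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $journal and $volume instead of pasting input into the expression.
            xpath.setXPathVariableResolver(name ->
                    "journal".equals(name.getLocalPart()) ? journal : volume);
            XPathExpression expr =
                    xpath.compile("/citation[journal = $journal and volume = $volume]/title");
            List<String> titles = new ArrayList<>();
            for (String xml : collection) {
                // Every document is parsed and tested: a full scan of the collection.
                Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new InputSource(new StringReader(xml)));
                Node title = (Node) expr.evaluate(doc, XPathConstants.NODE);
                if (title != null) {
                    titles.add(title.getTextContent());
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("query failed", e);
        }
    }
}
```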

My idea is to create a subcollection for each journal, with a subcollection for 
each volume under that, so each volume collection would hold only a few dozen 
articles. Since a search starts by getting a collection and then querying it, 
I'm guessing this would be much faster and could effectively eliminate the need 
for indexers on journal name and volume. Presumably I could still search 
everything when necessary by querying the base collection.

In other words, I expect that searching the /db/citations/JAMA/132 collection 
of a few dozen documents would be far faster than searching /db/citations, 
which would hold hundreds of thousands of documents altogether.
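The hierarchy idea can be modeled with a toy in-memory tree that counts how 
many documents each search examines. This is only a sketch under assumptions: 
Coll, store, resolve and search are hypothetical names, not Xindice or XML:DB 
calls, and the match predicate stands in for an XPath query. The sketch walks 
child collections only when asked to; whether a query against /db/citations in 
Xindice also covers its subcollections is something to check rather than assume.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class CitationCollections {
    // Hypothetical stand-in for a database collection: documents plus child collections.
    static final class Coll {
        final List<String> docs = new ArrayList<>();
        final Map<String, Coll> children = new LinkedHashMap<>();

        Coll child(String name) {
            return children.computeIfAbsent(name, k -> new Coll());
        }
    }

    final Coll root = new Coll(); // plays the role of /db/citations
    int scanned;                  // documents examined by the most recent search

    // Collection path for a journal volume, e.g. /db/citations/JAMA/132.
    static String path(String journal, String volume) {
        return "/db/citations/" + journal + "/" + volume;
    }

    void store(String journal, String volume, String doc) {
        root.child(journal).child(volume).docs.add(doc);
    }

    // Go straight to /db/citations/<journal>/<volume> without touching other collections.
    Coll resolve(String journal, String volume) {
        Coll j = root.children.get(journal);
        return j == null ? null : j.children.get(volume);
    }

    // Test every document in 'start' (and, if recurse is true, in all its descendants).
    List<String> search(Coll start, boolean recurse, Predicate<String> match) {
        scanned = 0;
        List<String> hits = new ArrayList<>();
        walk(start, recurse, match, hits);
        return hits;
    }

    private void walk(Coll c, boolean recurse, Predicate<String> match, List<String> hits) {
        for (String d : c.docs) {
            scanned++;
            if (match.test(d)) {
                hits.add(d);
            }
        }
        if (recurse) {
            for (Coll child : c.children.values()) {
                walk(child, true, match, hits);
            }
        }
    }
}
```

For example, after storing two JAMA 132 articles, one JAMA 133 article and one 
NEJM article, a search of the JAMA/132 collection examines 2 documents, while a 
recursive search starting at the root examines all 4.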

Does this make any sense? Will it be faster? Am I missing any obvious problems 
with this approach? Any ideas would be appreciated.

dan


____________________________________________________________________
Daniel W. Barron
Senior Systems Analyst/Application Developer
American College of Physicians-American Society of Internal Medicine
Tel: (215) 351-2617     Tel: (800) 523-1546 x2617
Fax: (215) 351-2644    E-mail: [EMAIL PROTECTED]

