. But after looking briefly at the Xindice quick tutorial, it seemed to me that it would be natural to put each document type in its own collection:
    /db/genotype/
    /db/snp/
    /db/haplotype/
    /db/sample/
    /db/individual/
    /db/pedigree/

...and so on.
The genotype collection would be by far the biggest one (one genotype per sample per SNP, with 180-270 samples and from 500 thousand up to 1.5 million SNPs). So, yes, unless someone can suggest otherwise, I'd think that a single collection would need to contain those 400M records.
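For a rough sense of scale, the genotype count is simply samples times SNPs. A quick sketch of that arithmetic, using only the ranges quoted above (the function name is just for illustration):

```python
def genotype_count(samples: int, snps: int) -> int:
    """One genotype record per sample per SNP."""
    return samples * snps

# Lower and upper ends of the ranges mentioned in this message.
low = genotype_count(180, 500_000)
high = genotype_count(270, 1_500_000)

print(f"{low:,} to {high:,} genotype records")
```

The upper end is where the "400M" figure comes from.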
Also, if one wants to retrieve a genotype by its unique (within that type class) identifier, it would go something like this, using LSIDs (Life Science Identifiers):

    /db/genotype/@lsid='urn:LSID:washu.edu:HapMap/Genotype:23423432434:1'
(I'm no good at XPath, I know!)
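In more conventional XPath the attribute test would go in a predicate on the element. Below is a minimal sketch of that kind of lookup using Python's standard ElementTree module, not Xindice's own API. The element name `genotype`, the `lsid` attribute, the wrapper element, the second identifier and the genotype values are all assumptions made up for the example, not a real HapMap schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a couple of documents from /db/genotype/.
# Element names, the second LSID and the A/G, C/C values are invented.
SAMPLE = """
<collection>
  <genotype lsid="urn:LSID:washu.edu:HapMap/Genotype:23423432434:1">A/G</genotype>
  <genotype lsid="urn:LSID:washu.edu:HapMap/Genotype:23423432435:1">C/C</genotype>
</collection>
""".strip()


def find_by_lsid(root, lsid):
    """Return the first genotype element whose lsid attribute matches, else None."""
    # ElementTree's limited XPath supports the [@attrib='value'] predicate.
    matches = root.findall(f".//genotype[@lsid='{lsid}']")
    return matches[0] if matches else None


root = ET.fromstring(SAMPLE)
hit = find_by_lsid(root, "urn:LSID:washu.edu:HapMap/Genotype:23423432434:1")
print(hit.text if hit is not None else "not found")
```

The point is only that the identifier alone is enough to find the record when everything of one type lives in one place.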
But if there is a per-laboratory division, one would actually have to know which lab the genotype came from, in addition to its identifier. Not a Good Thing. This would probably also affect other, more complex queries; I don't know.
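To make the concern concrete, here is a toy sketch with plain Python dictionaries standing in for collections. The lab names, identifiers and values are invented, and none of this reflects how Xindice stores or indexes documents. It only shows that, without knowing the lab, a lookup has to probe each per-lab collection in turn.

```python
# Hypothetical in-memory stand-ins for the two layouts; all names are made up.
single = {"genotype": {"g1": "A/G", "g2": "C/C"}}
per_lab = {
    "lab_a": {"genotype": {"g1": "A/G"}},
    "lab_b": {"genotype": {"g2": "C/C"}},
}


def get_single(doc_id):
    """One collection per document type: the identifier is enough."""
    return single["genotype"].get(doc_id)


def get_per_lab(doc_id, lab=None):
    """Per-lab layout: without the lab name, every lab must be tried.

    Returns (document, number of collections probed).
    """
    labs = [lab] if lab is not None else list(per_lab)
    probes = 0
    for name in labs:
        probes += 1
        doc = per_lab[name]["genotype"].get(doc_id)
        if doc is not None:
            return doc, probes
    return None, probes


print(get_single("g2"))
print(get_per_lab("g2"))
print(get_per_lab("g2", lab="lab_b"))
```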
Is there a hard limit on the number of documents per Xindice collection? A maximum number of files per directory or whatever, something outside Xindice's control?
Mummi, CSHL
On Friday, December 20, 2002, at 03:43 PM, Murray Altheim wrote:
Gudmundur Arni Thorisson wrote:
[...]
It says on the Xindice website that the db is designed for many, small documents. The XML dataset that we will be handling will contain fairly small documents but VERY many of them; up to 400 million instances of the most populous record class.
My question is therefore this: has anyone used/tested Xindice with datasets of this size (hundreds of millions) with decent performance as well? This will be mainly import + query work, hardly any heavy updating load, if that would make a difference as far as performance goes.
One question that may help answer this: would 400 million records be in *one* Xindice Collection, or could these be organized according to some hierarchy, such that there would be a smaller limit at the Collection level?
Murray
......................................................................
Murray Altheim <http://kmi.open.ac.uk/people/murray/>
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK
If you're the first person in a new territory, you're likely to get shot at. -- ma
