Hello,

Software AG's Tamino XML Database might be the right answer. It is offered as a robust native XML database for mission-critical applications. The only problem is its price: it was $45,000 last year! (In our case, we are a Software AG partner software company in Turkey, so we can bundle it at reasonable prices.)

Devrim Parsera IT

----- Original Message -----
From: "Gudmundur Arni Thorisson" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, December 20, 2002 11:07 PM
Subject: Re: Xindice scalability: using in a large bio

> Kimbro, I have not estimated the total storage requirements for the
> project yet, as we have not yet finalized more than a part of the schema,
> including the 400M instance class. But it is certain that the size of
> each record in that class will be quite small, in an XML sense, something
> like this:
>
> <genotype lsid="urn:LSID:genome.wi.mit.edu:HapMap/Genotype:883423434:1">
>   <snp_assay lsid="urn:LSID:genome.wi.mit.edu:HapMap/SNPAssay:300004343:1"/>
>   <sample lsid="urn:LSID:hapmap.org:HapMap/Sample:1004:1"/>
>   <genotyping_protocol lsid="urn:LSID:genome.wi.mit.edu:HapMap/Protocol:0034:1:1"/>
>   <alleles>
>     <allele base="G"/>
>     <allele base="T"/>
>   </alleles>
> </genotype>
>
> Let's see, this is less than 1/2 KB in size for a single file, times 400
> million records equals a whole bunch of space, you're right! I suppose I'd
> better look into the file size limit thing and make sure that candidate
> db's and OS platform(s) (Linux preferable) can in fact handle this size of
> files. Thanks for the tip, Kimbro.
>
> Mummi
>
> On Friday, December 20, 2002, at 06:19 PM, Kimbro Staken wrote:
>
> > On Friday, December 20, 2002, at 09:06 AM, Gudmundur Arni Thorisson
> > wrote:
> >
> >> Thanks for rapid reply, Murray. There could be some division there,
> >> artificial if need be (divide by e.g. laboratory that produced
> >> genotypes). But after looking briefly at the Xindice quick tutorial,
> >> it seemed to me that it would be natural to put each document type in
> >> its own collection:
> >>
> >> /db/genotype/
> >> /db/snp/
> >> /db/haplotype/
> >> /db/sample/
> >> /db/individual/
> >> /db/pedigree/
> >> ..and so on.
> >>
> >> Where the genotype collection would be by far the biggest one (one
> >> genotype per sample per SNP, where the number of samples will be in the
> >> range 180-270 and SNPs from 500 thousand up to 1.5 million). So, yes,
> >> unless someone can suggest otherwise, I'd think that a single collection
> >> would need to contain those 400M records.
> >> Also, if one wants to retrieve a genotype by its unique (within
> >> that type class) identifier, it would go something like this, using
> >> LSIDs (Life Science Identifiers):
> >> /db/genotype/@lsid='urn:LSID:washu.edu:HapMap/Genotype:23423432434:1'
> >> (I'm no good at XPath, I know!)
> >> But if there is a per-laboratory division, one would actually have to
> >> know which lab the genotype came from, in addition to its identifier.
> >> Not a Good Thing. This would probably also affect other, more complex
> >> queries, I don't know.
> >>
> >> Is there a hard limit on the number of documents per Xindice
> >> collection?
> >
> > There is no hard limit.
> >
> >> Max number of files per directory or whatever, something outside
> >> Xindice's control?
> >
> > The first external limit you'll run into will be file size. Xindice can't
> > span a collection across files yet, so if your file system limits file
> > size to 4GB or something, that will be all you can store. This of course
> > varies by platform.
> >
> > Really though, 400 million is a pretty big number. The most I've ever
> > tested with was a little over 1 million. The server could handle more,
> > but it was really pushing the limits of the current system. So until
> > Xindice matures quite a bit more, I'd have to recommend against it as a
> > solution.
> >
> > It will be tough to find an open source solution that can easily handle
> > that many documents with acceptable performance. As far as I know eXist
> > won't be any better in this area. Honestly, I'd really have to question
> > whether Oracle can even handle that much XML. Obviously, for relational
> > data it's up to the task, but XML is quite a bit different and there are
> > still some pretty inefficient aspects to what they're doing. Of course I
> > do think Oracle is better than anything else currently available.
> >
> > Besides the number of documents, have you estimated document size and
> > storage space required? Even if you're looking at only 1k per document,
> > I believe once you throw in indexes and overhead, you're pushing 1TB in
> > data size. That's a pretty big chunk of data; it's not going to be easy
> > to manage no matter which route you take.
> >
> >> Mummi, CSHL
> >>
> >> On Friday, December 20, 2002, at 03:43 PM, Murray Altheim wrote:
> >>
> >>> Gudmundur Arni Thorisson wrote:
> >>>
> >>> [...]
> >>>
> >>>> It says on the Xindice website that the db is designed for many,
> >>>> small documents. The XML dataset that we will be handling will
> >>>> contain fairly small documents but VERY many of them; up to 400
> >>>> million instances of the most populous record class.
> >>>> My question is therefore this: has anyone used/tested Xindice with
> >>>> datasets of this size (hundreds of millions) with decent performance
> >>>> as well? This will be mainly import + query work, hardly any heavy
> >>>> updating load, if that would make a difference as far as performance
> >>>> goes.
> >>>
> >>> One question that may help answer this: would 400 million records
> >>> be in *one* Xindice Collection, or could these be organized according
> >>> to some hierarchy, such that there would be a smaller limit at the
> >>> Collection level?
> >>>
> >>> Murray
> >>>
> >>> ......................................................................
> >>> Murray Altheim <http://kmi.open.ac.uk/people/murray/>
> >>> Knowledge Media Institute
> >>> The Open University, Milton Keynes, Bucks, MK7 6AA, UK
> >>>
> >>> If you're the first person in a new territory,
> >>> you're likely to get shot at.
> >>> -- ma
> >>>
> >
> > Kimbro Staken
> > Java and XML Software, Consulting and Writing http://www.xmldatabases.org/
> > Apache Xindice native XML database http://xml.apache.org/xindice
> > XML:DB Initiative http://www.xmldb.org
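
For anyone following the size discussion, the figures quoted in the thread can be sanity-checked with quick arithmetic. This is only a rough sketch: the 400 million record count, the 1/2 KB and 1 KB record sizes, the 4 GB file limit and the 1 TB figure come from the messages above, while the binary unit convention (1 KB = 1024 bytes) and all function names are assumptions made for illustration.

```python
# Back-of-the-envelope storage estimate for the genotype collection.
# Numbers taken from the thread: 400M records, ~0.5 KB to 1 KB each,
# a 4 GB file-size limit as an example, and "pushing 1 TB" with overhead.
# Assumption: binary units (1 KB = 1024 bytes, 1 GB = 1024**3 bytes).

RECORDS = 400_000_000
KB = 1024
GB = 1024 ** 3


def raw_size_gb(record_bytes: float, records: int = RECORDS) -> float:
    """Raw payload size in GB, ignoring indexes and per-document overhead."""
    return record_bytes * records / GB


def records_per_file(record_bytes: int, file_limit_bytes: int = 4 * GB) -> int:
    """How many records of a given size fit under a file-size limit."""
    return file_limit_bytes // record_bytes


half_kb_total = raw_size_gb(0.5 * KB)   # about 190.7 GB of raw XML
one_kb_total = raw_size_gb(1 * KB)      # about 381.5 GB of raw XML

# Overhead multiplier that would turn 1 KB records into roughly 1 TB.
implied_overhead = 1024 / one_kb_total  # about 2.68x

# Records that fit in a single 4 GB file at 512 bytes each.
fit_in_4gb = records_per_file(512)      # 8,388,608 records

print(f"0.5 KB records: {half_kb_total:.1f} GB raw")
print(f"1 KB records:   {one_kb_total:.1f} GB raw")
print(f"Overhead factor implied by ~1 TB: {implied_overhead:.2f}x")
print(f"512-byte records per 4 GB file: {fit_in_4gb:,}")
```

Under these assumptions, even the smaller record size puts a single collection far beyond a 4 GB file limit, which holds only about 8.4 million 512-byte records before any index or storage overhead is counted.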
