Hello Rüdiger!

I strongly discourage using separate databases. The reason is that a database is a considerably heavy entity. There is a hard limit on the number of simultaneously running databases, 10 or so by default. You may try to raise the hard limit (it is set at compile time, so you will need to recompile Sedna) or you may try to launch multiple governor instances. However, it isn't worth the effort: Sedna was not designed to support an arbitrary number of databases.

Each database needs 100MB of memory by default for caching data from database files. The amount of memory is configurable, but lowering it below a certain point will hamper performance; for typical applications 100MB is barely enough. And databases aren't collaborative: each one gets its own private buffer memory. So, performing a trivial calculation, we discover that 10 databases will consume 1GB of memory with default settings. There could be other scaling issues as well.
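To make that arithmetic concrete, here is a minimal Python sketch. It assumes only the figures quoted above (100MB of private buffer memory per database); the function name is hypothetical, nothing here is read from a real Sedna installation, and any overhead beyond buffer memory is ignored.

```python
# Rough estimate of buffer memory for N simultaneously running databases.
# Assumption: each database holds its own private buffer (nothing is shared),
# and the default buffer size is the 100MB figure quoted in this message.
DEFAULT_BUFFER_MB = 100


def total_buffer_mb(n_databases, buffer_mb=DEFAULT_BUFFER_MB):
    """Return total buffer memory in MB; grows linearly with database count."""
    if n_databases < 0 or buffer_mb < 0:
        raise ValueError("counts and sizes must be non-negative")
    return n_databases * buffer_mb


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        mb = total_buffer_mb(n)
        print(f"{n:>5} databases: {mb} MB (about {mb / 1000:.1f} GB)")
```

With one database per document, the count in the loop would be the number of documents, which is why the per-document-database layout does not scale.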
Starting databases on demand and shutting them down after a certain timeout doesn't sound like a good solution to me. First, databases aren't designed to start up and shut down quickly. Second, when a database is turned off all cached data is lost; if the database is later restarted, the data must be re-read from disk. And you get not only more reads but more writes as well, because before a database is turned off all modified data must be flushed from buffer memory to disk.

Personally, I would store all data in a single database. Roughly, Sedna storage works internally as follows. Every document is split into multiple 64KB blocks, and standalone documents are not intermixed at the block level. In contrast, documents stored in a collection are merged in a non-trivial way. If you arrange your data as standalone documents, they won't "disturb" each other.

Currently Sedna doesn't take advantage of multiple disks, and distributing data across several files is not available.

A huge database file is not necessarily bad. In ideal circumstances file clusters would be allocated sequentially, and a larger file would have longer seek times since HDD heads travel a longer distance on average. However, in the real world files get fragmented and intermixed with other files. The larger file still incurs a minor slowdown of course, but not by an order of magnitude :-) Sedna supports huge files (100GB+) pretty well.

There could still be some issues with a horde of standalone documents (the metadata storage may overflow). As far as I know, the Sedna team is currently working really hard to address this very issue.

WBR, Mejedi

2009/3/9 Rüdiger Gleim <[email protected]>
> Good Monday Morning everyone :-),
> I have spent some time browsing through the archives but still am
> undecided on the question:
>
> Considering a (web based) XML Document Management System designed to
> upload, query and edit(!)
thousands of large (>>1GB) and small (~500k)
> XML Documents would you (I) rather put each document in a separate
> database or (II) put them all into one large database as stand-alone
> documents?
>
> From my point of view:
>
> Option I: Separate Databases
> + Isolation: If one database crashes, the others are unaffected
> + Database files only get as large as they need to be for one document
> + Updates or removals of documents will not affect performance
> - Cannot (or should not?) keep all database sessions open at a time
> for the sake of memory usage. So robust management to open and close the
> sessions (se_sm / se_smsd) is needed. I have written a Java-based pool
> for that which automatically closes idle sessions after a while; however,
> I am not too happy about that solution
>
> Option II:
> + No effort needed for session management. Just start one session and
> be happy
> + Possibly make use of queries over several documents if desired
> - VERY Large Database File. I have bulk uploaded 100 x 41 MB XML Code
> and there is no negative effect on performance. However if I have thousands of
> large (>1GB) and small (~500k) documents which may in part be updated...
> how will Sedna's performance scale?
> - Is it possible to use one database but configure Sedna to distribute
> data over several files as e.g. the InnoDB storage backend does?
>
> I would really appreciate any comments and hints on best practice.
>
> Regards,
>
> Rüdiger
>
> _______________________________________________
> Sedna-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/sedna-discussion
