Re: [Sedna-discussion] XML Document Management in one database vs. distinct databases

ZNV Tue, 10 Mar 2009 06:31:34 -0700

Hello Rüdiger!
I strongly discourage using separate databases. The reason is that a
database is a fairly heavyweight entity. There is a hard limit on the
number of simultaneously running databases, 10 or so by default. You may try
to raise the hard limit (it is set at compile time, so you will need to
recompile Sedna) or you may try to launch multiple governor instances.
However, it isn't worth the effort. Sedna was not designed to support an
arbitrary number of databases. By default, each database needs 100MB of
memory for caching data from database files. The amount of memory used is
configurable, but lowering it below a certain point will hamper performance.
For typical applications 100MB is barely enough. And databases don't share
memory: each one gets its own private buffer memory. So, performing a
trivial calculation, we discover that 10 databases will consume 1GB of
memory with default settings. There could be other scaling issues as well.
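
For illustration, here is that arithmetic as a small Python sketch. The
function name is made up for this example and is not part of Sedna; the only
inputs are the 100MB default quoted above and the fact that buffers are
private to each database.

```python
# Hypothetical helper (not a Sedna API): estimates total buffer memory
# when every running database keeps its own private cache.
DEFAULT_BUFFER_MB = 100  # per-database default quoted in this message


def total_buffer_mb(num_databases: int, buffer_mb: int = DEFAULT_BUFFER_MB) -> int:
    """Buffers are not shared, so the cost grows linearly with the count."""
    return num_databases * buffer_mb


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # 1 GB is taken as 1000 MB here, matching the rough figure above.
        print(f"{n:>5} databases -> {total_buffer_mb(n) / 1000:.1f} GB")
```

With one database per document and thousands of documents, this estimate
reaches hundreds of gigabytes, well before the compile-time limit on running
databases is even considered.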


Starting databases on demand and shutting them down after a certain timeout
doesn't sound like a good solution to me. First, databases aren't designed
to start up and shut down quickly. Second, when a database is turned off all
cached data is lost. If the database is restarted later, the data must be
re-read from disk. And you certainly get not only more reads but more writes
as well, because before a database is turned off all modified data must be
flushed from buffer memory to disk.

Personally, I would store all data in a single database. Roughly, Sedna's
storage works internally as follows. Every document is split into multiple
64KB blocks; standalone documents are not intermixed at the block level. In
contrast, documents stored in a collection are merged in a non-trivial way.
If you arrange your data as standalone documents, they won't "disturb" each
other.
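
To get a feel for the block counts involved, here is a rough Python sketch.
It assumes the only fixed figure is the 64KB block size mentioned above; the
function name is hypothetical, and the size Sedna actually stores for a
document is not the same as the size of its XML text, so treat the result as
an order-of-magnitude estimate only.

```python
import math

BLOCK_SIZE = 64 * 1024  # 64KB block size mentioned in this message


def min_blocks(stored_bytes: int) -> int:
    """Lower bound on the blocks needed to hold `stored_bytes` of data.

    Hypothetical helper: assumes blocks are filled completely, which a
    real storage engine will not generally achieve.
    """
    return math.ceil(stored_bytes / BLOCK_SIZE)


if __name__ == "__main__":
    for label, size in (("~500k document", 500 * 1024), ("1GB document", 1024 ** 3)):
        print(f"{label}: at least {min_blocks(size)} blocks")
```

Since standalone documents do not share blocks, updating one of them touches
only that document's own blocks.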

Currently Sedna doesn't take advantage of multiple disks, and data
distribution across several files is not available. A huge database file is
not necessarily bad. In ideal circumstances file clusters would be allocated
sequentially, and a large file would have longer seek times since the HDD
heads travel a longer distance on average. However, in the real world a file
gets fragmented and intermixed with other files. The larger file still
incurs a minor slowdown of course, but not by an order of magnitude :-)

Sedna supports huge files (100GB+) pretty well.

There could still be some issues with a horde of standalone documents (the
metadata storage may overflow). As far as I know, the Sedna team is currently
working really hard to address this very issue.

WBR, Mejedi

2009/3/9 Rüdiger Gleim <[email protected]>

> Good Monday Morning everyone :-),
> I have spent some time browsing through the archives but still am
> undecided on the question:
>
> Considering a (web based) XML Document Management System designed to
> upload, query and edit(!) thousands of large (>>1GB) and small (~500k)
> XML Documents would you (I) rather put each document in a separate
> database or (II) put them all into one large database as stand-alone
> documents?
>
>  From my point of view:
>
> Option I: Separate Databases
> + Isolation: If one database crashes, the others are unaffected
> + Database files only get as large as they need to be for one document
> + Updates or removals of documents will not affect performance
> - Cannot (or should not?) keep all database sessions open at a time
> for the sake of memory usage. So a robust management to open and close the
> sessions (se_sm / se_smsd) is needed. I have written a Java-based pool
> for that which automatically closes idle sessions after a while; however,
> I am not too happy about that solution
>
> Option II:
> + No efforts needed for session management. Just start one session and
> be happy
> + Possibly make use of queries over several documents if desired
> - VERY large database file. I have bulk-uploaded 100 x 41 MB of XML
> and there is no negative effect on performance. However, if I have
> thousands of large (>1GB) and small (~500k) documents which may in part be
> updated... how will Sedna's performance scale?
> - Is it possible to use one database but configure Sedna to distribute
> data over several files as e.g. InnoDB storage backend does?
>
> I would really appreciate any comments and hints on best practise.
>
> Regards,
>
> Rüdiger
>
>
> _______________________________________________
> Sedna-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/sedna-discussion
>

