Hi Dave,

You will find that this issue is not confined to Sedna; it applies to most XML databases, whether they are "native XML databases" or implemented over a relational DB, such as MonetDB. Unless your application runs in a very disk-space-limited environment (such as a mobile device), disk space these days is so cheap that it's not really worth worrying too much about. Having said that, I'll try to explain why it is like that.

Going from a 70MB log file (presumably plain text with variable-length log lines) to 144MB in XML format is easily explained by the space that the added XML tags take up. (That's not telling you anything new, as you seem to appreciate that bit.) Going from XML text to persisting the data in an XML database has a storage overhead for analogous reasons: if nothing else, the database adds XPath-axis relationship information between the nodes in the XML.

Think for a moment about how an XML DOM (Document Object Model) is implemented. (I'm not saying that Sedna is implemented as a persistent DOM, but it's useful to analyze your issue this way.) For each node in the document, in order for the database to implement XPath navigation efficiently, it needs to store "pointers" to parent and ancestor nodes, child nodes, previous-sibling and following-sibling nodes, the list of attribute nodes (in the case of element nodes) and so on for all 13 (I think) XPath axes. This all takes space. Even if there are no child nodes, the node would have to record "NULL" for the children, and even "NULL" takes space. The problem is exacerbated in an "XML-database-on-top-of-a-relational-database" scenario, where all these relationships take tons of rubble (multitudes of tables) to express with any hope of a runtime performance benefit.

All in all, it's back to one of the fundamental principles in computer science: (memory) space and (execution) time are generally inversely related. If you want to use the smallest amount of space for data storage, you "zip it up", but then it will take a long time to find your data in the compressed file. If you want to access your data in the smallest amount of time, you "expand it out" and use whatever amount of memory you can to get the best time performance out of your algorithms.

I wonder how much space your 70MB log file takes when zipped up? Betcha there's lots of redundancy in the information and the compression ratio will be high.
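
To put rough numbers on both directions of that tradeoff, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the sample log entries are made up, and the figure of five 8-byte pointers per node is a guess at a DOM-like layout, not a description of Sedna's actual storage format. It counts the nodes in a small XML document, estimates the space the navigation pointers alone would need, and measures how well the same text compresses with zlib.

```python
import xml.etree.ElementTree as ET
import zlib

# Assumptions for illustration only (NOT Sedna's actual storage layout):
POINTER_BYTES = 8      # one 64-bit pointer
POINTERS_PER_NODE = 5  # parent, first child, previous sibling,
                       # next sibling, attribute list


def count_nodes(xml_text: str) -> int:
    """Count element, attribute and non-blank text nodes in a document."""
    root = ET.fromstring(xml_text)
    total = 0
    for elem in root.iter():
        total += 1                 # the element node itself
        total += len(elem.attrib)  # one node per attribute
        if elem.text and elem.text.strip():
            total += 1             # text node directly inside the element
        if elem.tail and elem.tail.strip():
            total += 1             # text node following the element
    return total


def pointer_overhead(xml_text: str) -> int:
    """Estimated bytes spent on navigation pointers alone."""
    return count_nodes(xml_text) * POINTERS_PER_NODE * POINTER_BYTES


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    # Made-up log entry, repeated to mimic a redundant log file.
    entry = ('<entry level="INFO"><time>2009-07-01T12:00:00</time>'
             '<msg>request served</msg></entry>')
    doc = "<log>" + entry * 1000 + "</log>"

    print("XML text bytes:        ", len(doc.encode("utf-8")))
    print("nodes:                 ", count_nodes(doc))
    print("pointer overhead bytes:", pointer_overhead(doc))
    print("zlib compression ratio:", round(compression_ratio(doc.encode("utf-8")), 1))
```

With short log entries like these, the estimated pointer overhead alone comes out larger than the XML text itself, before counting indexes, block padding or anything else a real database keeps on disk.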
Storing the data in an XML database simply takes the ratio in the other direction :-) It is reasonable to expect a decent (execution time) performance benefit, though, in accessing and navigating the data. If there weren't this benefit (amongst others), the tradeoff would not be worth it.

Trust this rather wordy explanation helps.

Cheers
Justin Johansson

btw. So just what is the zipped-up compression ratio for your log file?

Dave Stav wrote:
> Hi List Members,
>
> I noticed that sedna database takes quite a lot of disk space, compared to the data it contains, and I was wondering why it is like that.
>
> I am converting a 70MB log file to an xml file which takes up 144 MB.
> After loading this xml file to a newly created sedna database, I can see that the database directory takes up 450MB.
>
> Does anyone know why this is happening and/or if there is anything we can do to reduce the disk usage?
>
> Thanks!
>
> - Dave
