Hi Dave,


> My plan is to convert and import a total of about 50GB of log files into
> Sedna. Do you think the ratio will be the same, i.e. 50GB of log files
> will turn into 109GB of XML, which will be stored as 340GB?
>


Yes, the ratio will likely be approximately the same, though it depends
heavily on the structure of your data. Can you give us an example of the XML
you want to load?
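
Dave's projection is just the test run's ratios scaled up. Below is a minimal Python sketch of that arithmetic, assuming the ratios stay constant at larger volumes (which, as said above, depends on the data's structure). It uses the 66MB test-log size Dave reports at the end of this thread, because that is the figure his 109GB/340GB estimate matches. The 70MB from his first message gives somewhat lower numbers.

```python
# Sizes measured in Dave's test run, in MB (66MB is the test-log size he
# reports later in the thread; 144MB XML and 450MB database from his first mail).
LOG_MB = 66
XML_MB = 144
DB_MB = 450


def project(log_gb: float) -> tuple[float, float]:
    """Scale a log volume by the observed log->XML and XML->database ratios."""
    xml_ratio = XML_MB / LOG_MB  # growth from adding XML markup
    db_ratio = DB_MB / XML_MB    # growth from storing the XML in the database
    xml_gb = log_gb * xml_ratio
    return xml_gb, xml_gb * db_ratio


if __name__ == "__main__":
    xml_gb, db_gb = project(50)
    print(f"50GB of logs -> ~{xml_gb:.0f}GB of XML -> ~{db_gb:.0f}GB database")
```

This is only a linear extrapolation; a loaded sample of the real data is the reliable way to confirm the ratio.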


> BTW, do you have any recommendation as to how the data should be stored? I am
> considering separate databases, separate documents, or one document with many
> sub-nodes. From your explanation I understand that the more the data is
> divided into nodes, the more disk space it will require, so perhaps I'm
> better off splitting the data into several documents.
>


Usually it's not a good idea to use many databases; for example, you
won't be able to query them simultaneously. Do you have one big document or
many small documents? You will get approximately the same result whether
you load your data into the database as one big document or as several
documents in a collection
(http://modis.ispras.ru/sedna/progguide/ProgGuidesu8.html#x14-470002.5.2).
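
If you go the collection route, the log first has to be split into many small documents. A rough Python sketch, assuming one `<entry>` element per log line (the `<log>`/`<entry>` names, `lines_per_doc`, and the file names are hypothetical) and assuming Sedna's `CREATE COLLECTION "name"` and `LOAD "file" "doc" "collection"` statements as described in the Programmer's Guide section linked above. Check the exact syntax there before relying on it.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from pathlib import Path
from xml.sax.saxutils import escape


def chunk_documents(lines: Iterable[str], lines_per_doc: int) -> Iterator[str]:
    """Group raw log lines into XML documents of at most lines_per_doc entries each."""
    it = iter(lines)
    while True:
        batch = list(islice(it, lines_per_doc))
        if not batch:
            return
        # escape() turns &, < and > into entities so arbitrary log text stays well-formed
        entries = "".join("<entry>" + escape(line.rstrip("\n")) + "</entry>" for line in batch)
        yield "<log>" + entries + "</log>\n"


def load_statements(collection: str, files: list[str]) -> list[str]:
    """Statements that create one collection and load every file into it.

    ASSUMPTION: Sedna's CREATE COLLECTION / LOAD "file" "doc" "collection" syntax;
    the document name is taken from the file name without its extension.
    """
    stmts = [f'CREATE COLLECTION "{collection}"']
    for path in files:
        stmts.append(f'LOAD "{path}" "{Path(path).stem}" "{collection}"')
    return stmts


if __name__ == "__main__":
    docs = list(chunk_documents(["GET /a 200\n", "GET /b <404>\n", "GET /c 200\n"], 2))
    names = [f"part{i}.xml" for i in range(len(docs))]  # docs[i] would be written to names[i]
    for stmt in load_statements("logs", names):
        print(stmt)
```

All the documents then live in one database, so they can still be queried together through the collection (in XQuery, via `collection("logs")`).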

> I am also concerned about performance. Has a 340GB database ever been tried
> on Sedna, to your knowledge?
>


Sure. We have worked with 500-600GB databases. BTW, the WikiXMLDB demo has a
130GB database.


Ivan Shcheklein,
Sedna Team


> Thanks for your help!
>
>  - Dave
>
> On Wed, Sep 09, 2009 at 05:56:29AM +0930, Justin Johansson wrote:
> > Hi Dave,
> >
> > You will find that this issue is not confined to Sedna but applies to
> > most XML databases, whether they are "native XML databases" or implemented
> > over a relational DB such as MonetDB. Unless your application is
> > running in a very disk-space-limited environment (such as a mobile
> > device), disk space these days is so cheap that it's not really worth
> > worrying about too much. Having said that, I'll try to explain why it is
> > like that.
> >
> > Going from a 70MB log file (presumably plain text, as variable-length log
> > lines) to 144MB in XML format is easily explained by the space that the
> > added XML tags take up. (That's not telling you anything new, as you seem
> > to appreciate that bit.) Going from XML text to persisting the data in an
> > XML database adds storage overhead for analogous reasons, namely the
> > XPath-axis relationship information that has to be stored between the
> > nodes in the XML, if for no other reason.
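
To see where the roughly 2x log-to-XML growth comes from, compare one log line with one possible XML encoding of it. Both the line and the element names below are made up for illustration; Dave's actual format isn't shown in the thread.

```python
import re

# A made-up log line and a hypothetical XML encoding of the same fields.
RAW_LINE = "2009-09-08 21:14:03 INFO  worker-3 request served in 12ms\n"
XML_LINE = (
    "<entry>"
    "<time>2009-09-08 21:14:03</time>"
    "<level>INFO</level>"
    "<thread>worker-3</thread>"
    "<msg>request served in 12ms</msg>"
    "</entry>\n"
)


def markup_bytes(xml: str) -> int:
    """Total length of all start and end tags in the string."""
    return sum(len(tag) for tag in re.findall(r"</?[A-Za-z]+>", xml))


if __name__ == "__main__":
    print(f"raw: {len(RAW_LINE)} bytes, XML: {len(XML_LINE)} bytes")
    print(f"of which tags: {markup_bytes(XML_LINE)} bytes")
    print(f"growth: {len(XML_LINE) / len(RAW_LINE):.2f}x")
```

Here more than half of the XML line is tags, and the line grows by a little over 2x, similar to Dave's 66MB to 144MB.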
> >
> > Think for a moment about how the XML DOM (Document Object Model) is
> > implemented. (I'm not saying that Sedna is implemented as a persistent
> > DOM, but it's a useful way to analyze your issue.) For each node in
> > the document, in order for the database to implement XPath navigation
> > efficiently, it needs to store "pointers" to parent and ancestor nodes,
> > child nodes, previous-sibling and following-sibling nodes, the list of
> > attribute nodes (in the case of element nodes), and so on for all 13
> > XPath axes. This all takes space. Even if a node has no children, it
> > still has to record "NULL" for them, and even "NULL" takes space.
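
To make the "even NULL takes space" point concrete, here is a toy fixed-size node record in Python. The field layout is hypothetical (six 8-byte navigation pointers plus a few small fields) and is not how Sedna actually stores nodes; it only shows that a leaf element pays for its empty child and attribute slots just like a populated node does.

```python
import struct

# Hypothetical node record, for illustration only (NOT Sedna's storage format).
# "<" = little-endian, no padding.
#   6Q: 8-byte "pointers": parent, first_child, last_child,
#       prev_sibling, next_sibling, first_attribute
#   I: name id, H: node kind, H: flags
NODE_FORMAT = "<6QIHH"
NODE_SIZE = struct.calcsize(NODE_FORMAT)

NULL = 0     # a "NULL pointer" still occupies a full 8-byte slot
ELEMENT = 1  # hypothetical node-kind code


def pack_leaf_element(parent: int, prev_sib: int, next_sib: int, name_id: int) -> bytes:
    """Encode an element with no children or attributes; those slots are stored as NULL."""
    return struct.pack(NODE_FORMAT, parent, NULL, NULL, prev_sib, next_sib, NULL,
                       name_id, ELEMENT, 0)


def structure_bytes(node_count: int) -> int:
    """Bytes spent on navigation records alone, before any text content is stored."""
    return node_count * NODE_SIZE


if __name__ == "__main__":
    rec = pack_leaf_element(parent=100, prev_sib=101, next_sib=103, name_id=7)
    print(f"record size: {NODE_SIZE} bytes; leaf record is also {len(rec)} bytes")
    print(f"9 nodes cost {structure_bytes(9)} bytes of structure")
```

With this toy layout, a log entry modelled as one element with four child elements, each holding a text node, is 9 nodes, i.e. 504 bytes of structure before any of its text is stored. A real system can shrink this (smaller pointers, links it can derive instead of store), at some cost in navigation time.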
> >
> > The problem is exacerbated in the
> > "XML database on top of a relational database" scenario, where all
> > these relationships take a great deal of machinery (multitudes of tables)
> > to express with any hope of good runtime performance.
> >
> > All in all, it's back to one of the fundamental principles of computer
> > science: (memory) space and (execution) time generally trade off against
> > each other. If you want to use the smallest amount of space for data
> > storage, you "zip it up", but then it will take a long time to find your
> > data in the compressed file. If you want to access your data in the
> > smallest amount of time, you "expand it out" and use whatever amount of
> > memory you can to get the best time performance out of your algorithms.
> >
> > I wonder how much space your 70MB log file takes when zipped up? I bet
> > there's lots of redundancy in the information and the compression ratio
> > will be high. Storing the data in an XML database simply pushes the
> > ratio in the other direction :-)  You can reasonably expect a decent
> > (execution time) performance benefit in accessing/navigating the data,
> > though. If there weren't this benefit (amongst others), the tradeoff
> > would not be worth it.
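
Justin's guess is easy to check: structured logs repeat the same date prefixes, levels, and message templates on every line, which DEFLATE (the algorithm behind gzip and zip) exploits. A small Python sketch using the standard `zlib` module on synthetic log lines (the line format is made up). For comparison, Dave's reply at the end of the thread reports about 12:1 (66MB to 5.5MB) on his real test log.

```python
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Original size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, level))


def synthetic_log(n_lines: int) -> bytes:
    """Made-up log lines that share most of their text, as real logs often do."""
    lines = (
        f"2009-09-08 21:{i // 60 % 60:02d}:{i % 60:02d} INFO  worker-{i % 4} "
        f"request served in {i % 50}ms\n"
        for i in range(n_lines)
    )
    return "".join(lines).encode("ascii")


if __name__ == "__main__":
    log = synthetic_log(10_000)
    print(f"{len(log)} bytes, compresses about {compression_ratio(log):.0f}:1")
```

The compressed form is compact precisely because it is opaque: finding one entry means decompressing everything before it, which is the space/time trade-off described above.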
> >
> > I trust this rather wordy explanation helps.
> >
> > Cheers
> > Justin Johansson
> >
> > BTW, so just what is the zipped-up compression ratio for your log file?
> >
> >
> >
> > Dave Stav wrote:
> > > Hi List Members,
> > >
> > > I noticed that a Sedna database takes quite a lot of disk space
> > > compared to the data it contains, and I was wondering why.
> > >
> > > I am converting a 70MB log file to an XML file, which takes up 144MB.
> > > After loading this XML file into a newly created Sedna database, I can
> > > see that the database directory takes up 450MB.
> > >
> > > Does anyone know why this is happening, and/or whether there is anything
> > > we can do to reduce the disk usage?
> > >
> > > Thanks!
> > >
> > >  - Dave
> > >
> >
> >
>
> --
>  EE 77 7F 30 4A 64 2E C5  83 5F E7 49 A6 82 29 BA    ~. .~   Tk Open Systems
>
>  =}-----------------------------------------------ooO--U--Ooo-------------{=
>      - [email protected] - tel: +972.2.679.5364, http://www.tkos.co.il -
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Sedna-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/sedna-discussion
>
The 66MB log file gzips to 5.5MB (zip gives about the same). This was a test
log file.