Hi Martin,

It's strange behaviour. Could you share your data (or just part of it) so that we can try to reproduce this issue on our side?

Anyway, try to run your transaction in logless mode:
LOG_LESS_MODE (\ll to set, \fl to unset) - when set, every bulk load will be less logged. When unset, every bulk load will be fully logged. By default, transactions are run in full log mode.

For example:

\nac
\ll
LOAD "/var/pages/000/1000.xml" "000-1000.xml" "inex" &
LOAD "/var/pages/000/1009000.xml" "000-1009000.xml" "inex" &
... and so on ...
\commit

Ivan Shcheklein,
Sedna Team

On Sat, May 8, 2010 at 3:45 AM, Martin Bukatovic <[email protected]> wrote:

> Hi
>
> I'm trying to load a "large" (13 GB) set of XML documents into Sedna.
> I'm aware of the fact that such a large pile of files requires lots of
> space, but the actual requirements for free space wildly surpassed my
> expectations (and yes, I have read this thread
> http://article.gmane.org/gmane.text.xml.sedna/1749). So I'm going to
> provide you with some information in the hope that you can point out some
> errors of mine or just assure me that this is expected behaviour.
>
> The data consists of a large number of small files (1 - 400 KB)
> representing articles, so it is similar to the wikixmldb demo. I set up
> a dedicated database (in the se_cdb sense) with just one collection for
> those files. A shell script then creates a list of files to load:
>
> \nac
> LOAD "/var/pages/000/1000.xml" "000-1000.xml" "inex" &
> LOAD "/var/pages/000/1009000.xml" "000-1009000.xml" "inex" &
> ... and so on ...
> \commit
>
> This list is then executed by se_term. The problem is that even for a
> 3 GB fraction of the data, 52 GB of free space is not enough. Now I'm
> halfway through loading a 300 MB portion and the database directory
> already occupies 35 GB (I have deleted the database after each
> experiment, so there are no remnants of older data) - this is quite a
> shock for me :)
>
> The text in the articles has a lot of markup (see the example of just one
> word below), which I suspect has an effect on the resulting size:
>
> <region wordnetid="108630985" confidence="0.8">
>   <administrative_district wordnetid="108491826" confidence="0.8">
>     <location wordnetid="100027167" confidence="0.8">
>       <commune wordnetid="108541609" confidence="0.8">
>         <district wordnetid="108552138" confidence="0.8">
>           <link xlink:type="simple" xlink:href="../436/2166436.xml">
>           Berville-sur-Mer</link>
>         </district>
>       </commune>
>     </location>
>   </administrative_district>
> </region>
>
> So, do you think that a 300 MB -> 35 GB (and counting) transformation is
> expected here?
>
> Thanks a lot.
>
> Martin B.
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Sedna-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/sedna-discussion
>
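
For reference, the list of LOAD statements with \ll switched on could be generated along these lines. This is only a rough sketch in Python rather than the original shell script, which was not posted. The names doc_name and build_load_script are made up for illustration, and the document naming scheme (parent directory, a dash, then the file name) is inferred from the two LOAD lines in the thread. It also assumes that paths contain no double quotes.

```python
from pathlib import PurePosixPath


def doc_name(path):
    """Derive a document name such as "000-1000.xml" from ".../000/1000.xml".

    Assumption: the name is "<parent directory>-<file name>", as suggested
    by the two LOAD lines quoted in the thread.
    """
    p = PurePosixPath(path)
    return f"{p.parent.name}-{p.name}"


def build_load_script(paths, collection, logless=True):
    """Build the text of a se_term script that bulk-loads the given files.

    The script turns autocommit off (\\nac), optionally switches on
    LOG_LESS_MODE (\\ll), emits one LOAD statement per file and ends
    with \\commit, mirroring the layout used in the thread.
    """
    lines = ["\\nac"]
    if logless:
        lines.append("\\ll")
    for path in paths:
        # Assumption: paths contain no double quotes, so no escaping is done.
        lines.append(f'LOAD "{path}" "{doc_name(path)}" "{collection}" &')
    lines.append("\\commit")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example_paths = [
        "/var/pages/000/1000.xml",
        "/var/pages/000/1009000.xml",
    ]
    print(build_load_script(example_paths, "inex"), end="")
```

The output can be saved to a file and executed by se_term in the same way as the original list. Passing logless=False leaves out the \ll line, which gives the default fully logged behaviour for comparison.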
