Hi Martin,


> Thank you for the explanation! Now I will probably write an STX script for
> changing the structure of the data.
>


You are welcome. It would be great to try that. Let me know about the
results on the modified data. As an example of what is possible, here are
some statistics for the wikixmldb.org demo:

   - Single source file: 34 GB;
   - Sedna data file: ~150 GB, including three value indexes;
   - Size of the descriptive schema: ~30000

The expansion factor is still quite high, but the database works pretty
well. We are constantly thinking about how to decrease it without
sacrificing performance.


> As for concatenation into one document, since documents stored in a
> collection have a common descriptive schema, I suppose that the data will
> be physically stored almost in the same way as before, therefore there are
> no negative side effects of this reorganization (by negative I mean
> something not possible or significantly slower). Am I right?
>


Mostly. The possible difference is in the size of the node labels. A unique
label is assigned to each node of a loaded XML document. The labels encode
information about the node's relative position in the document; their main
purpose is to quickly determine the structural relationship between a pair
of nodes (ancestor, descendant, etc.). The size of these labels is optimal
during bulk load, when the loader can analyze the whole data in advance
(for example, it counts how many nodes there are for each schema node).
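To illustrate the idea only (Sedna's actual labeling scheme and its on-disk
encoding are not described here), a Dewey-style positional label makes
ancestor/descendant checks a simple prefix test:

```python
# Sketch of positional node labeling (Dewey order). Each node gets a tuple
# of sibling positions along the path from the root; this is an
# illustration of the general technique, not Sedna's concrete format.

def is_ancestor(a, d):
    """True if the node labeled `a` is a proper ancestor of the node labeled `d`."""
    return len(a) < len(d) and d[:len(a)] == a

# Labels assigned in document order while parsing:
#   <doc>          -> (1,)
#     <section>    -> (1, 1)
#       <title/>   -> (1, 1, 1)
#     <section/>   -> (1, 2)
doc, section, title, section2 = (1,), (1, 1), (1, 1, 1), (1, 2)

assert is_ancestor(doc, title)             # doc contains title
assert not is_ancestor(section, section2)  # siblings, no containment
```

Comparing two such labels answers the structural question without touching
any other node, which is why label size matters at this scale.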

Moreover, I think you may continue generating the files as you did before
(one article per file). Then a very simple script can automatically
concatenate them into one file.
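Such a script could look roughly like the sketch below (the function name
and file layout are hypothetical; it assumes each input file is a
well-formed article document, strips the XML prolog of each, and wraps
everything in a single root element):

```python
# Minimal concatenation sketch: merge per-article XML files into one
# document under a single root element. Names here are illustrative.
import re
from pathlib import Path

def concat_articles(paths, out_path, root="articles"):
    """Strip each file's XML prolog and wrap all articles in one root element."""
    prolog = re.compile(r"^\s*<\?xml[^?]*\?>\s*")
    parts = [f"<{root}>"]
    for p in paths:
        parts.append(prolog.sub("", Path(p).read_text(encoding="utf-8")))
    parts.append(f"</{root}>")
    Path(out_path).write_text("\n".join(parts), encoding="utf-8")
```

The resulting single document can then be bulk-loaded as usual.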

BTW, don't forget to increase the number of buffers when the data is ready
to load.



Ivan Shcheklein,
Sedna Team
------------------------------------------------------------------------------

_______________________________________________
Sedna-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/sedna-discussion
