Hi Martin,

> Thank you for the explanation! Now I will probably write a stx script for
> changing the structure of the data.

You are welcome. It would be great to try that. Let me know about the
results on the modified data. As an example of what is possible, here are
some statistics for the wikixmldb.org demo:

- single source file: 34 GB;
- Sedna data file: ~150 GB, including three value indexes;
- size of the descriptive schema: ~30000.

The factor (database size over source size) is still very high, but the
database works pretty well. We are constantly thinking about how to
decrease it without sacrificing performance.

> As for concatenation into one document, since documents stored in a
> collection have a common descriptive schema, I suppose that the data will
> be physically stored almost in the same way as before, therefore there are
> no negative side effects of this reorganization (by negative I mean
> something not possible or significantly slower). Am I right?

Mostly. The difference is possible in the size of nodes. A unique label is
assigned to each node of the XML document loaded. The labels encode
information about the relative position of the node in the document. The
main purpose of this mechanism is to quickly determine the structural
relationship between a pair of nodes (ancestor, descendant, etc.). The
size of these labels during bulk load is optimal if the loader can analyze
the whole data in advance (for example, it counts how many nodes there are
for each schema node).

Moreover, I think you may continue generating files as you did before (one
article per file). Then a very simple script can automatically concatenate
them into one file. BTW, don't forget to increase the number of buffers
when the data is ready to load.

Ivan Shcheklein, Sedna Team
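[Editor's note: as an illustration of the labeling idea described above, here is a minimal sketch using Dewey-style labels, where a node's label is its path of sibling positions from the root. This is not Sedna's actual labeling scheme, only an assumed simple variant showing how the ancestor/descendant relationship reduces to a prefix test.]

```python
# Illustrative sketch only: Dewey-style labels, NOT Sedna's actual scheme.
# A label is the tuple of sibling positions on the path from the root, so
# "a is an ancestor of b" is simply "a is a proper prefix of b".

def is_ancestor(a, b):
    """True if the node labeled `a` is a proper ancestor of the node labeled `b`."""
    return len(a) < len(b) and b[:len(a)] == a

root = (1,)
chapter = (1, 2)        # second child of the root
paragraph = (1, 2, 3)   # third child of that chapter

assert is_ancestor(root, paragraph)
assert is_ancestor(chapter, paragraph)
assert not is_ancestor(paragraph, chapter)
```

Note how label size grows with document depth and fan-out, which is why a bulk loader that sees the whole document in advance can pick more compact labels than one that must leave room for arbitrary future insertions.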
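[Editor's note: the "very simple script" for concatenating per-article files might look like the following sketch. The file pattern and the `articles` root element are assumptions for illustration, not Sedna conventions; a well-formed XML document needs exactly one root element, so the fragments are wrapped in one.]

```python
# Hypothetical concatenation helper: wraps per-article XML files matching
# `pattern` into a single document under an assumed root element.
import glob

def concatenate(pattern, out_path, root_tag="articles"):
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(f"<{root_tag}>\n")
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8") as f:
                text = f.read()
            # Drop a per-file XML declaration, if present; only one
            # document-level declaration (or none) is allowed.
            if text.startswith("<?xml"):
                text = text.split("?>", 1)[1]
            out.write(text.strip() + "\n")
        out.write(f"</{root_tag}>\n")
```

Usage would be something like `concatenate("articles/*.xml", "all.xml")`, after which the single file can be bulk-loaded into Sedna.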
------------------------------------------------------------------------------
_______________________________________________
Sedna-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/sedna-discussion
