For those using EE, there is also an XML splitter in magnolia-tools that does 
the split, and you can use the restore JSP to import all the split XML files 
from a directory in the right order (from the top pages down to the ones at 
the bottom of the tree).
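
If you do not have the EE tools, the idea can be sketched in a few lines of 
Python. This is only a rough illustration, not the magnolia-tools splitter and 
not Ernst's Groovy script. It assumes the export is a JCR system view file 
(sv:node / sv:property elements), and the names split_export and write_chunk 
are made up for this example. It streams the file, writes each direct child of 
the root node as its own chunk, and returns the chunk names in import order 
(root first).

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the JCR system view export format.
SV_NS = "http://www.jcp.org/jcr/sv/1.0"
NODE = f"{{{SV_NS}}}node"
NAME = f"{{{SV_NS}}}name"
ET.register_namespace("sv", SV_NS)  # keep the "sv:" prefix in the output


def split_export(source, write_chunk):
    """Split a system view export into one chunk per direct child of the root.

    write_chunk(name, xml_text) is a hypothetical callback that stores a chunk.
    Returns the chunk names in the order they should be imported.
    Sibling nodes with the same name are not handled in this sketch.
    """
    depth = 0
    root = None
    children = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != NODE:
            continue  # properties and values stay attached to their node
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem
        else:
            depth -= 1
            if depth == 1:
                # A direct child of the root is complete: write it out and
                # detach it so that its subtree can be garbage collected.
                name = elem.get(NAME)
                elem.tail = None
                write_chunk(name, ET.tostring(elem, encoding="unicode"))
                root.remove(elem)
                children.append(name)
    # What is left of the root is the node itself plus its properties.
    root_name = root.get(NAME)
    write_chunk(root_name, ET.tostring(root, encoding="unicode"))
    return [root_name] + children
```

Each child subtree still has to fit in memory on its own, so for a very 
lopsided tree you would have to split one level deeper. The child chunks would 
then be imported under the path of the root node, after the root chunk itself.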

There is no "right" Jackrabbit way. Jackrabbit validates all the nodes in the 
session when saving, and the save happens at the end of the import. That is 
it. This problem has been discussed on the Jackrabbit users list in the past. 
The recommendation is to keep the files small enough that you are sure each 
one can be imported comfortably.
Since Magnolia in most cases (the forum being the exception) doesn't use 
references (real jcr:reference properties, as opposed to storing the node 
UUID as a plain value like Magnolia does in the link control), it is safe to 
split the files and import them separately (unless, of course, you introduce 
references in custom controls). If there are references, then no matter how 
far apart the two nodes are in the tree, the referenced node needs to be 
imported before, or at the latest together with, the referencing node.
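
If you are not sure whether your export contains real references, you can 
check before splitting. Below is a small Python sketch, again assuming a JCR 
system view export, where reference properties are written with 
sv:type="Reference". The function name reference_properties is made up for 
this example. It streams the file and lists the path and name of every such 
property.

```python
import io
import xml.etree.ElementTree as ET

# Clark-notation prefix for the JCR system view namespace.
SV = "{http://www.jcp.org/jcr/sv/1.0}"


def reference_properties(source):
    """Return (node_path, property_name) for every Reference-typed property."""
    path = []   # names of the sv:node elements currently open
    found = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag == SV + "node":
            if event == "start":
                path.append(elem.get(SV + "name"))
            else:
                path.pop()
                elem.clear()  # drop the finished subtree to limit memory use
        elif elem.tag == SV + "property" and event == "start":
            if elem.get(SV + "type") == "Reference":
                found.append(("/" + "/".join(path), elem.get(SV + "name")))
    return found
```

An empty result suggests the file can be split anywhere. Otherwise the listed 
nodes are the ones whose targets have to be imported first or in the same 
file.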

HTH,
Jan

On Dec 13, 2010, at 2:11 PM, Ernst Bunders wrote:

> I just created a Groovy script that helps me to split up large XML exports 
> (we find exporting to XML is far less problematic than importing from it). I 
> hereby attach the script, for posterity and those deplorable lost souls who 
> find themselves in equally dire straits.
> 
> But I would still like to have more information about dealing with this 
> problem 'the right way', which would be enabling jackrabbit to import these 
> files. Anybody????
> 
> regards,
> 
> Ernst
> 
> 2010/12/13 Will Scheidegger <[email protected]>
> I did stuff like this with BBEdit (on a Mac). But opening the file took about 
> 10 minutes ;-)
> Good luck!
> 
> -will
> 
> On 13.12.2010, at 11:09, Ernst Bunders wrote:
> 
>> 
>> 
>> 2010/12/13 Will Scheidegger <[email protected]>
>> Hm... very interested in a solution too as we see this happening
>> - on imports
>> - on exports
>> - on deletion
>> of massive nodes.
>> 
>> For starters, if you don't have all articles in a single-level structure 
>> (i.e. if you have folders and subfolders), you can try to open the XML in a 
>> good text editor and split it up into several chunks, which can then be 
>> imported one at a time.
>> 
>> Well, we don't have them in a single-level structure, and what you propose 
>> is exactly what I'm doing just now. Currently I am trying jEdit, which gave 
>> me good results in the past, but the same problem seems to occur here. 
>> Perhaps there are streaming XML editors? I don't know. 
>> 
>> Is there an editor in particular that you would like to recommend?
>> 
>> regards,
>> 
>> Ernst
>>  
>> 
>> -will
>> 
>> 
>> On 13.12.2010, at 10:44, Ernst Bunders wrote:
>> 
>>> Hello
>>> 
>>> We are in the final stages of (re)creating one of our largest sites in 
>>> Magnolia (http://geschiedenis.vpro.nl). This site has a massive archive of 
>>> articles and dossiers, all of which we diligently imported into the new site. 
>>> 
>>> So, the final step is to export the data from our test environment and 
>>> re-import it on the acceptance server. This is where trouble hits. I have a 
>>> file, website.geschiedenis.xml, that is quite large. Its size is 292 MB, and
>>> cat website.geschiedenis.xml | grep "<sv:node" | wc -l
>>> yields: 289648. Quite a lot of nodes.
>>> 
>>> The problem is that we cannot import this file anymore. With max heap 
>>> space settings of 2 GB we still get OutOfMemory exceptions.
>>> 
>>> Apparently this whole XML tree is being constructed in memory, in a 
>>> not-so-efficient way (lots of overhead if you can't fit 292 MB of data in 
>>> a 2 GB heap).
>>> 
>>> We do not have a lot of knowledge about this subject yet. We don't know if 
>>> and how Jackrabbit can be tuned to deal with different scenarios. So any 
>>> help would be appreciated. This is clearly a problem we need to get out of 
>>> the way, also because it impedes our possibilities for backup and restore.
>>> 
>>> For now I am going to try to cut the file up into different smaller ones, 
>>> see how that will go.
>>> 
>>> regards,
>>> 
>>> Ernst
>>> 
>>> -- 
>>> Ernst Bunders
>>> Ontwikkelaar VPRO
>> 
>> 
>> 
>> ----------------------------------------------------------------
>> For list details see
>> http://www.magnolia-cms.com/home/community/mailing-lists.html
>> To unsubscribe, E-mail to: <[email protected]>
>> ----------------------------------------------------------------
>> 
>> 
>> 
> -- 
> Ernst Bunders
> Ontwikkelaar VPRO
> <splitXml.groovy>



