Hi, I'm using Jackrabbit 2.4.2 and am facing performance problems. I know write performance is a big weakness of Jackrabbit, because all writes effectively have to be done single-threaded.
The situation is that I want to migrate data from some old software into a Jackrabbit repository that uses a bundle database persistence manager (clustered environment, versionable node type). The process is to fetch a block of old data sets (maybe 500), create nodes for them, save the session, and then repeat with the next block. The session.save() call slows the whole process down, mainly because of the database operations.

I traced the SQL statements. The following steps are repeated very often (I guess once for each new node):

- Selects on the versioning bundle
- Update of the global revision (committed together with the journal insert)
- Inserts and updates on the versioning bundle (committed)
- Selects on the workspace bundle
- Insert into the journal (committed)
- Update of the local revisions (auto-committed)

After this, all of the workspace bundles are saved at once, in one database transaction.

Two points:

1. Versioning

Most of my nodes (about 90%) only ever exist in one version, but because some of them are versioned, I need the versionable node type. As a result, a version history is created for all the other nodes as well, which costs database space and write performance. Why can't the version history be created only when a node is first checked in? That would save space and time for nodes that are versionable but never actually versioned. Or is there another solution for a situation like this?

I've also done some testing with multiple sessions and multithreading. The result was that all threads but one were waiting for the exclusive read/write lock of the version manager, so no multithreading is possible, as expected.

2. Operations for each node

Write performance could be improved by a factor of 4 to 5 if the operations were bundled into transactions in the same way as the inserts and updates of the workspace bundles, reducing the number of commits to a minimum. Storing all the workspace bundles takes nearly the same time as storing the versioning information for a single node (one cycle).
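To make the commit counts concrete, here is a rough Java sketch. It is only a model of my reading of the trace, not Jackrabbit code: the class name, the method names, and the figure of three commits per node cycle (versioning bundle, journal insert together with the global revision update, local revision update) are assumptions for illustration.

```java
// Hypothetical model of the commit pattern seen in the SQL trace.
// All names and the per-node commit count are assumptions for illustration.
final class CommitCountSketch {

    // Commits assumed per node cycle: versioning bundle, journal insert
    // (together with the global revision update), local revision update.
    static final int COMMITS_PER_NODE_CYCLE = 3;

    // One transaction stores all workspace bundles at the end of a save.
    static final int WORKSPACE_COMMITS_PER_SAVE = 1;

    /** Commits for one save if the cycle runs once per new node. */
    static long commitsPerNodeCycle(int nodesPerSave) {
        return (long) nodesPerSave * COMMITS_PER_NODE_CYCLE
                + WORKSPACE_COMMITS_PER_SAVE;
    }

    /** Commits for one save if each kind of write is committed only once. */
    static long commitsBundled() {
        return COMMITS_PER_NODE_CYCLE + WORKSPACE_COMMITS_PER_SAVE;
    }

    public static void main(String[] args) {
        int nodesPerSave = 500; // block size mentioned above
        System.out.println("per-node cycle: " + commitsPerNodeCycle(nodesPerSave));
        System.out.println("bundled:        " + commitsBundled());
    }
}
```

Under these assumptions a block of 500 nodes needs 1501 commits per save with the per-node cycle, against 4 if each kind of write were committed once. The count alone says nothing about how long each commit takes.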
The updates of the global revision and the local revisions could also be done once per save instead of once per changed node, which would reduce the time needed to a minimum.

For now I'm going to solve my performance problems with multiple repositories and data splitting...

Regards,
Robert

--
Robert Seidel, Software Infrastructure, Senior Professional
AEB GmbH, Lübeck
www.aeb.de
