Hi,

I'm using Jackrabbit 2.4.2 and am running into performance problems. I know 
that write performance is a major weakness of Jackrabbit, because all writes 
are effectively single-threaded.

I want to migrate data from some old software into a Jackrabbit repository 
that uses a database bundle persistence manager (clustered environment, 
versionable node type). The process is to read a block of old data sets 
(around 500), create nodes for them, save the session, and then repeat with 
the next data block.
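
To make the block-wise process concrete, here is a minimal sketch of the loop. It does not use the JCR API directly: the `Target` interface and `CountingTarget` class are hypothetical stand-ins for code that would create nodes and call session.save(), and the block size of 500 is just the rough figure mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a block-wise migration: one save per block of records. */
class BlockwiseImport {

    /** Hypothetical stand-in for the repository session (not a Jackrabbit type). */
    interface Target<T> {
        void createNode(T record); // real code would add a node and set its properties
        void save();               // real code would call session.save()
    }

    /** Test double that only counts calls instead of touching a repository. */
    static class CountingTarget<T> implements Target<T> {
        int created;
        int saves;
        @Override public void createNode(T record) { created++; }
        @Override public void save() { saves++; }
    }

    /** Writes all records in blocks of blockSize; returns the number of saves. */
    static <T> int importInBlocks(List<T> records, int blockSize, Target<T> target) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        int saves = 0;
        for (int start = 0; start < records.size(); start += blockSize) {
            int end = Math.min(start + blockSize, records.size());
            for (T record : records.subList(start, end)) {
                target.createNode(record);
            }
            target.save(); // one save per block, not per node
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            records.add(i);
        }
        CountingTarget<Integer> target = new CountingTarget<>();
        int saves = importInBlocks(records, 500, target);
        System.out.println(target.created + " nodes created in " + saves + " saves");
    }
}
```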

The session.save() call is slowing the whole process down, mainly because of 
the database operations. I traced the SQL statements, and this is what I found:

The following steps are repeated very often (I guess for each new node):

-       SELECTs on the versioning bundles
-       UPDATE of the global revision (committed together with the journal insert)
-       INSERTs and UPDATEs on the versioning bundles (committed)
-       SELECTs on the workspace bundles
-       INSERT into the journal (committed)
-       UPDATE of the local revisions (auto-committed)

After this, all of the workspace bundles are saved at once, in one database 
transaction.

Two points:


1.    Versioning

Most of my nodes (about 90%) only ever exist in one version, but because some 
of them are versioned, I need the versionable node type. For all the others a 
version history is created anyway, which costs database space and write 
performance. Why can't the version history be created only when a node is 
first checked in? That would save space and time for nodes that are 
versionable but never actually versioned. Or is there another solution for a 
situation like this?

I've also done some testing with multiple sessions and multithreading. As a 
result, all but one thread were waiting for the exclusive read/write lock of 
the version manager, so no real multithreading is possible, as expected.
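
The effect can be reproduced in isolation with a small sketch. It uses a plain `java.util.concurrent` read/write lock as a stand-in for the version manager's lock (an assumption for illustration, not Jackrabbit's actual code) and records how many threads are ever inside the write-locked section at the same time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Shows that writers behind one exclusive write lock run strictly one at a time. */
class WriteLockContention {

    /**
     * Starts the given number of threads, each taking the write lock
     * `iterations` times, and returns the highest number of threads
     * observed inside the locked section simultaneously.
     */
    static int maxConcurrentWriters(int threads, int iterations) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger maxInside = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int j = 0; j < iterations; j++) {
                    lock.writeLock().lock();
                    try {
                        int now = inside.incrementAndGet();
                        maxInside.accumulateAndGet(now, Math::max);
                        inside.decrementAndGet();
                    } finally {
                        lock.writeLock().unlock();
                    }
                }
            });
            workers[i].start();
        }

        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return maxInside.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent writers: " + maxConcurrentWriters(8, 1000));
    }
}
```

However many threads are started, only one of them holds the write lock at any moment, so adding sessions or threads does not add write throughput on that path.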

2.    Operations for each node
Write performance could be improved by a factor of 4 to 5 if these operations 
were bundled into larger transactions, as is already done for inserting and 
updating the workspace bundles, so that the number of commits is reduced to a 
minimum. Storing all the workspace bundles takes nearly the same time as 
storing the versioning information for a single node (one cycle). The updates 
of the global revision and the local revisions could be done once per save 
instead of once per changed node, which would cut the time needed further.
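
As a rough illustration of the difference in commit counts, the following sketch counts database commits per save under two assumptions. The "observed" figure assumes the trace above means three commits per node (global revision plus journal, versioning bundles, local revisions) plus one commit for all workspace bundles. The "bundled" figure is a hypothetical variant in which each of those four groups is committed once per save. The class and method names are made up for this example, and commit counts are not the same thing as elapsed time.

```java
/** Back-of-the-envelope commit counts for one session.save(). */
class CommitCount {

    // Assumption read from the trace: per node there is one commit for
    // global revision + journal, one for the versioning bundles, and one
    // auto-commit for the local revisions.
    static final int COMMITS_PER_NODE = 3;

    // All workspace bundles are stored in a single transaction per save.
    static final int WORKSPACE_COMMITS_PER_SAVE = 1;

    /** Commits per save as they appear in the trace. */
    static long observed(int nodes) {
        return (long) nodes * COMMITS_PER_NODE + WORKSPACE_COMMITS_PER_SAVE;
    }

    /** Hypothetical bundled variant: each group is committed once per save. */
    static long bundled() {
        return COMMITS_PER_NODE + WORKSPACE_COMMITS_PER_SAVE;
    }

    public static void main(String[] args) {
        int nodes = 500;
        System.out.println(nodes + " nodes: " + observed(nodes)
                + " commits observed, " + bundled() + " if bundled");
    }
}
```

For a block of 500 nodes this gives 1501 commits per save in the observed pattern against 4 in the bundled one. The statements themselves still have to be executed, so the expected gain in time is much smaller than the ratio of commit counts.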

For now I'm going to work around my performance problems by using multiple 
repositories and splitting the data.

Regards,
Robert Seidel
Software Infrastructure, Senior Professional
--
AEB GmbH