Hi all,
I've upgraded the Jena version that I use with this tool:
https://github.com/Rothamsted/rdf2pg
Now I'm seeing performance problems when the TDB is used in read-only
transactions, as explained in the code documentation here:
https://github.com/Rothamsted/rdf2pg/blob/44f2bd16b27a6f13f447d1070f6abcea45f3d492/rdf2pg-core/src/main/java/uk/ac/rothamsted/kg/rdf2pg/pgmaker/support/rdf/RdfDataManager.java#L153
As you can see, the approach is: begin RO transaction, query, end
transaction, all done in parallel threads (8 to 32, depending on the
underlying system).
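To make the access pattern concrete, here is a minimal sketch of it. It
is not the actual rdf2pg code: the class and method names are made up
for illustration, and it assumes the TDB2 Dataset API (begin/end with
ReadWrite.READ) plus ResultSetFormatter.consume() to drain the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSetFormatter;

// Hypothetical class, only meant to illustrate the pattern described above.
public class ReadOnlyTdbSketch {

    /** Runs the same SELECT in nThreads threads and returns the total number of rows seen. */
    public static int countInParallel(Dataset dataset, String sparql, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < nThreads; i++) {
                futures.add(pool.submit(() -> countRows(dataset, sparql)));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // rethrows any failure from the worker thread
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    /** One unit of work: begin RO transaction, query, end transaction. */
    static int countRows(Dataset dataset, String sparql) {
        dataset.begin(ReadWrite.READ);
        try (QueryExecution qe = QueryExecutionFactory.create(sparql, dataset)) {
            // consume() iterates over all the results and returns how many there were
            return ResultSetFormatter.consume(qe.execSelect());
        } finally {
            // Runs after the QueryExecution has been closed
            dataset.end();
        }
    }
}
```

The dataset would typically come from something like
TDB2Factory.connectDataset() with the TDB directory (the path is up to
the caller), and the thread count would be in the 8 to 32 range
mentioned above.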
Using VisualVM, I see that the threads running the code above often go
into the "monitor" state, i.e., they wait for a Java synchronized object
to be freed up. Most of the time they wait 1-3 seconds for that. While
it's hard to know where exactly this happens, I commented out all the
surrounding actions, leaving only the TDB reads above, and then the
threads block each other even more often.
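For reference, this is what that state looks like in isolation. The
sketch below uses plain Java only (no Jena) and a hypothetical class
name: one thread holds a synchronized lock while a second one tries to
enter it. As far as I know, what VisualVM shows as "monitor" corresponds
to Thread.State.BLOCKED in the Java API.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, unrelated to Jena or rdf2pg.
public class MonitorStateDemo {
    private static final Object LOCK = new Object();

    /** Returns the observed state of a thread waiting to enter a synchronized block held by another thread. */
    public static Thread.State observeContendedState() throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // The holder takes the lock and keeps it until told to release it
        Thread holder = new Thread(() -> {
            synchronized (LOCK) {
                lockHeld.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        lockHeld.await();

        // The waiter cannot enter the synchronized block while the holder owns the lock
        Thread waiter = new Thread(() -> {
            synchronized (LOCK) {
                // nothing to do, we only care about the wait to get in
            }
        });
        waiter.start();

        // Poll until the waiter is reported as blocked (or give up after 5 seconds)
        long deadline = System.nanoTime() + 5_000_000_000L;
        Thread.State state = waiter.getState();
        while (state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(10);
            state = waiter.getState();
        }

        release.countDown();
        holder.join();
        waiter.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observeContendedState());
    }
}
```

A thread dump (e.g., from jstack or VisualVM) taken while the threads
are blocked should show which lock they are waiting for and which thread
owns it, which might help to locate the contended section.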
Moreover, VisualVM also shows me that the threads spend a lot of time in
*org.apache.jena.dboe.transaction.txn.Transaction.end()*. Drilling down
into the latter, I can see that
*org.apache.jena.dboe.transaction.txn.journal.Journal.sync()* is the
method consuming most of the time.
I don't understand this: all the operations are read-only, so why do
they run into synchronized sections? Why does Jena spend so much time
synchronising the journal? Would abort() make any difference?
My intuition is that even RO operations must ensure that no writing
transaction has changed the TDB, but if that's the case, isn't there a
way to tell that I never write anything anywhere, and hence it shouldn't
waste time with the journal or anything that checks for changes?
My rdf2pg tool writes into TDB only during a possible initial stage,
when RDF data are loaded from files, then the TDB is re-opened and a
long conversion stage is run that is entirely read-only. I guess this is
a pretty common behaviour and maybe I'm doing something wrong.
Furthermore, it used to be much faster with past Jena versions (with the
same code): https://github.com/Rothamsted/graphdb-benchmarks#test-results
Thanks in advance for any help,
Marco.