Hi Jan,
We are looking into the performance issues you have reported and will
respond to you on them shortly.
With regards to the server crash you reported when performing a SPARQL
insert query: was anything written to the Virtuoso server log
(virtuoso.log) at the time of the crash, and was a core file created
as a result? If you can provide a test case for reproducing this
problem, we would be keen to reproduce it in-house.
Best Regards
Hugh Williams
Professional Services
OpenLink Software
On 20 Jun 2008, at 12:12, Jan Stette wrote:
Hi all,
I've noticed a few performance problems while using the Virtuoso
JDBC driver (virtjdbc3.jar, from the Open Source release version
5.0.6) to upload triples to a Virtuoso server.
First of all, I've been trying to do batch queries using
Statement.addBatch() and Statement.executeBatch(). While this
executes OK, it appears that the driver isn't actually batching up
the queries to the database. It looks as if these queries are
executed one by one, with a round trip for each (the code in
VirtuosoStatement.executeBatch() and
VirtuosoResultSet.process_result() seems to confirm this). This
makes batch transactions no different from executing individual
queries, which isn't very efficient for large transactions.
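For reference, the pattern we're using looks roughly like the following
(a schematic sketch only; the connection URL, credentials and graph IRI
are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class BatchSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details for a local Virtuoso server.
            Connection conn = DriverManager.getConnection(
                    "jdbc:virtuoso://localhost:1111", "dba", "dba");
            Statement stmt = conn.createStatement();
            for (int i = 0; i < 1000; i++) {
                // Each batch entry is a small SPARQL insert; the graph IRI
                // and triple values are illustrative.
                stmt.addBatch("SPARQL INSERT INTO GRAPH <http://example.org/g> { "
                        + "<http://example.org/s/" + i + "> "
                        + "<http://example.org/p> \"" + i + "\" . }");
            }
            // We expected this to go to the server in one round trip, but
            // the driver appears to execute each entry separately.
            stmt.executeBatch();
            stmt.close();
            conn.close();
        }
    }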
Another issue is that the driver is very slow while adding
statements to a batch using Statement.addBatch(). Profiling our
test application, we've seen >90% of the time spent inside a single
method: openlink.util.Vector.ensureCapacityHelper(int). Looking at
the code for this, it's a bit strange: this appears to be a copy of
the standard java.util.Vector class with some changes. In
particular, ensureCapacityHelper(int) now reallocates and copies
the Vector content every time it's called. And this method is
called every time something is added to a Vector! The end result,
especially dealing with large batches, is very, very slow,
basically O(N^2) where N is the number of items added to the array.
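To illustrate what we believe is going on, here is a sketch of the two
growth strategies (not the driver's actual code):

    // Broken pattern: reallocate and copy on every call, even when there
    // is already enough room -- each add copies O(N) elements, so N adds
    // cost O(N^2) in total.
    static Object[] ensureCapacityAlwaysCopies(Object[] data, int minCapacity) {
        Object[] copy = new Object[Math.max(minCapacity, data.length)];
        System.arraycopy(data, 0, copy, 0, data.length);
        return copy;
    }

    // What java.util.Vector effectively does: grow only when needed, and
    // grow geometrically, so each element is copied a constant number of
    // times on average.
    static Object[] ensureCapacityDoubling(Object[] data, int minCapacity) {
        if (minCapacity <= data.length)
            return data;                 // enough room already, no copy
        int newLength = Math.max(minCapacity, data.length * 2);
        Object[] copy = new Object[newLength];
        System.arraycopy(data, 0, copy, 0, data.length);
        return copy;
    }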
Also, VirtuosoStatement.addBatch() creates a Vector for the batch,
passing in an increment size of 10. This means that even if
the above problem in the Vector class is fixed, it will re-allocate
the Vector every 10 additions. Just using the default value of 0
for this is much better, as the Vector will then double its
allocated size, hence there are only log2(N) reallocations of the
vector content.
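In java.util.Vector terms (which the driver's class appears to mirror),
the difference is just the second constructor argument:

    // capacityIncrement = 10: the capacity grows by a fixed 10 slots each
    // time it fills up, so N additions still trigger roughly N/10 reallocations.
    java.util.Vector fixedStep = new java.util.Vector(10, 10);

    // capacityIncrement = 0 (the default behaviour): the capacity doubles
    // when full, giving only about log2(N) reallocations for N additions.
    java.util.Vector doubling = new java.util.Vector(10, 0);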
We've patched our driver to work around these Vector problems, but
it would be nice to get a proper fix for this into the
distribution. It would be interesting to know why this Vector
class is used in the driver instead of the standard
java.util.Vector anyway; the original does seem a lot better...
When not doing batch updates but just individual SPARQL insert
queries, we hit another problem: now the bottleneck appears to be
the server.
First of all, doing a single SPARQL insert query with a very large
number of triples in it causes the server to segfault. We realise
it's not necessarily reasonable to do such huge queries (~ 100,000
triples in a single query :-), but you probably want to return an
error instead of crashing!
Doing more reasonably sized queries, we find that it takes ~ 70
seconds to insert 100,000 triples via SPARQL insert statements.
This was done using 100 queries containing 1000 triples each. This
rate is quite a lot lower than the bulk load rate we see when
using the ttlp() stored procedure, which gives us rates of about 14
seconds per 100,000 triples. Is there any way we can get
approximately the same performance while doing bulk inserts, for
example by disabling indexes while we're doing the upload?
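For reference, the ttlp() figure above was measured through JDBC roughly
as follows (a sketch; the graph IRI is a placeholder and the Turtle
content is passed in as a string):

    // Assumes 'conn' is an open Virtuoso JDBC connection and 'turtleText'
    // holds the serialised triples in Turtle form.
    java.sql.PreparedStatement ps =
            conn.prepareStatement("DB.DBA.TTLP(?, '', ?)");
    ps.setString(1, turtleText);                 // Turtle document as a string
    ps.setString(2, "http://example.org/g");     // target graph IRI (placeholder)
    ps.execute();
    ps.close();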
Regards,
Jan