Hi Hugh,

I haven't got the logs or core dump at hand at the moment, but it should be
very easy to reproduce: just send the server a single query like "sparql
insert into graph <http://test> { <triple 1> . <triple 2> . <triple 3> . ... }",
with 100,000 triples in the query. As I said, I realise this is an
unreasonable thing to do, but I guess it makes for an interesting test case
anyway!
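A minimal sketch of a client that issues such a query might look like the
following. The driver class name and URL form are the usual ones for
virtjdbc3.jar; the host, port, credentials and triple data are placeholders:

    // Sketch: build one huge SPARQL INSERT and send it as a single query.
    // Host, port, credentials and triple values are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class BigInsertRepro {
        public static void main(String[] args) throws Exception {
            Class.forName("virtuoso.jdbc3.Driver");
            StringBuilder q = new StringBuilder(
                    "sparql insert into graph <http://test> { ");
            for (int i = 0; i < 100000; i++) {
                // Any 100,000 distinct triples will do.
                q.append("<http://test/s").append(i)
                 .append("> <http://test/p> \"o").append(i).append("\" . ");
            }
            q.append("}");
            try (Connection con = DriverManager.getConnection(
                        "jdbc:virtuoso://localhost:1111", "dba", "dba");
                 Statement st = con.createStatement()) {
                st.executeUpdate(q.toString()); // server reportedly segfaults here
            }
        }
    }

Building the whole query string up front keeps the test self-contained; the
crash, as described, happens server-side while the single executeUpdate()
is processed.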
Let me know if you need more details to reproduce it.

Regards,

Jan

2008/6/20 Hugh Williams <hwilli...@openlinksw.com>:
> Hi Jan,
>
> We are looking into these performance issues you report and shall respond
> back to you on them in a while.
>
> With regards to the server crash you report when performing a SPARQL
> insert query, was anything written to the Virtuoso server log
> (virtuoso.log) at the time of the crash, and was a core file created as a
> result? If you can provide a test case for reproducing this problem, we
> would be keen to reproduce it in house.
>
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software
>
> On 20 Jun 2008, at 12:12, Jan Stette wrote:
>
>> Hi all,
>>
>> I've noticed a few performance problems while using the Virtuoso JDBC
>> driver (virtjdbc3.jar, from the Open Source release version 5.0.6) to
>> upload triples to a Virtuoso server.
>>
>> First of all, I've been trying to run batch queries using
>> Statement.addBatch() and Statement.executeBatch(). While this executes
>> OK, it appears that the driver isn't actually batching up the queries to
>> the database: they seem to be executed one by one, with a round trip for
>> each (the code in VirtuosoStatement.executeBatch() and
>> VirtuosoResultSet.process_result() seems to confirm this). This makes
>> batch execution no different from executing individual queries, which
>> isn't very efficient for large transactions.
>>
>> Another issue is that the driver is very slow while adding statements to
>> a batch using Statement.addBatch(). Profiling our test application, we've
>> seen >90% of the time spent inside a single method:
>> openlink.util.Vector.ensureCapacityHelper(int). Looking at the code for
>> this, it's a bit strange: it appears to be a copy of the standard
>> java.util.Vector class with some changes. In particular,
>> ensureCapacityHelper(int) now reallocates and copies the Vector contents
>> every time it's called, and this method is called every time something is
>> added to a Vector! The end result, especially with large batches, is
>> very, very slow: essentially O(N^2), where N is the number of items added
>> to the array.
>>
>> Also, VirtuosoStatement.addBatch() creates the Vector for the batch with
>> a capacity increment of 10. This means that even if the above problem in
>> the Vector class is fixed, the Vector will still be reallocated every 10
>> additions. Just using the default increment of 0 is much better, as the
>> Vector will then double its allocated size on overflow, so there are only
>> log2(N) reallocations of the vector contents (a short sketch of this
>> arithmetic follows below).
>>
>> We've patched our driver to work around these Vector problems, but it
>> would be nice to get a proper fix into the distribution. It would also be
>> interesting to know why this Vector class is used in the driver instead
>> of the standard java.util.Vector; the original does seem a lot better...
>>
>> When not doing batch updates but just individual SPARQL insert queries,
>> we hit another problem: now the bottleneck appears to be the server.
>> First of all, a single SPARQL insert query with a very large number of
>> triples in it causes the server to segfault. We realise it's not
>> necessarily reasonable to issue such huge queries (~100,000 triples in a
>> single query :-), but you probably want to return an error instead of
>> crashing!
>>
>> Doing more reasonably sized queries, we find that it takes ~70 seconds
>> to insert 100,000 triples via SPARQL insert statements. This was done
>> using 100 queries containing 1000 triples each.
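To make the Vector reallocation arithmetic described above concrete, here is
a minimal, self-contained demo that counts element copies under the two
growth strategies. It is illustrative only, not the openlink.util.Vector
source; the reported bug of copying on every single add corresponds to the
inc = 1 extreme of the fixed-increment case:

    // Illustrative only; not the openlink.util.Vector source. Counts how
    // many element copies each growth strategy performs appending n items.
    public class VectorGrowthDemo {

        // Grow by a fixed increment (the strategy selected by passing an
        // increment of 10): O(n^2) elements copied in total.
        static long copiesFixedIncrement(int n, int inc) {
            long copies = 0;
            int capacity = inc;
            for (int size = 0; size < n; size++) {
                if (size == capacity) {  // array full: reallocate and copy
                    copies += size;
                    capacity += inc;
                }
            }
            return copies;               // roughly n*n / (2*inc)
        }

        // Double the capacity on overflow (the increment-0 default): only
        // log2(n) reallocations, fewer than 2n elements copied in total.
        static long copiesDoubling(int n) {
            long copies = 0;
            int capacity = 1;
            for (int size = 0; size < n; size++) {
                if (size == capacity) {
                    copies += size;
                    capacity *= 2;
                }
            }
            return copies;
        }

        public static void main(String[] args) {
            int n = 100000;
            System.out.println("grow by 10: " + copiesFixedIncrement(n, 10));
            System.out.println("doubling:   " + copiesDoubling(n));
        }
    }

For n = 100,000, grow-by-10 copies roughly 5.0e8 elements while doubling
copies fewer than 2n, which matches the O(N^2) versus log2(N)-reallocations
analysis in the message above.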
>> This insert rate is quite a lot lower than the bulk load rate we see
>> when using the ttlp() stored procedure, which gives us about 14 seconds
>> per 100,000 triples. Is there any way we can get approximately the same
>> performance while doing bulk inserts, for example by disabling indexes
>> while we're doing the upload?
>>
>> Regards,
>> Jan
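For reference, a sketch of the two load paths being compared, under stated
assumptions: a local server at the default port, placeholder graph name and
credentials, and a Turtle file 'data.ttl' that the server is allowed to read
under its DirsAllowed setting. TTLP's optional flags argument is omitted;
see the Virtuoso documentation for its exact behaviour:

    // Sketch of the two load paths (placeholder connection details, graph
    // name and file path; 'data.ttl' must be readable by the server).
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class LoadPaths {
        public static void main(String[] args) throws Exception {
            Class.forName("virtuoso.jdbc3.Driver");
            try (Connection con = DriverManager.getConnection(
                        "jdbc:virtuoso://localhost:1111", "dba", "dba");
                 Statement st = con.createStatement()) {

                // Path 1: 100 SPARQL INSERT statements of 1000 triples each
                // (~70 s per 100,000 triples in Jan's measurements).
                for (int chunk = 0; chunk < 100; chunk++) {
                    StringBuilder q = new StringBuilder(
                            "sparql insert into graph <http://test> { ");
                    for (int i = 0; i < 1000; i++) {
                        int id = chunk * 1000 + i;
                        q.append("<http://test/s").append(id)
                         .append("> <http://test/p> \"o").append(id).append("\" . ");
                    }
                    st.executeUpdate(q.append("}").toString());
                }

                // Path 2: server-side bulk load via the ttlp() procedure
                // (~14 s per 100,000 triples in Jan's measurements).
                st.execute("DB.DBA.TTLP(file_to_string_output('data.ttl'), "
                        + "'', 'http://test')");
            }
        }
    }

Path 2 does the parsing and loading entirely server-side, which avoids the
per-statement round trips and query parsing that Path 1 pays for.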