On 21/12/2009 14:39, Mattias Persson wrote:
> 2009/12/21 Andy Seaborne <andy.seabo...@talis.com>:
>> Hi Mattias,
>>
>> I tried the DenseTripleStore as well but that crashes (see below for the
>> stacktrace).
>>
>> All this is for version org="org.neo4j" name="neo-rdf-sail"
>> rev="0.5-SNAPSHOT" (I use Ivy).
>>
>> The DenseTripleStore wouldn't help me anyway as I need named graph
>> support and, reading the wiki, it would seem the VerboseQuadStore
>> is the only way to get that. BSBM does not use named graphs but I would
>> need that support for other applications.
> Yep, DenseTripleStore doesn't support named graphs so VerboseQuadStore
> is the way to go there.
>>
>> Going direct to neo-rdf doesn't get me SPARQL, does it? Nor an RDF
>> parser? I'm loading data from an N-Triples file.
> True, you won't be able to look at it as a Sail w/o the neo-rdf-sail
> component, if you don't wrap it yourself in a lean-n-mean manner. The
> neo-rdf-sail component should be a very thin layer on top of neo-rdf,
> but got a little thicker since it supports suspending/resuming
> transactions so that one thread can manage many concurrent
> transactions. This makes its performance less than optimal and there's
> certainly room for improvement there. It should however be rather easy
> to just wrap an RdfStore (f.ex. a VerboseQuadStore) in a Sail, if
> you'd like to try it out yourself!

I was more hoping for a Jena wrapper :-)

I only have one thread so the overhead should be low. It does look to be
disk bound.

>> Is the bottleneck the Lucene index? I've found Lucene slow to index in
>> other uses so it's going to be hard to get close to native store speeds.
> Hard to say, it certainly affects the insertion speed quite a bit.
>>
>> (what is that B-Tree code in the codebase?)
> I'm sorry, I don't understand this question

I see org.neo4j.util.btree :-)

>>
>> I'll try the bulk loader when it's RDF aware.
>
> Keep in mind that the DenseTripleStore isn't as battle tested as
> VerboseQuadStore so the exception below probably just points that out,
> so to speak. I'll look into that as well!

Thanks - I need to move on to the SPARQL queries on small datasets and
will await the batch loader support.

Andy

>
>>
>> Thanks,
>> Andy
>>
>> Loading BSBM 250K, DenseTripleStore
>>
>> java.lang.IllegalArgumentException: Start node equals end node
>>   at org.neo4j.impl.core.RelationshipImpl.<init>(RelationshipImpl.java:58)
>>   at org.neo4j.impl.core.NodeManager.createRelationship(NodeManager.java:293)
>>   at org.neo4j.impl.core.NodeImpl.createRelationshipTo(NodeImpl.java:357)
>>   at org.neo4j.impl.core.NodeProxy.createRelationshipTo(NodeProxy.java:177)
>>   at org.neo4j.rdf.store.representation.standard.UriBasedExecutor.ensureConnected(UriBasedExecutor.java:262)
>>   at org.neo4j.rdf.store.representation.standard.UriBasedExecutor.addToNodeSpace(UriBasedExecutor.java:89)
>>   at org.neo4j.rdf.store.RdfStoreImpl.addStatement(RdfStoreImpl.java:79)
>>   at org.neo4j.rdf.store.RdfStoreImpl.addStatements(RdfStoreImpl.java:59)
>>   at org.neo4j.rdf.sail.NeoSailConnection.internalAddStatement(NeoSailConnection.java:625)
>>   at org.neo4j.rdf.sail.NeoSailConnection.innerAddStatement(NeoSailConnection.java:442)
>>   at org.neo4j.rdf.sail.NeoSailConnection.addStatement(NeoSailConnection.java:480)
>>   at org.openrdf.repository.sail.SailRepositoryConnection.addWithoutCommit(SailRepositoryConnection.java:235)
>>   at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:405)
>>   at org.openrdf.repository.util.RDFInserter.handleStatement(RDFInserter.java:196)
>>   at org.openrdf.rio.ntriples.NTriplesParser.parseTriple(NTriplesParser.java:260)
>>   at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:170)
>>   at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:112)
>>   at org.openrdf.repository.base.RepositoryConnectionBase.addInputStreamOrReader(RepositoryConnectionBase.java:303)
>>   at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:253)
>>   at run.LoadData.main(LoadData.java:80)
>>
>> On 21/12/2009 11:47, Mattias Persson wrote:
>>> Hi Andy,
>>>
>>> Great to hear you're trying out neo4j and the neo-rdf-sail component.
>>> We're aware that the bulk-insert performance isn't what it should be
>>> and some of the performance caveats are in the neo-rdf-sail component,
>>> which is a layer around the neo-rdf component. So if you could try to
>>> go directly to neo-rdf you could gain some performance there.
>>>
>>> The next step however would be to use the BatchInserter, a NeoService
>>> for bulk-inserts. See http://wiki.neo4j.org/content/Batch_Insert for
>>> more info. But since that's another interface we'll have to make some
>>> adjustments for the neo-rdf component to be friends with it.
>>>
>>> I'll put some time into this in the days between Christmas and
>>> New Year's Day and see how we can make neo-rdf(-sail) do a
>>> performance leap for bulk-inserts.
>>>
>>> Happy Holidays!
>>>
>>> / Mattias
>>>
>>> 2009/12/21 Andy Seaborne <andy.seabo...@talis.com>:
>>>> I'm trying to get neo-rdf-sail to run through the Berlin SPARQL
>>>> Benchmark [1].
>>>>
>>>> It's taking about 21 mins to load 1e6 triples of data and 115 mins for
>>>> 5 million triples. This is a bit slow - projecting from that, 100M is
>>>> at least 30 hours.
>>>>
>>>> This is on EC2 m1.large, Ubuntu server, Java heap size 6G, nothing else
>>>> running, using IcedTea - this is my fixed setup for BSBM.
>>>>
>>>> My initial sense is that the indexing is the significant cost, but
>>>> this is just an educated guess at present. I'm using the
>>>> LuceneIndexService as per the example. The NeoIndexService is marked as
>>>> not ready for general usage.
>>>>
>>>> Any tips for optimizing performance? I don't need transactionality, for
>>>> example, because it's a one-time bulk load.
>>>>
>>>> I also see the component sparql-engine-neo which is based on the
>>>> leaving.name SPARQL engine (and parts of Sesame 1?). Would this be better?
>>>>
>>>> Andy
>>>>
>>>> [1] http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/V5/
>>>> _______________________________________________
>>>> Neo mailing list
>>>> User@lists.neo4j.org
>>>> https://lists.neo4j.org/mailman/listinfo/user
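
Andy's projection in the original message (about 21 mins for 1e6 triples, 115 mins for 5 million, so at least 30 hours for 100M) can be checked with a quick back-of-envelope calculation. This is a minimal sketch that uses only the two timings quoted in the thread; the helper names are made up for illustration, and it assumes a constant insert rate, which the two data points suggest is optimistic.

```python
# Back-of-envelope load-rate check using only the timings quoted in the thread.
# Assumption (illustrative only): load time scales linearly with triple count
# at the measured rate. The two runs suggest throughput actually drops as the
# store grows, so the projection is a lower bound rather than an estimate.

def rate_per_second(triples: float, minutes: float) -> float:
    """Average insert throughput in triples per second."""
    return triples / (minutes * 60.0)

def projected_hours(target_triples: float, rate: float) -> float:
    """Hours needed to load target_triples at a constant rate."""
    return target_triples / rate / 3600.0

rate_1m = rate_per_second(1e6, 21)    # 1e6 triples in ~21 min
rate_5m = rate_per_second(5e6, 115)   # 5 million triples in ~115 min

print(f"1M run: {rate_1m:.0f} triples/s")
print(f"5M run: {rate_5m:.0f} triples/s")
print(f"100M at 5M-run rate: {projected_hours(1e8, rate_5m):.1f} h")
```

With these figures the 1M run averages roughly 794 triples/s and the 5M run roughly 725 triples/s, and extrapolating the slower rate gives about 38 hours for 100M triples, consistent with "at least 30 hours".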