Re: [Neo] neo-rdf-sail + BSBM

Andy Seaborne Mon, 21 Dec 2009 06:08:40 -0800

Hi Mattias,

I tried the DenseTripleStore as well but that crashes (see below for the 
stacktrace).


All this is for version org="org.neo4j" name="neo-rdf-sail" 
rev="0.5-SNAPSHOT" (I use Ivy).

The DenseTripleStore wouldn't help me anyway, as I need named graph 
support and, reading the wiki, it would seem the VerboseQuadStore is the 
only way to get that.  BSBM does not use named graphs but I would 
need that support for other applications.

Going directly to neo-rdf doesn't get me SPARQL, does it?  Nor an RDF 
parser?  I'm loading data from an N-triples file.

Is the bottleneck the Lucene index?  I've found Lucene slow to index in 
other uses so it's going to be hard to get close to native store speeds.
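
For reference, the load times quoted further down in this thread (about 
21 minutes for 1e6 triples and 115 minutes for 5e6) can be turned into 
rough throughput figures and a 100M projection. The sketch below is only 
that arithmetic, under the assumption that the measured rate stays 
constant as the store grows; the class and method names are made up for 
illustration. It gives about 794 and 725 triples per second, and roughly 
35 and 38 hours for 100M, which fits the "at least 30 hours" estimate.

```java
final class LoadProjection {

    /** Average load rate in triples per second for one measured run. */
    static double triplesPerSecond(long triples, double minutes) {
        return triples / (minutes * 60.0);
    }

    /**
     * Hours needed to load targetTriples, ASSUMING the rate measured on the
     * smaller run stays constant (in practice it tends to drop as the
     * store and its index grow, so treat this as a lower bound).
     */
    static double projectedHours(long measuredTriples, double measuredMinutes,
                                 long targetTriples) {
        double triplesPerMinute = measuredTriples / measuredMinutes;
        return targetTriples / triplesPerMinute / 60.0;
    }

    public static void main(String[] args) {
        long target = 100_000_000L;
        // Figures taken from the thread: 1e6 triples in 21 min, 5e6 in 115 min.
        System.out.printf("1e6 run: %.0f triples/s, 100M in %.1f h%n",
                triplesPerSecond(1_000_000L, 21), projectedHours(1_000_000L, 21, target));
        System.out.printf("5e6 run: %.0f triples/s, 100M in %.1f h%n",
                triplesPerSecond(5_000_000L, 115), projectedHours(5_000_000L, 115, target));
    }
}
```

The rate already falls between the two runs, so the real 100M figure 
would probably be above both projections.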

(What is that B-Tree code in the codebase?)

I'll try the bulk loader when it's RDF aware.

        Thanks,
        Andy

Loading BSBM 250K, DenseTripleStore

java.lang.IllegalArgumentException: Start node equals end node
         at org.neo4j.impl.core.RelationshipImpl.<init>(RelationshipImpl.java:58)
         at org.neo4j.impl.core.NodeManager.createRelationship(NodeManager.java:293)
         at org.neo4j.impl.core.NodeImpl.createRelationshipTo(NodeImpl.java:357)
         at org.neo4j.impl.core.NodeProxy.createRelationshipTo(NodeProxy.java:177)
         at org.neo4j.rdf.store.representation.standard.UriBasedExecutor.ensureConnected(UriBasedExecutor.java:262)
         at org.neo4j.rdf.store.representation.standard.UriBasedExecutor.addToNodeSpace(UriBasedExecutor.java:89)
         at org.neo4j.rdf.store.RdfStoreImpl.addStatement(RdfStoreImpl.java:79)
         at org.neo4j.rdf.store.RdfStoreImpl.addStatements(RdfStoreImpl.java:59)
         at org.neo4j.rdf.sail.NeoSailConnection.internalAddStatement(NeoSailConnection.java:625)
         at org.neo4j.rdf.sail.NeoSailConnection.innerAddStatement(NeoSailConnection.java:442)
         at org.neo4j.rdf.sail.NeoSailConnection.addStatement(NeoSailConnection.java:480)
         at org.openrdf.repository.sail.SailRepositoryConnection.addWithoutCommit(SailRepositoryConnection.java:235)
         at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:405)
         at org.openrdf.repository.util.RDFInserter.handleStatement(RDFInserter.java:196)
         at org.openrdf.rio.ntriples.NTriplesParser.parseTriple(NTriplesParser.java:260)
         at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:170)
         at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:112)
         at org.openrdf.repository.base.RepositoryConnectionBase.addInputStreamOrReader(RepositoryConnectionBase.java:303)
         at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:253)
         at run.LoadData.main(LoadData.java:80)
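
The message "Start node equals end node", raised from 
createRelationshipTo, suggests (this is an assumption, not something 
confirmed in this thread) that the failing statement has the same 
resource as subject and object, e.g. <a> <p> <a>. One way to check might 
be to scan the N-Triples input for such lines before loading. The sketch 
below uses only the Java standard library and a deliberately simple 
line-based split rather than a full N-Triples parser; the class and 
method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class SelfLoopScan {

    /**
     * Returns true if an N-Triples line uses the same term as subject and
     * object. Subject and predicate contain no whitespace in N-Triples, so
     * splitting off the first two tokens leaves the object plus the final dot.
     */
    static boolean isSelfLoop(String line) {
        String t = line.trim();
        if (t.isEmpty() || t.startsWith("#")) {
            return false; // blank line or comment
        }
        String[] parts = t.split("\\s+", 3);
        if (parts.length < 3) {
            return false; // not a complete triple
        }
        String subject = parts[0];
        String object = parts[2].trim();
        if (object.endsWith(".")) {
            // Drop the statement terminator, keeping the object term itself.
            object = object.substring(0, object.length() - 1).trim();
        }
        return subject.equals(object);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SelfLoopScan <file.nt>");
            return;
        }
        try (BufferedReader in =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line;
            long lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (isSelfLoop(line)) {
                    System.out.println(lineNo + ": " + line);
                }
            }
        }
    }
}
```

If the scan reports nothing for the 250K file, the cause is presumably 
elsewhere, for instance two different terms being mapped to the same 
node.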

On 21/12/2009 11:47, Mattias Persson wrote:
> Hi Andy,
>
> Great to hear you're trying out neo4j and the neo-rdf-sail component.
> We're aware that the bulk-insert performance isn't what it should be
> and some of the performance caveats are in the neo-rdf-sail component,
> which is a layer around the neo-rdf component. So if you could try to
> go directly to neo-rdf you could gain some performance there.
>
> The next step however would be to use the BatchInserter, a NeoService
> for bulk-inserts. See http://wiki.neo4j.org/content/Batch_Insert for
> more info. But since that's another interface we'll have to make some
> adjustments for the neo-rdf component to be friends with it.
>
> I'll put some time into this in the days between Christmas and
> New Year's Day and see how we can make neo-rdf(-sail) do
> a performance leap for bulk-inserts.
>
> Happy Holidays!
>
> / Mattias
>
> 2009/12/21 Andy Seaborne<andy.seabo...@talis.com>:
>> I'm trying to get neo-rdf-sail to run through the Berlin SPARQL
>> Benchmark [1].
>>
>> It's taking about 21 mins to load 1e6 triples for data and 115 mins for
>> 5 million triples.  This is a bit slow - projecting from that, 100M is
>> at least 30 hours.
>>
>> This on EC2 m1.large, ubuntu server, Java heap size 6G, nothing else
>> running, using IcedTea - this is my fixed setup for BSBM.
>>
>> My initial sense is that it is the indexing that is the significant cost
>> but this is just an educated guess at present. I'm using the
>> LuceneIndexService as per the example. The NeoIndexService is marked not
>> ready for general usage.
>>
>> Any tips for optimizing performance?  I don't need transactionality, for
>> example, because it's a one-time bulk load.
>>
>> I also see the component sparql-engine-neo, which is based on the
>> leaving.name SPARQL engine (and parts of Sesame 1?). Would this be better?
>>
>>      Andy
>>
>> [1] http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/V5/
>> _______________________________________________
>> Neo mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>
>
>
