Re: [Neo] neo-rdf-sail + BSBM

Andy Seaborne Mon, 21 Dec 2009 07:37:00 -0800


On 21/12/2009 14:39, Mattias Persson wrote:
> 2009/12/21 Andy Seaborne<andy.seabo...@talis.com>:
>> Hi Mattias,
>>
>> I tried the DenseTripleStore as well but that crashes (see below for the
>> stacktrace).
>>
>> All this is for version org="org.neo4j" name="neo-rdf-sail"
>> rev="0.5-SNAPSHOT" (I use Ivy).
>>
>> The DenseTripleStore wouldn't help me anyway as I need named graph
>> support and, reading the wiki, it would seem the VerboseQuadStore
>> is the only way to get that.  BSBM does not use named graphs but I would
>> need that support for other applications.
> Yep DenseTripleStore doesn't support named graphs so VerboseQuadStore
> is the way to go there.
>>
>> Going direct to neo-rdf doesn't get me SPARQL does it?  Nor an RDF
>> parser?  I'm loading data from an N-triples file.
> True, you won't be able to look at it as a Sail w/o the neo-rdf-sail
> component, if you don't wrap it yourself in a lean-n-mean manner. The
> neo-rdf-sail component should be a very thin layer on top of neo-rdf,
> but got a little thicker since it supports suspending/resuming
> transactions so that one thread can manage many concurrent
> transactions. This makes its performance less than optimal and there's
> certainly room for improvement there. It should however be rather easy
> to just wrap an RdfStore (f.ex. a VerboseQuadStore) in a Sail, if
> you'd like to try it out yourself!


I was more hoping for a Jena wrapper :-)

I only have one thread so the overhead should be low.  It does look to 
be disk bound.

>> Is the bottleneck the Lucene index?  I've found Lucene slow to index in
>> other uses so it's going to be hard to get close to native store speeds.
> Hard to say, it certainly affects the insertion speed quite a bit.
>>
>> (what is that B-Tree code in the codebase?)
> I'm sorry, I don't understand this question

I see org.neo4j.util.btree :-)

>>
>> I'll try the bulk loader when it's RDF aware.
>
> Keep in mind that the DenseTripleStore isn't as battle tested as
> VerboseQuadStore so the exception below probably just points that out,
> so to speak. I'll look into that as well!

Thanks - I need to move on to the SPARQL queries on small datasets and 
will await the batch loader support.

        Andy

>
>>
>>         Thanks,
>>         Andy
>>
>> Loading BSBM 250K, DenseTripleStore
>>
>> java.lang.IllegalArgumentException: Start node equals end node
>>     at org.neo4j.impl.core.RelationshipImpl.<init>(RelationshipImpl.java:58)
>>     at org.neo4j.impl.core.NodeManager.createRelationship(NodeManager.java:293)
>>     at org.neo4j.impl.core.NodeImpl.createRelationshipTo(NodeImpl.java:357)
>>     at org.neo4j.impl.core.NodeProxy.createRelationshipTo(NodeProxy.java:177)
>>     at org.neo4j.rdf.store.representation.standard.UriBasedExecutor.ensureConnected(UriBasedExecutor.java:262)
>>     at org.neo4j.rdf.store.representation.standard.UriBasedExecutor.addToNodeSpace(UriBasedExecutor.java:89)
>>     at org.neo4j.rdf.store.RdfStoreImpl.addStatement(RdfStoreImpl.java:79)
>>     at org.neo4j.rdf.store.RdfStoreImpl.addStatements(RdfStoreImpl.java:59)
>>     at org.neo4j.rdf.sail.NeoSailConnection.internalAddStatement(NeoSailConnection.java:625)
>>     at org.neo4j.rdf.sail.NeoSailConnection.innerAddStatement(NeoSailConnection.java:442)
>>     at org.neo4j.rdf.sail.NeoSailConnection.addStatement(NeoSailConnection.java:480)
>>     at org.openrdf.repository.sail.SailRepositoryConnection.addWithoutCommit(SailRepositoryConnection.java:235)
>>     at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:405)
>>     at org.openrdf.repository.util.RDFInserter.handleStatement(RDFInserter.java:196)
>>     at org.openrdf.rio.ntriples.NTriplesParser.parseTriple(NTriplesParser.java:260)
>>     at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:170)
>>     at org.openrdf.rio.ntriples.NTriplesParser.parse(NTriplesParser.java:112)
>>     at org.openrdf.repository.base.RepositoryConnectionBase.addInputStreamOrReader(RepositoryConnectionBase.java:303)
>>     at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:253)
>>     at run.LoadData.main(LoadData.java:80)
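
The exception message says the store tried to create a relationship whose start and end node are the same. One possible trigger, which is an assumption and not something confirmed in this thread, is a statement whose subject and object are the same resource. A rough sketch for checking an N-Triples file for such statements follows. The class name and the whitespace-based line splitting are illustrative only and are no substitute for a real N-Triples parser (for example, lines with no whitespace between terms are not recognised).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of neo-rdf or Sesame.
final class SelfLoopScan {

    /** Returns the (trimmed) lines whose subject term equals their object term. */
    static List<String> findSelfLoops(Iterable<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            // Split into subject, predicate, and "object ." (the object may contain spaces).
            String[] parts = t.split("\\s+", 3);
            if (parts.length < 3) {
                continue; // not a complete statement under this naive split
            }
            String rest = parts[2];
            if (rest.endsWith(".")) {
                // Drop the statement terminator.
                rest = rest.substring(0, rest.length() - 1).trim();
            }
            if (parts[0].equals(rest)) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the N-Triples file to scan.
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            for (String hit : findSelfLoops(lines::iterator)) {
                System.out.println(hit);
            }
        }
    }
}
```

If the scan prints nothing, the assumption above does not explain the crash and the cause lies elsewhere.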
>>
>> On 21/12/2009 11:47, Mattias Persson wrote:
>>> Hi Andy,
>>>
>>> Great to hear you trying out neo4j and the neo-rdf-sail component.
>>> We're aware that the bulk-insert performance isn't what it should be
>>> and some of the performance caveats are in the neo-rdf-sail component,
>>> which is a layer around the neo-rdf component. So if you could try to
>>> go directly towards neo-rdf you could gain some performance there.
>>>
>>> The next step however would be to use the BatchInserter, a NeoService
>>> for bulk-inserts. See http://wiki.neo4j.org/content/Batch_Insert for
>>> more info. But since that's another interface we'll have to make some
>>> adjustments for the neo-rdf component to be friends with it.
>>>
>>> I'll put some time into this in the intermediate days between
>>> Christmas and New Year's Day and see how we can make neo-rdf(-sail) do
>>> a performance leap for bulk-inserts.
>>>
>>> Happy Holidays!
>>>
>>> / Mattias
>>>
>>> 2009/12/21 Andy Seaborne<andy.seabo...@talis.com>:
>>>> I'm trying to get neo-rdf-sail to run through the Berlin SPARQL
>>>> Benchmark [1].
>>>>
>>>> It's taking about 21 mins to load 1e6 triples of data and 115 mins for
>>>> 5 million triples.  This is a bit slow - projecting from that, 100M is
>>>> at least 30 hours.
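
For reference, the projection above can be checked with a small sketch. It assumes the load rate stays constant (linear scaling), which is optimistic given that the quoted per-million rate already rises from 21 to 23 minutes between the two runs. The class and method names are made up for illustration.

```java
// Hypothetical helper for extrapolating the load times quoted in this thread.
final class LoadProjection {

    /** Minutes needed per million triples for an observed load. */
    static double minutesPerMillion(double minutes, double triples) {
        return minutes / (triples / 1_000_000.0);
    }

    /** Projected hours for targetTriples, assuming a constant per-triple rate. */
    static double projectedHours(double minutes, double triples, double targetTriples) {
        return minutesPerMillion(minutes, triples) * (targetTriples / 1_000_000.0) / 60.0;
    }

    public static void main(String[] args) {
        double target = 100_000_000.0; // the 100M-triple BSBM dataset
        System.out.println("from the 1M run: " + projectedHours(21, 1_000_000, target) + " h");
        System.out.println("from the 5M run: " + projectedHours(115, 5_000_000, target) + " h");
    }
}
```

Under that assumption the two measurements extrapolate to roughly 35 and 38 hours for 100M triples, consistent with the "at least 30 hours" estimate.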
>>>>
>>>> This is on EC2 m1.large, ubuntu server, Java heap size 6G, nothing else
>>>> running, using IcedTea - this is my fixed setup for BSBM.
>>>>
>>>> My initial sense is that it is the indexing that is the significant cost
>>>> but this is just an educated guess at present. I'm using the
>>>> LuceneIndexService as per the example. The NeoIndexService is marked not
>>>> ready for general usage.
>>>>
>>>> Any tips for optimizing performance?  I don't need transactionality, for
>>>> example, because it's a one-time bulk load.
>>>>
>>>> I also see the component sparql-engine-neo, which is based on the
>>>> leaving.name SPARQL engine (and parts of Sesame 1?). Would this be better?
>>>>
>>>>       Andy
>>>>
>>>> [1] http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/V5/
>>>> _______________________________________________
>>>> Neo mailing list
>>>> User@lists.neo4j.org
>>>> https://lists.neo4j.org/mailman/listinfo/user