Another problem I see is that your transactions are too fine-grained,
which will slow down the insertion process quite a bit. Try grouping a
couple of thousand operations into each transaction and you'll see a
performance boost!
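
A minimal sketch of the grouping idea, not the code from the attached
project. Tx and TxFactory are hypothetical stand-ins for the database's
transaction handle (with Neo4j, begin() would presumably wrap
graphDb.beginTx()); BatchedTx and the batch size of 2000 are made up for
illustration:

```java
// Hypothetical stand-in for a transaction handle (assumption, not a Neo4j type).
interface Tx {
    void success(); // mark the transaction as OK to commit
    void finish();  // commit if marked successful, otherwise roll back
}

// Hypothetical factory; with Neo4j this would presumably call graphDb.beginTx().
interface TxFactory {
    Tx begin();
}

/** Commits once every batchSize operations instead of once per operation. */
final class BatchedTx {
    private final TxFactory factory;
    private final int batchSize;
    private Tx current;  // the open transaction, or null if none is open
    private int pending; // operations executed in the open transaction

    BatchedTx(TxFactory factory, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.factory = factory;
        this.batchSize = batchSize;
    }

    /** Runs one operation, opening a transaction lazily and committing when the batch is full. */
    void execute(Runnable operation) {
        if (current == null) {
            current = factory.begin();
        }
        try {
            operation.run();
        } catch (RuntimeException e) {
            finishCurrent(false); // roll back the whole open batch
            throw e;
        }
        if (++pending >= batchSize) {
            finishCurrent(true);
        }
    }

    /** Commits a partially filled batch, if there is one. */
    void close() {
        if (current != null) {
            finishCurrent(true);
        }
    }

    private void finishCurrent(boolean ok) {
        Tx tx = current;
        current = null;
        pending = 0;
        if (ok) {
            tx.success();
        }
        tx.finish();
    }
}

final class BatchedTxDemo {
    /** Returns { transactions begun, operations run } for a dummy workload. */
    static int[] run(int operations, int batchSize) {
        final int[] stats = new int[2];
        TxFactory factory = () -> {
            stats[0]++;
            return new Tx() {
                public void success() { }
                public void finish() { }
            };
        };
        BatchedTx batch = new BatchedTx(factory, batchSize);
        for (int i = 0; i < operations; i++) {
            batch.execute(() -> stats[1]++); // stands in for one insert
        }
        batch.close();
        return stats;
    }

    public static void main(String[] args) {
        int[] s = run(10_000, 2_000);
        System.out.println(s[0] + " transactions for " + s[1] + " operations");
    }
}
```

With these numbers the dummy workload goes through 5 transactions
instead of 10000. Note that in this sketch a failing operation rolls
back everything in the open batch, which is the price of the larger
transaction span.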

FYI: I can reproduce the Lucene "too many open files" problem you were
having, and I'm almost 100% sure that it will be resolved if you
increase the span of your transactions.

2010/1/29 Mattias Persson <matt...@neotechnology.com>:
> I see that you're not using the Neo4j BatchInserter. Your use case
> could definitely benefit from it. Take a look at
> http://wiki.neo4j.org/content/Batch_Insert on how to use it. And since
> you're using index lookups you'll have to keep most of the index in
> your own HashMap or other cache (instead of going down to the index)
> for maximum performance (the LuceneIndexBatchInserterImpl doesn't have
> a built-in cache like the LuceneIndexService does).
>
> You typically gain 5 times the speed or more when using the
> BatchInserter instead of GraphDatabaseService for inserting big data
> sets.
>
> 2010/1/28 Mattias Persson <matt...@neotechnology.com>:
>> I'm sorry, but I think your attachment got caught in our mail filters.
>>
>> Could you perhaps send me your project (including the entire data
>> file) via some file sending service, e.g. http://sprend.com, directly
>> to me at matt...@neotechnology.com?
>>
>> 2010/1/28, Symeon (Akis) Papadopoulos <papa...@iti.gr>:
>>> ....
>>>> Great, Lucene handles merging in the background automatically if the
>>>> files are too sharded. So this error shouldn't occur unless there is
>>>> some corner case where an IndexReader/IndexWriter isn't closed
>>>> properly... so that's why I'm suspecting a bug here :)
>>>>
>>>> Great that you found a work-around, but I suspect it could happen even
>>>> with a higher ulimit.
>>>>
>>>>> The project I'm running is rather large, but at some point I will try to
>>>>> prepare a script in order to replicate the error.
>>>>>
>>>> That would be great indeed!
>>>>
>>> I attach an Eclipse project which is an extract from the larger
>>> benchmark project I have been referring to in my previous emails. You
>>> can run the class graph.load.LoadGraphBenchmark as a Java application to
>>> see what happens. However, this will not replicate the error on your
>>> machine, because the test file included in the project is small (the one
>>> I used when I ran into the error was too large to send by mail).
>>> Perhaps you will be able to see what's wrong with the code just by
>>> reading through it. I suspect that I misuse Neo, but I can't really
>>> pinpoint the problem.
>>>
>>> Thank you in advance for your help!
>>> Best regards,
>>> Symeon
>>>
>>>>>
>>>>>
>>>>>>>> 2010/1/26 Symeon (Akis) Papadopoulos <papa...@iti.gr>:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi all
>>>>>>>>>
>>>>>>>>> While populating a Neo graph, I got the attached exception
>>>>>>>>> (org.apache.lucene.index.MergePolicy$MergeException. See file for
>>>>>>>>> details: I replaced a local path with [some-local-path]).
>>>>>>>>> My setup is like this: I want to benchmark Neo4j for some operations,
>>>>>>>>> amongst which is graph loading. So, I try to load graphs of various
>>>>>>>>> sizes to Neo4j. Up to size 1M edges the graphs are loaded without any
>>>>>>>>> problem, but then this exception is thrown. I suspect this has to do
>>>>>>>>> with the transaction management in my program, which is handled by a
>>>>>>>>> BatchTxManager (a class written by me, thus highly likely to be the
>>>>>>>>> source of trouble). Does the exception ring any bells? What could I
>>>>>>>>> try out in order to identify the problem?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Akis
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /[some-local-path]/_4k.cfs (Too many open files)
>>>>>>>>>        at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
>>>>>>>>>        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
>>>>>>>>> Caused by: java.io.FileNotFoundException: /[some-local-path]/_4k.cfs (Too many open files)
>>>>>>>>>        at java.io.RandomAccessFile.open(Native Method)
>>>>>>>>>        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>>>>>>>>>        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
>>>>>>>>>        at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
>>>>>>>>>        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:94)
>>>>>>>>>        at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70)
>>>>>>>>>        at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
>>>>>>>>>        at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:111)
>>>>>>>>>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
>>>>>>>>>        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
>>>>>>>>>        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
>>>>>>>>>        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979)
>>>>>>>>>        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
>>>>>>>>>        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
>>>>>>>>>        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
>>>>>>>>> Exception in thread "main" org.neo4j.kernel.impl.transaction.TransactionFailureException: Unable to commit transaction
>>>>>>>>>        at org.neo4j.kernel.EmbeddedGraphDbImpl$TransactionImpl.finish(EmbeddedGraphDbImpl.java:331)
>>>>>>>>>        at org.neo4j.util.SimpleBatchTxManager.beginReadOperation(SimpleBatchTxManager.java:94)
>>>>>>>>>        at graph.implementation.neo.STSGraphNeoImpl.getUserNeo(STSGraphNeoImpl.java:463)
>>>>>>>>>        at graph.implementation.neo.STSGraphNeoImpl.increaseUserTagFreq(STSGraphNeoImpl.java:824)
>>>>>>>>>        at graph.benchmark.LoadGraphBenchmark.runRealBenchmark(LoadGraphBenchmark.java:433)
>>>>>>>>>        at graph.benchmark.LoadGraphBenchmark.main(LoadGraphBenchmark.java:489)
>>>>>>>>> Caused by: java.lang.RuntimeException: Unable to close lucene writer org.apache.lucene.index.indexwri...@32a4eb93
>>>>>>>>>        at org.neo4j.index.lucene.LuceneDataSource.removeWriter(LuceneDataSource.java:413)
>>>>>>>>>        at org.neo4j.index.lucene.LuceneTransaction.doCommit(LuceneTransaction.java:197)
>>>>>>>>>        at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:316)
>>>>>>>>>        at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:399)
>>>>>>>>>        at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
>>>>>>>>>        at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:514)
>>>>>>>>>        at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:573)
>>>>>>>>>        at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:543)
>>>>>>>>>        at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:102)
>>>>>>>>>        at org.neo4j.kernel.EmbeddedGraphDbImpl$TransactionImpl.finish(EmbeddedGraphDbImpl.java:316)
>>>>>>>>>        ... 5 more
>>>>>>>>> Caused by: java.io.IOException: directory '/[some-local-path]/resource_id' exists and is a directory, but cannot be listed: list() returned null
>>>>>>>>>        at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:516)
>>>>>>>>>        at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:531)
>>>>>>>>>        at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:307)
>>>>>>>>>        at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4300)
>>>>>>>>>        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4192)
>>>>>>>>>        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4183)
>>>>>>>>>        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2190)
>>>>>>>>>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2153)
>>>>>>>>>        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2117)
>>>>>>>>>        at org.neo4j.index.lucene.LuceneDataSource.removeWriter(LuceneDataSource.java:409)
>>>>>>>>>        ... 14 more
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Neo mailing list
>>>>>>>>> User@lists.neo4j.org
>>>>>>>>> https://lists.neo4j.org/mailman/listinfo/user
>>
>> --
>> Mattias Persson, [matt...@neotechnology.com]
>> Neo Technology, www.neotechnology.com
>>
>
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Neo Technology, www.neotechnology.com
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
