Another problem I see is that your transactions are too granular, which will slow down the insertion process quite a bit. Try grouping a couple of thousand operations into one transaction and you'll see a performance boost!
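To make the batching idea concrete, here is a rough sketch of what I mean. It deliberately does not use the Neo4j API: TxStore, BatchedWriter and CountingStore are hypothetical names invented for this illustration, and the batch size of 2000 is just an example value. In Neo4j terms, begin() and commit() would presumably map to graphDb.beginTx() and tx.success() followed by tx.finish(), but treat that mapping as an assumption and check it against your version.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a transactional store; this is NOT the Neo4j API. */
interface TxStore {
    void begin();
    void write(String item);
    void commit();
}

/** Commits once every batchSize writes instead of once per write. */
final class BatchedWriter implements AutoCloseable {
    private final TxStore store;
    private final int batchSize;
    private int pending = 0;      // writes since the last commit
    private boolean open = false; // true while a transaction is in progress

    BatchedWriter(TxStore store, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.store = store;
        this.batchSize = batchSize;
    }

    void write(String item) {
        if (!open) {              // lazily open a transaction on the first write
            store.begin();
            open = true;
        }
        store.write(item);
        if (++pending >= batchSize) {
            flush();              // batch is full: commit and start over
        }
    }

    /** Commits the current transaction, if there is one. */
    void flush() {
        if (open) {
            store.commit();
            open = false;
            pending = 0;
        }
    }

    /** Commits whatever is left in the last, partially filled batch. */
    @Override
    public void close() {
        flush();
    }
}

/** In-memory store that only counts commits, for illustration. */
final class CountingStore implements TxStore {
    int commits = 0;
    final List<String> items = new ArrayList<>();

    public void begin() { }
    public void write(String item) { items.add(item); }
    public void commit() { commits++; }
}

class BatchingDemo {
    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        try (BatchedWriter writer = new BatchedWriter(store, 2000)) {
            for (int i = 0; i < 5000; i++) {
                writer.write("edge-" + i);
            }
        }
        // prints "3 commits for 5000 writes"
        System.out.println(store.commits + " commits for " + store.items.size() + " writes");
    }
}
```

With one transaction per write, the same loop would commit 5000 times. The sketch leaves out rollback on failure, which a real loader would need.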
FYI: I can trigger the Lucene "too many open files" problem you were having, and I'm almost 100% sure that it will be resolved if you increase the span of your transactions.

2010/1/29 Mattias Persson <matt...@neotechnology.com>:
> I see that you're not using the Neo4j BatchInserter. Your use case
> could definitely benefit from it. Take a look at
> http://wiki.neo4j.org/content/Batch_Insert on how to use it. And since
> you're using index lookups, you'll have to keep most of the index in
> your own HashMap or other cache (instead of going down to the index)
> for maximum performance. (The LuceneIndexBatchInserterImpl doesn't
> have a built-in cache like the LuceneIndexService does.)
>
> You typically gain 5 times the speed or more when using the
> BatchInserter instead of GraphDatabaseService for inserting big data
> sets.
>
> 2010/1/28 Mattias Persson <matt...@neotechnology.com>:
>> I'm sorry, but I think your attachment got caught in our mail filters.
>>
>> Could you perhaps send your project (including the entire data file)
>> to me directly at matt...@neotechnology.com, via some file-sending
>> service such as http://sprend.com?
>>
>> 2010/1/28, Symeon (Akis) Papadopoulos <papa...@iti.gr>:
>>> ....
>>>> Great, Lucene handles merging in the background automatically if the
>>>> files are too sharded. So this error shouldn't occur unless there is
>>>> some corner case where some IndexReader/IndexWriter isn't closed
>>>> properly... so that's why I'm suspecting a bug here :)
>>>>
>>>> Great that you found a work-around, but I suspect it could happen
>>>> even with a higher ulimit.
>>>>
>>>>> The project I'm running is rather large, but at some point I will
>>>>> try to prepare a script in order to replicate the error.
>>>>>
>>>> That would be great indeed!
>>>>
>>> I attach an Eclipse project which is an extract from the larger
>>> benchmark project I have been referring to in my previous emails. You
>>> can run the class graph.load.LoadGraphBenchmark as a Java application
>>> to see what happens. However, this will not replicate the error on
>>> your machine, because the test file included in the project is small
>>> (the one I used when I ran into the error was too large to send by
>>> mail). Perhaps you will be able to see what's wrong with the code just
>>> by reading through it. I suspect that I misuse Neo, but I can't really
>>> pinpoint the problem.
>>>
>>> Thank you in advance for your help!
>>> Best regards,
>>> Symeon
>>>
>>>>>>>> 2010/1/26 Symeon (Akis) Papadopoulos <papa...@iti.gr>:
>>>>>>>>
>>>>>>>>> Hi all
>>>>>>>>>
>>>>>>>>> While populating a Neo graph, I got the attached exception
>>>>>>>>> (org.apache.lucene.index.MergePolicy$MergeException; see the
>>>>>>>>> file for details. I replaced a local path with
>>>>>>>>> [some-local-path]).
>>>>>>>>> My setup is like this: I want to benchmark Neo4j for some
>>>>>>>>> operations, amongst which is graph loading. So, I try to load
>>>>>>>>> graphs of various sizes into Neo4j. Up to size 1M edges the
>>>>>>>>> graphs are loaded without any problem, but then this exception
>>>>>>>>> is thrown. I suspect this has to do with the transaction
>>>>>>>>> management in my program, which is handled by a BatchTxManager
>>>>>>>>> (a class written by me, thus highly likely to be the source of
>>>>>>>>> trouble). Does the exception ring any bells? What could I try
>>>>>>>>> out in order to identify the problem?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Akis
>>>>>>>>>
>>>>>>>>> Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.FileNotFoundException: /[some-local-path]/_4k.cfs (Too many open files)
>>>>>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351)
>>>>>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315)
>>>>>>>>> Caused by: java.io.FileNotFoundException: /[some-local-path]/_4k.cfs (Too many open files)
>>>>>>>>>     at java.io.RandomAccessFile.open(Native Method)
>>>>>>>>>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>>>>>>>>>     at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78)
>>>>>>>>>     at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108)
>>>>>>>>>     at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:94)
>>>>>>>>>     at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:70)
>>>>>>>>>     at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
>>>>>>>>>     at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:111)
>>>>>>>>>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:638)
>>>>>>>>>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:608)
>>>>>>>>>     at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:686)
>>>>>>>>>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4979)
>>>>>>>>>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614)
>>>>>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
>>>>>>>>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
>>>>>>>>> Exception in thread "main" org.neo4j.kernel.impl.transaction.TransactionFailureException: Unable to commit transaction
>>>>>>>>>     at org.neo4j.kernel.EmbeddedGraphDbImpl$TransactionImpl.finish(EmbeddedGraphDbImpl.java:331)
>>>>>>>>>     at org.neo4j.util.SimpleBatchTxManager.beginReadOperation(SimpleBatchTxManager.java:94)
>>>>>>>>>     at graph.implementation.neo.STSGraphNeoImpl.getUserNeo(STSGraphNeoImpl.java:463)
>>>>>>>>>     at graph.implementation.neo.STSGraphNeoImpl.increaseUserTagFreq(STSGraphNeoImpl.java:824)
>>>>>>>>>     at graph.benchmark.LoadGraphBenchmark.runRealBenchmark(LoadGraphBenchmark.java:433)
>>>>>>>>>     at graph.benchmark.LoadGraphBenchmark.main(LoadGraphBenchmark.java:489)
>>>>>>>>> Caused by: java.lang.RuntimeException: Unable to close lucene writer org.apache.lucene.index.indexwri...@32a4eb93
>>>>>>>>>     at org.neo4j.index.lucene.LuceneDataSource.removeWriter(LuceneDataSource.java:413)
>>>>>>>>>     at org.neo4j.index.lucene.LuceneTransaction.doCommit(LuceneTransaction.java:197)
>>>>>>>>>     at org.neo4j.kernel.impl.transaction.xaframework.XaTransaction.commit(XaTransaction.java:316)
>>>>>>>>>     at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.commit(XaResourceManager.java:399)
>>>>>>>>>     at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.commit(XaResourceHelpImpl.java:64)
>>>>>>>>>     at org.neo4j.kernel.impl.transaction.TransactionImpl.doCommit(TransactionImpl.java:514)
>>>>>>>>>     at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:573)
>>>>>>>>>     at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:543)
>>>>>>>>>     at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:102)
>>>>>>>>>     at org.neo4j.kernel.EmbeddedGraphDbImpl$TransactionImpl.finish(EmbeddedGraphDbImpl.java:316)
>>>>>>>>>     ... 5 more
>>>>>>>>> Caused by: java.io.IOException: directory '/[some-local-path]/resource_id' exists and is a directory, but cannot be listed: list() returned null
>>>>>>>>>     at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:516)
>>>>>>>>>     at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:531)
>>>>>>>>>     at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:307)
>>>>>>>>>     at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:4300)
>>>>>>>>>     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4192)
>>>>>>>>>     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:4183)
>>>>>>>>>     at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:2190)
>>>>>>>>>     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2153)
>>>>>>>>>     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:2117)
>>>>>>>>>     at org.neo4j.index.lucene.LuceneDataSource.removeWriter(LuceneDataSource.java:409)
>>>>>>>>>     ... 14 more

-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user