Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This is a bug that we fixed yesterday... (assuming it's the same bug).
2009/12/7 Todd Stavish <toddstav...@gmail.com>:
> Hi Mattias, Núria.
>
> I am also running into scalability problems with the Lucene batch
> inserter at much smaller numbers, 30,000 indexed nodes. I tried
> calling optimize more. Increasing ulimit didn't help.
>
> [INFO] Exception in thread "main" java.lang.RuntimeException:
> java.io.FileNotFoundException:
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> (Too many open files)
> [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> [INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> [INFO] Caused by: java.io.FileNotFoundException:
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> (Too many open files)
>
> I tried breaking it up into separate BatchInserter instances, and it hangs
> now. Can I create more than one batch inserter per process if they run
> sequentially and non-threaded?
>
> Thanks,
> Todd
>
> On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <nuriatre...@gmail.com> wrote:
>> Hi again Mattias,
>>
>> I have tried to execute my application with the latest version available in
>> the Maven repository and I still have the same problem. After creating and
>> indexing all the nodes, the application calls the "optimize" method and
>> then creates all the edges by calling the "getNodes" method in order to
>> select the tail and head node of each edge, but it doesn't work because
>> many nodes are not found.
>>
>> I have tried to create only 30 nodes and 15 edges and it works properly,
>> but if I try to create a big graph (180 million edges + 20 million nodes)
>> it doesn't.
>>
>> I have also tried to call the "optimize" method every time the application
>> has created 1 million nodes, but it doesn't work.
>>
>> Have you tried to create as many nodes as I have said with the newer
>> index-util version?
>>
>> Thank you,
>>
>> Núria.
>>
>> 2009/12/4 Núria Trench <nuriatre...@gmail.com>
>>
>>> Hi Mattias,
>>>
>>> Thank you very much for fixing the problem so fast. I will try it as soon
>>> as the new changes are available in the Maven repository.
>>>
>>> Núria.
>>>
>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>
>>>
>>>> I fixed the problem and also added a cache per key for faster
>>>> getNodes/getSingleNode lookups during the insert process. However, the
>>>> cache assumes that there's nothing in the index when the process
>>>> starts (which will almost always be true) to speed things up even
>>>> further.
>>>>
>>>> You can control the cache size, and whether it is used at all, by
>>>> overriding the following methods (this is also documented in the
>>>> Javadoc):
>>>>
>>>> boolean useCache()
>>>> int getMaxCacheSizePerKey()
>>>>
>>>> in your LuceneIndexBatchInserterImpl instance. The new changes
>>>> should be available in the Maven repository within an hour.
>>>>
>>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>:
>>>> > I think I found the problem... it's indexing as it should, but it
>>>> > isn't reflected in getNodes/getSingleNode properly until you
>>>> > flush/optimize/shutdown the index. I'll try to fix it today!
>>>> >
>>>> > 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>>>> >> Thank you very much for your response.
>>>> >> If you need more information, you only have to send an e-mail and I
>>>> >> will try to explain it better.
>>>> >>
>>>> >> Núria.
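Mattias's overriding advice above can be sketched as an anonymous subclass. The class name and the two method names come from his message; the constructor argument, method visibility, and the concrete values here are assumptions for illustration, not a verified drop-in.

```java
// Sketch only: assumes the index-util 0.9 API discussed in this thread.
LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl( inserter )
{
    @Override
    public boolean useCache()
    {
        // Return false instead if the index already contains data when the
        // batch insertion starts -- the cache assumes an empty index.
        return true;
    }

    @Override
    public int getMaxCacheSizePerKey()
    {
        // Upper bound on cached entries per indexed key; tune to your heap.
        return 100000;
    }
};
```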
>>>> >>
>>>> >> 2009/12/3 Mattias Persson <matt...@neotechnology.com>
>>>> >>
>>>> >>> This is something I'd like to reproduce, and I'll do some testing on
>>>> >>> this tomorrow.
>>>> >>>
>>>> >>> 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>>>> >>> > Hello,
>>>> >>> >
>>>> >>> > Last week, I decided to download your graph database core in order
>>>> >>> > to use it. First, I created a new project to parse my CSV files and
>>>> >>> > create a new graph database with Neo4j. These CSV files contain 150
>>>> >>> > milion edges and 20 milion nodes.
>>>> >>> >
>>>> >>> > When I finished writing the code that creates the graph database, I
>>>> >>> > executed it and, after six hours of execution, the program crashed
>>>> >>> > because of a Lucene exception. The exception is related to index
>>>> >>> > merging and it has the following message:
>>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but
>>>> >>> > fdx file size is 3082259028; now aborting this merge to prevent
>>>> >>> > index corruption"
>>>> >>> >
>>>> >>> > I have searched on the net and found that it is a Lucene bug. The
>>>> >>> > libraries used for executing my project were:
>>>> >>> > neo-1.0-b10
>>>> >>> > index-util-0.7
>>>> >>> > lucene-core-2.4.0
>>>> >>> >
>>>> >>> > So, I decided to use a newer Lucene version. I found that you have
>>>> >>> > a newer index-util version, so I updated the libraries:
>>>> >>> > neo-1.0-b10
>>>> >>> > index-util-0.9
>>>> >>> > lucene-core-2.9.1
>>>> >>> >
>>>> >>> > When I had updated those libraries, I tried to execute my project
>>>> >>> > again and I found that, on many occasions, it was not indexing
>>>> >>> > properly. So, I tried to optimize the index after every time I
>>>> >>> > indexed something. This was a solution because, after that, it was
>>>> >>> > indexing properly, but the execution time increased a lot.
>>>> >>> >
>>>> >>> > I am not using transactions; instead, I am using the Batch Inserter
>>>> >>> > with the LuceneIndexBatchInserter.
>>>> >>> >
>>>> >>> > So, my question is: what can I do to solve this problem? If I use
>>>> >>> > index-util-0.7, I cannot finish creating the graph database, and if
>>>> >>> > I use index-util-0.9, I have to optimize the index on every
>>>> >>> > insertion and the execution never ends.
>>>> >>> >
>>>> >>> > Thank you very much in advance,
>>>> >>> >
>>>> >>> > Núria.
>>>> >>> > _______________________________________________
>>>> >>> > Neo mailing list
>>>> >>> > User@lists.neo4j.org
>>>> >>> > https://lists.neo4j.org/mailman/listinfo/user
>>>> >>>
>>>> >>> --
>>>> >>> Mattias Persson, [matt...@neotechnology.com]
>>>> >>> Neo Technology, www.neotechnology.com

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
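Taken together, the thread suggests one overall shape for the load: a single batch inserter per process, index nodes as they are created, one optimize() after the node phase so that getNodes/getSingleNode see everything, then resolve edge endpoints, then shut down. A rough sketch against the b10-era API follows; the package and class names are taken from the stack trace above, while the constructor arguments, the parseNodes/parseEdges helpers, the relationship type, and the -1 "not found" convention are assumptions for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.neo4j.api.core.RelationshipType;
import org.neo4j.impl.batchinsert.BatchInserterImpl;
import org.neo4j.util.index.LuceneIndexBatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

public class BatchLoad
{
    // Hypothetical relationship type for the example.
    enum RelTypes implements RelationshipType { CONNECTS }

    public static void main( String[] args )
    {
        // One inserter (and one index inserter) per process: they own the
        // store files, and the question of running several sequentially is
        // left open in the thread above, so this keeps a single instance.
        BatchInserterImpl inserter = new BatchInserterImpl( "data/graph" );
        LuceneIndexBatchInserter index =
                new LuceneIndexBatchInserterImpl( inserter );

        // Phase 1: create and index every node.
        for ( Map<String, Object> row : parseNodes() )
        {
            long node = inserter.createNode( row );
            index.index( node, "name", row.get( "name" ) );
        }

        // One flush/optimize between the phases: as noted above, recent
        // inserts may not be visible to getNodes/getSingleNode before this.
        index.optimize();

        // Phase 2: resolve edge endpoints through the index.
        for ( String[] edge : parseEdges() )
        {
            long from = index.getSingleNode( "name", edge[0] );
            long to = index.getSingleNode( "name", edge[1] );
            if ( from != -1 && to != -1 )  // -1 assumed to mean "not found"
            {
                inserter.createRelationship( from, to, RelTypes.CONNECTS, null );
            }
        }

        index.shutdown();
        inserter.shutdown();
    }

    // Hypothetical CSV readers, stubbed so the sketch stands alone.
    static List<Map<String, Object>> parseNodes() { return Collections.emptyList(); }
    static List<String[]> parseEdges() { return Collections.emptyList(); }
}
```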