Todd, I don't have the same problem. In my case, after indexing all the attributes/properties of each node, the application creates all the edges by looking up the tail node and the head node. To do so, it calls the method "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which returns -1 (no node found) on many occasions.
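To make the failing step concrete, here is a minimal, self-contained sketch of the lookup pattern described above. It does not use the real index-util classes: InMemoryNodeIndex and its two methods are hypothetical stand-ins written only for illustration, and the only thing taken from this thread is the convention that a lookup returns -1 when no node is found.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index used during batch insertion.
// It is NOT the index-util API; it only mirrors the "-1 means not found" contract.
public class InMemoryNodeIndex {
    // property key -> (property value -> node id)
    private final Map<String, Map<Object, Long>> index = new HashMap<>();

    // Remember which node id carries the given key/value pair.
    public void index(long nodeId, String key, Object value) {
        index.computeIfAbsent(key, k -> new HashMap<>()).put(value, nodeId);
    }

    // Return the node id for key/value, or -1 if nothing was indexed for it.
    public long getSingleNode(String key, Object value) {
        Map<Object, Long> byValue = index.get(key);
        if (byValue == null) {
            return -1L;
        }
        Long nodeId = byValue.get(value);
        return nodeId == null ? -1L : nodeId;
    }

    public static void main(String[] args) {
        InMemoryNodeIndex idx = new InMemoryNodeIndex();
        idx.index(1L, "name", "a");
        idx.index(2L, "name", "b");

        // Edge creation: look up both endpoints and check for -1 before using them.
        long tail = idx.getSingleNode("name", "a");
        long head = idx.getSingleNode("name", "missing");
        if (tail == -1L || head == -1L) {
            System.out.println("Skipping edge: an endpoint was not found in the index");
        } else {
            System.out.println("Would create edge " + tail + " -> " + head);
        }
    }
}
```

Checking both endpoints for -1 before creating the edge at least makes the missing lookups visible instead of failing later. Holding every key/value pair in a map like this may need a lot of heap for a graph of 20 million nodes, so it is meant as an illustration of the contract rather than a replacement for the Lucene index.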
Does anyone have an alternative way to get a node by its indexed attributes/properties?

Thank you,

Núria.

2009/12/7 Mattias Persson <matt...@neotechnology.com>

> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
> is a bug that we fixed yesterday... (assuming it's the same bug).
>
> 2009/12/7 Todd Stavish <toddstav...@gmail.com>:
> > Hi Mattias, Núria.
> >
> > I am also running into scalability problems with the Lucene batch
> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
> > calling optimize more. Increasing ulimit didn't help.
> >
> > [INFO] Exception in thread "main" java.lang.RuntimeException:
> > java.io.FileNotFoundException:
> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> > (Too many open files)
> > [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> > [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> > [INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> > [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> > [INFO] Caused by: java.io.FileNotFoundException:
> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> > (Too many open files)
> >
> > I tried breaking up to separate batchinserter instances, and it hangs
> > now. Can I create more than one batch inserter per process if they run
> > sequentially and non-threaded?
> >
> > Thanks,
> > Todd
> >
> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <nuriatre...@gmail.com> wrote:
> >> Hi again Mattias,
> >>
> >> I have tried to execute my application with the latest version available in
> >> the maven repository and I still have the same problem. After creating and
> >> indexing all the nodes, the application calls the "optimize" method and,
> >> then, it creates all the edges by calling the method "getNodes" in order to
> >> select the tail and head node of the edge, but it doesn't work because many
> >> nodes are not found.
> >>
> >> I have tried to create only 30 nodes and 15 edges and it works properly, but
> >> if I try to create a big graph (180 million edges + 20 million nodes) it
> >> doesn't.
> >>
> >> I have also tried to call the "optimize" method every time the application
> >> has created 1 million nodes but it doesn't work.
> >>
> >> Have you tried to create as many nodes as I have said with the newer
> >> index-util version?
> >>
> >> Thank you,
> >>
> >> Núria.
> >>
> >> 2009/12/4 Núria Trench <nuriatre...@gmail.com>
> >>
> >>> Hi Mattias,
> >>>
> >>> Thank you very much for fixing the problem so fast. I will try it as soon
> >>> as the new changes are available in the maven repository.
> >>>
> >>> Núria.
> >>>
> >>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>
> >>>
> >>>> I fixed the problem and also added a cache per key for faster
> >>>> getNodes/getSingleNode lookup during the insert process. However the
> >>>> cache assumes that there's nothing in the index when the process
> >>>> starts (which almost always will be true) to speed things up even
> >>>> further.
> >>>>
> >>>> You can control the cache size and if it should be used by overriding
> >>>> the (this is also documented in the Javadoc):
> >>>>
> >>>> boolean useCache()
> >>>> int getMaxCacheSizePerKey()
> >>>>
> >>>> methods in your LuceneIndexBatchInserterImpl instance. The new changes
> >>>> should be available in the maven repository within an hour.
> >>>>
> >>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>:
> >>>> > I think I found the problem... it's indexing as it should, but it
> >>>> > isn't reflected in getNodes/getSingleNode properly until you
> >>>> > flush/optimize/shutdown the index. I'll try to fix it today!
> >>>> >
> >>>> > 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
> >>>> >> Thank you very much for your response.
> >>>> >> If you need more information, you only have to send an e-mail and I will try
> >>>> >> to explain it better.
> >>>> >>
> >>>> >> Núria.
> >>>> >>
> >>>> >> 2009/12/3 Mattias Persson <matt...@neotechnology.com>
> >>>> >>
> >>>> >>> This is something I'd like to reproduce and I'll do some testing on
> >>>> >>> this tomorrow
> >>>> >>>
> >>>> >>> 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
> >>>> >>> > Hello,
> >>>> >>> >
> >>>> >>> > Last week, I decided to download your graph database core in order to use
> >>>> >>> > it. First, I created a new project to parse my CSV files and create a new
> >>>> >>> > graph database with Neo4j. These CSV files contain 150 million edges and 20
> >>>> >>> > million nodes.
> >>>> >>> >
> >>>> >>> > When I finished writing the code which will create the graph database, I
> >>>> >>> > executed it and, after six hours of execution, the program crashed because
> >>>> >>> > of a Lucene exception. The exception is related to the index merging and it
> >>>> >>> > has the following message:
> >>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but fdx file size is 3082259028; now aborting this merge to prevent index corruption"
> >>>> >>> >
> >>>> >>> > I have searched on the net and I found that it is a Lucene bug. The
> >>>> >>> > libraries used for executing my project were:
> >>>> >>> > neo-1.0-b10
> >>>> >>> > index-util-0.7
> >>>> >>> > lucene-core-2.4.0
> >>>> >>> >
> >>>> >>> > So, I decided to use a newer Lucene version. I found that you have a newer
> >>>> >>> > index-util version so I updated the libraries:
> >>>> >>> > neo-1.0-b10
> >>>> >>> > index-util-0.9
> >>>> >>> > lucene-core-2.9.1
> >>>> >>> >
> >>>> >>> > When I had updated those libraries, I tried to execute my project again and
> >>>> >>> > I found that, on many occasions, it was not indexing properly. So, I tried
> >>>> >>> > to optimize the index after every time I indexed something. This was a
> >>>> >>> > solution because, after that, it was indexing properly but the execution
> >>>> >>> > time increased a lot.
> >>>> >>> >
> >>>> >>> > I am not using transactions; instead, I am using the Batch Inserter
> >>>> >>> > with the LuceneIndexBatchInserter.
> >>>> >>> >
> >>>> >>> > So, my question is: What can I do to solve this problem? If I use
> >>>> >>> > index-util-0.7 I cannot finish the execution of creating the graph database,
> >>>> >>> > and if I use index-util-0.9 I have to optimize the index on every insertion
> >>>> >>> > and the execution never ends.
> >>>> >>> >
> >>>> >>> > Thank you very much in advance,
> >>>> >>> >
> >>>> >>> > Núria.
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Neo Technology, www.neotechnology.com
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user