Hi again, Núria (it was I, Mattias, who asked for the sample code). Well... the fact that you parse 4 CSV files doesn't really help me set up a test for this... I mean, how can I know that my test will be similar to yours? Would it be OK to attach your code/CSV files as well?
/ Mattias

2009/12/9 Núria Trench <nuriatre...@gmail.com>:
> Hi Todd,
>
> The sample code creates nodes and relationships by parsing 4 CSV files.
> Thank you for trying to trigger this behaviour with this sample.
>
> Núria
>
> 2009/12/9 Mattias Persson <matt...@neotechnology.com>
>
>> Could you provide me with some sample code which can trigger this
>> behaviour with the latest index-util-0.9-SNAPSHOT, Núria?
>>
>> 2009/12/9 Núria Trench <nuriatre...@gmail.com>:
>> > Todd,
>> >
>> > I don't have the same problem. In my case, after indexing all the
>> > attributes/properties of each node, the application creates all the edges
>> > by looking up the tail node and the head node. So, it calls the method
>> > "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which
>> > returns -1 (node not found) on many occasions.
>> >
>> > Does anyone have an alternative way to get a node by its indexed
>> > attributes/properties?
>> >
>> > Thank you,
>> >
>> > Núria.
>> >
>> > 2009/12/7 Mattias Persson <matt...@neotechnology.com>
>> >
>> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
>> >> is a bug that we fixed yesterday... (assuming it's the same bug).
>> >>
>> >> 2009/12/7 Todd Stavish <toddstav...@gmail.com>:
>> >> > Hi Mattias, Núria.
>> >> >
>> >> > I am also running into scalability problems with the Lucene batch
>> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
>> >> > calling optimize more. Increasing ulimit didn't help.
>> >> >
>> >> > [INFO] Exception in thread "main" java.lang.RuntimeException:
>> >> > java.io.FileNotFoundException:
>> >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> >> > (Too many open files)
>> >> > [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
>> >> > [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
>> >> > [INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
>> >> > [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
>> >> > [INFO] Caused by: java.io.FileNotFoundException:
>> >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> >> > (Too many open files)
>> >> >
>> >> > I tried breaking it up into separate batch inserter instances, and now
>> >> > it hangs. Can I create more than one batch inserter per process if they
>> >> > run sequentially and non-threaded?
>> >> >
>> >> > Thanks,
>> >> > Todd
>> >> >
>> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <nuriatre...@gmail.com> wrote:
>> >> >> Hi again Mattias,
>> >> >>
>> >> >> I have tried to execute my application with the latest version available
>> >> >> in the maven repository and I still have the same problem. After creating
>> >> >> and indexing all the nodes, the application calls the "optimize" method
>> >> >> and then creates all the edges by calling the "getNodes" method to select
>> >> >> the tail and head node of each edge, but it doesn't work because many
>> >> >> nodes are not found.
>> >> >>
>> >> >> I have tried to create only 30 nodes and 15 edges and it works properly,
>> >> >> but if I try to create a big graph (180 million edges + 20 million nodes)
>> >> >> it doesn't.
>> >> >>
>> >> >> I have also tried calling the "optimize" method every time the
>> >> >> application has created 1 million nodes, but it doesn't work.
>> >> >>
>> >> >> Have you tried to create as many nodes as I have said with the newer
>> >> >> index-util version?
>> >> >>
>> >> >> Thank you,
>> >> >>
>> >> >> Núria.
>> >> >>
>> >> >> 2009/12/4 Núria Trench <nuriatre...@gmail.com>
>> >> >>
>> >> >>> Hi Mattias,
>> >> >>>
>> >> >>> Thank you very much for fixing the problem so fast. I will try it as
>> >> >>> soon as the new changes are available in the maven repository.
>> >> >>>
>> >> >>> Núria.
>> >> >>>
>> >> >>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>
>> >> >>>
>> >> >>>> I fixed the problem and also added a cache per key for faster
>> >> >>>> getNodes/getSingleNode lookup during the insert process. However, the
>> >> >>>> cache assumes that there's nothing in the index when the process
>> >> >>>> starts (which will almost always be true) to speed things up even
>> >> >>>> further.
>> >> >>>>
>> >> >>>> You can control the cache size and whether it should be used by
>> >> >>>> overriding these methods in your LuceneIndexBatchInserterImpl instance
>> >> >>>> (this is also documented in the Javadoc):
>> >> >>>>
>> >> >>>>   boolean useCache()
>> >> >>>>   int getMaxCacheSizePerKey()
>> >> >>>>
>> >> >>>> The new changes should be available in the maven repository within
>> >> >>>> an hour.
>> >> >>>>
>> >> >>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>:
>> >> >>>> > I think I found the problem... it's indexing as it should, but it
>> >> >>>> > isn't reflected in getNodes/getSingleNode properly until you
>> >> >>>> > flush/optimize/shutdown the index. I'll try to fix it today!
>> >> >>>> >
>> >> >>>> > 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>> >> >>>> >> Thank you very much for your response.
>> >> >>>> >> If you need more information, just send an e-mail and I will
>> >> >>>> >> try to explain it better.
>> >> >>>> >>
>> >> >>>> >> Núria.
>> >> >>>> >>
>> >> >>>> >> 2009/12/3 Mattias Persson <matt...@neotechnology.com>
>> >> >>>> >>
>> >> >>>> >>> This is something I'd like to reproduce, and I'll do some testing
>> >> >>>> >>> on this tomorrow.
>> >> >>>> >>>
>> >> >>>> >>> 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>> >> >>>> >>> > Hello,
>> >> >>>> >>> >
>> >> >>>> >>> > Last week, I decided to download your graph database core in
>> >> >>>> >>> > order to use it. First, I created a new project to parse my CSV
>> >> >>>> >>> > files and create a new graph database with Neo4j. These CSV files
>> >> >>>> >>> > contain 150 million edges and 20 million nodes.
>> >> >>>> >>> >
>> >> >>>> >>> > When I finished writing the code that creates the graph database,
>> >> >>>> >>> > I executed it and, after six hours of execution, the program
>> >> >>>> >>> > crashed because of a Lucene exception. The exception is related to
>> >> >>>> >>> > index merging and has the following message:
>> >> >>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but fdx file
>> >> >>>> >>> > size is 3082259028; now aborting this merge to prevent index corruption"
>> >> >>>> >>> >
>> >> >>>> >>> > I searched the net and found that it is a Lucene bug. The
>> >> >>>> >>> > libraries used for executing my project were:
>> >> >>>> >>> > neo-1.0-b10
>> >> >>>> >>> > index-util-0.7
>> >> >>>> >>> > lucene-core-2.4.0
>> >> >>>> >>> >
>> >> >>>> >>> > So, I decided to use a newer Lucene version.
>> >> >>>> >>> > I found that you have a newer index-util version, so I updated
>> >> >>>> >>> > the libraries:
>> >> >>>> >>> > neo-1.0-b10
>> >> >>>> >>> > index-util-0.9
>> >> >>>> >>> > lucene-core-2.9.1
>> >> >>>> >>> >
>> >> >>>> >>> > After updating those libraries, I tried to execute my project
>> >> >>>> >>> > again and found that, on many occasions, it was not indexing
>> >> >>>> >>> > properly. So, I tried optimizing the index every time I indexed
>> >> >>>> >>> > something. That solved it, in the sense that it then indexed
>> >> >>>> >>> > properly, but the execution time increased a lot.
>> >> >>>> >>> >
>> >> >>>> >>> > I am not using transactions; instead, I am using the Batch
>> >> >>>> >>> > Inserter with the LuceneIndexBatchInserter.
>> >> >>>> >>> >
>> >> >>>> >>> > So, my question is: what can I do to solve this problem? If I use
>> >> >>>> >>> > index-util-0.7, I cannot finish creating the graph database, and
>> >> >>>> >>> > if I use index-util-0.9, I have to optimize the index after every
>> >> >>>> >>> > insertion and the execution never ends.
>> >> >>>> >>> >
>> >> >>>> >>> > Thank you very much in advance,
>> >> >>>> >>> >
>> >> >>>> >>> > Núria.
>> >> >>>> >>> > _______________________________________________
>> >> >>>> >>> > Neo mailing list
>> >> >>>> >>> > User@lists.neo4j.org
>> >> >>>> >>> > https://lists.neo4j.org/mailman/listinfo/user
>> >> >>>> >>>
>> >> >>>> >>> --
>> >> >>>> >>> Mattias Persson, [matt...@neotechnology.com]
>> >> >>>> >>> Neo Technology, www.neotechnology.com
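The per-key lookup cache described in the 2009/12/4 message (and Núria's question about another way to find nodes by indexed property) can be illustrated with a minimal, hypothetical Java sketch. It is not index-util's implementation: the class `PerKeyNodeCache` and its methods are made up for illustration, and the only real names borrowed from the thread are the ideas behind `useCache()`, `getMaxCacheSizePerKey()` and the -1 "not found" result of `getSingleNode`. It assumes, like the cache Mattias describes, that the index is empty when the import starts, and additionally that each indexed value identifies at most one node.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a bounded per-key lookup cache for a batch import.
 * Not index-util's implementation; all names here are illustrative only.
 */
final class PerKeyNodeCache {
    /** Mirrors the -1 "no node found" result of getSingleNode in the thread. */
    static final long NOT_FOUND = -1L;

    private final int maxSizePerKey;
    // key -> (value -> node id); assumes each value identifies at most one node
    private final Map<String, Map<Object, Long>> entries = new HashMap<>();
    // keys for which at least one mapping was dropped because the limit was hit
    private final Set<String> overflowedKeys = new HashSet<>();

    PerKeyNodeCache(int maxSizePerKey) {
        if (maxSizePerKey < 0) {
            throw new IllegalArgumentException("maxSizePerKey must be >= 0");
        }
        this.maxSizePerKey = maxSizePerKey;
    }

    /** Call right after indexing nodeId under key=value. */
    void put(String key, Object value, long nodeId) {
        Map<Object, Long> values = entries.computeIfAbsent(key, k -> new HashMap<>());
        if (values.containsKey(value) || values.size() < maxSizePerKey) {
            values.put(value, nodeId);
        } else {
            overflowedKeys.add(key); // the cache for this key is now incomplete
        }
    }

    /**
     * Returns the cached node id, NOT_FOUND when the cache can prove the value
     * was never indexed, or null when the caller must query the real index.
     */
    Long lookup(String key, Object value) {
        Map<Object, Long> values = entries.get(key);
        Long nodeId = (values == null) ? null : values.get(value);
        if (nodeId != null) {
            return nodeId;
        }
        // Assumes the index was empty when the import started, so a miss is
        // authoritative unless mappings for this key were dropped.
        return overflowedKeys.contains(key) ? null : NOT_FOUND;
    }

    public static void main(String[] args) {
        PerKeyNodeCache cache = new PerKeyNodeCache(2);
        cache.put("name", "alice", 10L);
        cache.put("name", "bob", 11L);
        cache.put("name", "carol", 12L); // exceeds the per-key limit, so it is not cached
        System.out.println(cache.lookup("name", "alice"));          // 10
        System.out.println(cache.lookup("name", "carol"));          // null (query the index)
        System.out.println(cache.lookup("email", "x@example.com")); // -1
    }
}
```

In an edge-creation pass, a caller would try `lookup` first and only fall back to the index (for example `getSingleNode`) when it returns `null`. Capping each key separately is one way to bound memory per indexed property; values beyond the cap are still found, just through the slower index path.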