Hi Mattias,

I attached the sample code to my last e-mail; didn't you receive it? I will try to attach it again.
Núria.

2009/12/9 Mattias Persson <matt...@neotechnology.com>

> Hi again, Núria (it was I, Mattias who asked for the sample code).
> Well... the fact that you parse 4 csv files doesn't really help me
> setup a test for this... I mean how can I know that my test will be
> similar to yours? Would it be ok to attach your code/csv files as
> well?
>
> / Mattias
>
> 2009/12/9 Núria Trench <nuriatre...@gmail.com>:
>> Hi Todd,
>>
>> The sample code creates nodes and relationships by parsing 4 csv files.
>> Thank you for trying to trigger this behaviour with this sample.
>>
>> Núria
>>
>> 2009/12/9 Mattias Persson <matt...@neotechnology.com>
>>
>>> Could you provide me with some sample code which can trigger this
>>> behaviour with the latest index-util-0.9-SNAPSHOT Núria?
>>>
>>> 2009/12/9 Núria Trench <nuriatre...@gmail.com>:
>>>> Todd,
>>>>
>>>> I don't have the same problem. In my case, after indexing all the
>>>> attributes/properties of each node, the application creates all the
>>>> edges by looking up the tail node and the head node. So, it calls the
>>>> method "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode"
>>>> which returns -1 (no node found) on many occasions.
>>>>
>>>> Does anyone have an alternative way to get a node by its indexed
>>>> attributes/properties?
>>>>
>>>> Thank you,
>>>>
>>>> Núria.
>>>>
>>>> 2009/12/7 Mattias Persson <matt...@neotechnology.com>
>>>>
>>>>> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
>>>>> is a bug that we fixed yesterday... (assuming it's the same bug).
>>>>>
>>>>> 2009/12/7 Todd Stavish <toddstav...@gmail.com>:
>>>>>> Hi Mattias, Núria.
>>>>>>
>>>>>> I am also running into scalability problems with the Lucene batch
>>>>>> inserter at much smaller numbers, 30,000 indexed nodes. I tried
>>>>>> calling optimize more. Increasing ulimit didn't help.
>>>>>>
>>>>>> [INFO] Exception in thread "main" java.lang.RuntimeException:
>>>>>> java.io.FileNotFoundException:
>>>>>> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>>>>>> (Too many open files)
>>>>>> [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
>>>>>> [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
>>>>>> [INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
>>>>>> [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
>>>>>> [INFO] Caused by: java.io.FileNotFoundException:
>>>>>> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>>>>>> (Too many open files)
>>>>>>
>>>>>> I tried breaking up to separate batchinserter instances, and it hangs
>>>>>> now. Can I create more than one batch inserter per process if they run
>>>>>> sequentially and non-threaded?
>>>>>>
>>>>>> Thanks,
>>>>>> Todd
>>>>>>
>>>>>> On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <nuriatre...@gmail.com> wrote:
>>>>>>> Hi again Mattias,
>>>>>>>
>>>>>>> I have tried to execute my application with the latest version
>>>>>>> available in the maven repository and I still have the same problem.
>>>>>>> After creating and indexing all the nodes, the application calls the
>>>>>>> "optimize" method and, then, it creates all the edges by calling the
>>>>>>> method "getNodes" in order to select the tail and head node of the
>>>>>>> edge, but it doesn't work because many nodes are not found.
>>>>>>>
>>>>>>> I have tried to create only 30 nodes and 15 edges and it works
>>>>>>> properly, but if I try to create a big graph (180 million edges + 20
>>>>>>> million nodes) it doesn't.
>>>>>>>
>>>>>>> I have also tried to call the "optimize" method every time the
>>>>>>> application has created 1 million nodes but it doesn't work.
>>>>>>>
>>>>>>> Have you tried to create as many nodes as I have said with the newer
>>>>>>> index-util version?
>>>>>>>
>>>>>>> Thank you,
>>>>>>>
>>>>>>> Núria.
>>>>>>>
>>>>>>> 2009/12/4 Núria Trench <nuriatre...@gmail.com>
>>>>>>>
>>>>>>>> Hi Mattias,
>>>>>>>>
>>>>>>>> Thank you very much for fixing the problem so fast. I will try it
>>>>>>>> as soon as the new changes are available in the maven repository.
>>>>>>>>
>>>>>>>> Núria.
>>>>>>>>
>>>>>>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>
>>>>>>>>
>>>>>>>>> I fixed the problem and also added a cache per key for faster
>>>>>>>>> getNodes/getSingleNode lookup during the insert process. However
>>>>>>>>> the cache assumes that there's nothing in the index when the
>>>>>>>>> process starts (which almost always will be true) to speed things
>>>>>>>>> up even further.
>>>>>>>>>
>>>>>>>>> You can control the cache size and if it should be used by
>>>>>>>>> overriding the (this is also documented in the Javadoc):
>>>>>>>>>
>>>>>>>>> boolean useCache()
>>>>>>>>> int getMaxCacheSizePerKey()
>>>>>>>>>
>>>>>>>>> methods in your LuceneIndexBatchInserterImpl instance. The new
>>>>>>>>> changes should be available in the maven repository within an hour.
>>>>>>>>>
>>>>>>>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>:
>>>>>>>>>> I think I found the problem... it's indexing as it should, but it
>>>>>>>>>> isn't reflected in getNodes/getSingleNode properly until you
>>>>>>>>>> flush/optimize/shutdown the index. I'll try to fix it today!
>>>>>>>>>>
>>>>>>>>>> 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>>>>>>>>>>> Thank you very much for your response.
>>>>>>>>>>> If you need more information, you only have to send an e-mail
>>>>>>>>>>> and I will try to explain it better.
>>>>>>>>>>>
>>>>>>>>>>> Núria.
>>>>>>>>>>>
>>>>>>>>>>> 2009/12/3 Mattias Persson <matt...@neotechnology.com>
>>>>>>>>>>>
>>>>>>>>>>>> This is something I'd like to reproduce and I'll do some
>>>>>>>>>>>> testing on this tomorrow
>>>>>>>>>>>>
>>>>>>>>>>>> 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Last week, I decided to download your graph database core in
>>>>>>>>>>>>> order to use it. First, I created a new project to parse my
>>>>>>>>>>>>> CSV files and create a new graph database with Neo4j. These
>>>>>>>>>>>>> CSV files contain 150 million edges and 20 million nodes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I finished writing the code which will create the graph
>>>>>>>>>>>>> database, I executed it and, after six hours of execution, the
>>>>>>>>>>>>> program crashed because of a Lucene exception. The exception
>>>>>>>>>>>>> is related to the index merging and it has the following
>>>>>>>>>>>>> message:
>>>>>>>>>>>>> "mergeFields produced an invalid result: docCount is 385282378
>>>>>>>>>>>>> but fdx file size is 3082259028; now aborting this merge to
>>>>>>>>>>>>> prevent index corruption"
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have searched on the net and I found that it is a Lucene
>>>>>>>>>>>>> bug. The libraries used for executing my project were:
>>>>>>>>>>>>> neo-1.0-b10
>>>>>>>>>>>>> index-util-0.7
>>>>>>>>>>>>> lucene-core-2.4.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, I decided to use a newer Lucene version. I found that you
>>>>>>>>>>>>> have a newer index-util version so I updated the libraries:
>>>>>>>>>>>>> neo-1.0-b10
>>>>>>>>>>>>> index-util-0.9
>>>>>>>>>>>>> lucene-core-2.9.1
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I had updated those libraries, I tried to execute my
>>>>>>>>>>>>> project again and I found that, on many occasions, it was not
>>>>>>>>>>>>> indexing properly. So, I tried to optimize the index after
>>>>>>>>>>>>> every time I indexed something. This was a solution because,
>>>>>>>>>>>>> after that, it was indexing properly but the execution time
>>>>>>>>>>>>> increased a lot.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not using transactions; instead, I am using the Batch
>>>>>>>>>>>>> Inserter with the LuceneIndexBatchInserter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, my question is: What can I do to solve this problem? If I
>>>>>>>>>>>>> use index-util-0.7 I cannot finish the execution of creating
>>>>>>>>>>>>> the graph database, and if I use index-util-0.9 I have to
>>>>>>>>>>>>> optimize the index on every insertion and the execution never
>>>>>>>>>>>>> ends.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you very much in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Núria.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Mattias Persson, [matt...@neotechnology.com]
>>>>>>>>>>>> Neo Technology, www.neotechnology.com
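For anyone following the thread, here is a rough sketch of the lookup problem Mattias describes above: entries are written, but getSingleNode only sees them after a flush/optimize, and a write-through cache hides that gap. This is NOT the index-util code. BufferedIndexSketch, its fields, the flush() method and the useCache constructor flag are hypothetical names made up for illustration, and the real cache is per key and size-limited, which this toy version is not.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a batch index; not the index-util implementation. */
class BufferedIndexSketch {
    // Entries that lookups can already "search" (stands in for the flushed index).
    private final Map<String, Long> committed = new HashMap<>();
    // Entries that were written but not flushed yet.
    private final Map<String, Long> pending = new HashMap<>();
    // Write-through cache, assumed here to be unbounded for simplicity.
    private final Map<String, Long> cache = new HashMap<>();
    private final boolean useCache;

    BufferedIndexSketch(boolean useCache) {
        this.useCache = useCache;
    }

    /** Records nodeId under key=value; the entry is not searchable until flush(). */
    void index(long nodeId, String key, Object value) {
        String entry = key + "=" + value;
        pending.put(entry, nodeId);
        if (useCache) {
            cache.put(entry, nodeId);
        }
    }

    /** Returns the node id, or -1 when nothing is found (as reported in the thread). */
    long getSingleNode(String key, Object value) {
        String entry = key + "=" + value;
        if (useCache) {
            Long hit = cache.get(entry);
            if (hit != null) {
                return hit;
            }
        }
        Long found = committed.get(entry);
        return found == null ? -1L : found;
    }

    /** Makes all pending entries searchable. */
    void flush() {
        committed.putAll(pending);
        pending.clear();
    }

    public static void main(String[] args) {
        BufferedIndexSketch plain = new BufferedIndexSketch(false);
        plain.index(1L, "name", "a");
        System.out.println(plain.getSingleNode("name", "a")); // -1: written but not flushed
        plain.flush();
        System.out.println(plain.getSingleNode("name", "a")); // 1: visible after flush

        BufferedIndexSketch cached = new BufferedIndexSketch(true);
        cached.index(2L, "name", "b");
        System.out.println(cached.getSingleNode("name", "b")); // 2: served from the cache
    }
}
```

In this toy model the cache only knows about entries written in the current run, which matches the assumption mentioned in the thread that the index is empty when the insert process starts.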
_______________________________________________ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user