Todd,

I don't have the same problem. In my case, after indexing all the
attributes/properties of each node, the application creates all the edges by
looking up the tail node and the head node. To do so, it calls the method
"org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which
returns -1 (no node found) on many occasions.

Does anyone have an alternative way to get a node by its indexed
attributes/properties?

Thank you,

Núria.
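
One possible alternative to the index lookup asked about above is to remember, in memory, the node id assigned to each key while the nodes are inserted, and to resolve the tail and head of each edge from that map. The following is only a minimal sketch under assumptions: the class NodeIdLookup and its methods are hypothetical and not part of index-util, the keys are assumed to be unique strings, and the ids 1 and 2 stand in for whatever the batch inserter returns on node creation. For 20 million nodes such a map may need a large heap, so it may not fit every setup.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of index-util): remembers the node id assigned
 * to each key during insertion, so edges can be resolved from memory instead
 * of through an index lookup.
 */
public class NodeIdLookup {
    /** Same "not found" convention as the -1 result mentioned in the thread. */
    public static final long NOT_FOUND = -1L;

    private final Map<String, Long> idsByKey = new HashMap<>();

    /** Record the id that was returned when the node was created. */
    public void remember(String key, long nodeId) {
        idsByKey.put(key, nodeId);
    }

    /** Return the remembered id, or NOT_FOUND if the key was never recorded. */
    public long find(String key) {
        Long id = idsByKey.get(key);
        return id == null ? NOT_FOUND : id;
    }

    public static void main(String[] args) {
        NodeIdLookup lookup = new NodeIdLookup();
        // Placeholder ids; in a real import these come from node creation.
        lookup.remember("a", 1L);
        lookup.remember("b", 2L);

        // Resolve both ends of an edge a -> b before creating it.
        long tail = lookup.find("a");
        long head = lookup.find("b");
        if (tail == NOT_FOUND || head == NOT_FOUND) {
            System.out.println("edge skipped: endpoint not found");
        } else {
            System.out.println("edge " + tail + " -> " + head);
        }
    }
}
```

The trade-off is memory for lookup speed: nothing has to be flushed or optimized before the edges are created, but every key stays on the heap until the import finishes.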


2009/12/7 Mattias Persson <matt...@neotechnology.com>

> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
> is a bug that we fixed yesterday... (assuming it's the same bug).
>
> 2009/12/7 Todd Stavish <toddstav...@gmail.com>:
> > Hi Mattias, Núria.
> >
> > I am also running into scalability problems with the Lucene batch
> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
> > calling optimize more. Increasing ulimit didn't help.
> >
> > [INFO] Exception in thread "main" java.lang.RuntimeException:
> > java.io.FileNotFoundException:
> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> > (Too many open files)
> > [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> > [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> > [INFO]  at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> > [INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> > [INFO] Caused by: java.io.FileNotFoundException:
> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> > (Too many open files)
> >
> > I tried splitting the work across separate batch inserter instances, and
> > now it hangs. Can I create more than one batch inserter per process if
> > they run sequentially and non-threaded?
> >
> > Thanks,
> > Todd
> >
> >
> >
> >
> >
> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <nuriatre...@gmail.com> wrote:
> >> Hi again Mattias,
> >>
> >> I have tried running my application with the latest version available in
> >> the maven repository and I still have the same problem. After creating and
> >> indexing all the nodes, the application calls the "optimize" method and
> >> then creates all the edges by calling the "getNodes" method to select the
> >> tail and head node of each edge, but it doesn't work because many nodes
> >> are not found.
> >>
> >> I have tried creating only 30 nodes and 15 edges and it works properly, but
> >> if I try to create a big graph (180 million edges + 20 million nodes) it
> >> doesn't.
> >>
> >> I have also tried calling the "optimize" method every time the application
> >> has created 1 million nodes, but it doesn't work.
> >>
> >> Have you tried to create as many nodes as I have said with the newer
> >> index-util version?
> >>
> >> Thank you,
> >>
> >> Núria.
> >>
> >> 2009/12/4 Núria Trench <nuriatre...@gmail.com>
> >>
> >>> Hi Mattias,
> >>>
> >>> Thank you very much for fixing the problem so fast. I will try it as soon
> >>> as the new changes are available in the maven repository.
> >>>
> >>> Núria.
> >>>
> >>>
> >>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>
> >>>
> >>>> I fixed the problem and also added a cache per key for faster
> >>>> getNodes/getSingleNode lookup during the insert process. However the
> >>>> cache assumes that there's nothing in the index when the process
> >>>> starts (which almost always will be true) to speed things up even
> >>>> further.
> >>>>
> >>>> You can control the cache size, and whether the cache is used at all, by
> >>>> overriding the following methods in your LuceneIndexBatchInserterImpl
> >>>> instance (this is also documented in the Javadoc):
> >>>>
> >>>> boolean useCache()
> >>>> int getMaxCacheSizePerKey()
> >>>>
> >>>> The new changes should be available in the maven repository within an hour.
> >>>>
> >>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>:
> >>>> > I think I found the problem... it's indexing as it should, but it
> >>>> > isn't reflected in getNodes/getSingleNode properly until you
> >>>> > flush/optimize/shutdown the index. I'll try to fix it today!
> >>>> >
> >>>> > 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
> >>>> >> Thank you very much for your response.
> >>>> >> If you need more information, just send me an e-mail and I will try
> >>>> >> to explain it better.
> >>>> >>
> >>>> >> Núria.
> >>>> >>
> >>>> >> 2009/12/3 Mattias Persson <matt...@neotechnology.com>
> >>>> >>
> >>>> >>> This is something I'd like to reproduce and I'll do some testing on
> >>>> >>> this tomorrow
> >>>> >>>
> >>>> >>> 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
> >>>> >>> > Hello,
> >>>> >>> >
> >>>> >>> > Last week, I decided to download your graph database core in order to
> >>>> >>> > use it. First, I created a new project to parse my CSV files and create
> >>>> >>> > a new graph database with Neo4j. These CSV files contain 150 million
> >>>> >>> > edges and 20 million nodes.
> >>>> >>> >
> >>>> >>> > When I finished writing the code that creates the graph database, I
> >>>> >>> > executed it and, after six hours of execution, the program crashed
> >>>> >>> > because of a Lucene exception. The exception is related to index
> >>>> >>> > merging and has the following message:
> >>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but fdx
> >>>> >>> > file size is 3082259028; now aborting this merge to prevent index
> >>>> >>> > corruption"
> >>>> >>> >
> >>>> >>> > I searched the net and found that it is a Lucene bug. The libraries
> >>>> >>> > used to run my project were:
> >>>> >>> > neo-1.0-b10
> >>>> >>> > index-util-0.7
> >>>> >>> > lucene-core-2.4.0
> >>>> >>> >
> >>>> >>> > So, I decided to use a newer Lucene version. I found that you have a
> >>>> >>> > newer index-util version, so I updated the libraries:
> >>>> >>> > neo-1.0-b10
> >>>> >>> > index-util-0.9
> >>>> >>> > lucene-core-2.9.1
> >>>> >>> >
> >>>> >>> > After updating those libraries, I ran my project again and found
> >>>> >>> > that, on many occasions, it was not indexing properly. So, I tried
> >>>> >>> > optimizing the index every time I indexed something. This solved the
> >>>> >>> > problem, because after that it was indexing properly, but the execution
> >>>> >>> > time increased a lot.
> >>>> >>> >
> >>>> >>> > I am not using transactions; instead, I am using the Batch Inserter
> >>>> >>> > with the LuceneIndexBatchInserter.
> >>>> >>> >
> >>>> >>> > So, my question is: what can I do to solve this problem? If I use
> >>>> >>> > index-util-0.7, I cannot finish creating the graph database, and if I
> >>>> >>> > use index-util-0.9, I have to optimize the index on every insertion and
> >>>> >>> > the execution never ends.
> >>>> >>> >
> >>>> >>> > Thank you very much in advance,
> >>>> >>> >
> >>>> >>> > Núria.
> >>>> >>> > _______________________________________________
> >>>> >>> > Neo mailing list
> >>>> >>> > User@lists.neo4j.org
> >>>> >>> > https://lists.neo4j.org/mailman/listinfo/user
> >>>> >>> >
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>> --
> >>>> >>> Mattias Persson, [matt...@neotechnology.com]
> >>>> >>> Neo Technology, www.neotechnology.com
> >>>> >>
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > Mattias Persson, [matt...@neotechnology.com]
> >>>> > Neo Technology, www.neotechnology.com
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Mattias Persson, [matt...@neotechnology.com]
> >>>> Neo Technology, www.neotechnology.com
> >>>>
> >>>
> >>>
> >
>
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Neo Technology, www.neotechnology.com
>
