Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This is a bug that we fixed yesterday... (assuming it's the same bug).
2009/12/7 Todd Stavish <toddstav...@gmail.com>:
> Hi Mattias, Núria.
>
> I am also running into scalability problems with the Lucene batch
> inserter at much smaller numbers, 30,000 indexed nodes. I tried
> calling optimize more. Increasing ulimit didn't help.
>
> [INFO] Exception in thread "main" java.lang.RuntimeException:
> java.io.FileNotFoundException:
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> (Too many open files)
> [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> [INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> [INFO] Caused by: java.io.FileNotFoundException:
> /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
> (Too many open files)
>
> I tried breaking it up into separate BatchInserter instances, and it hangs
> now. Can I create more than one batch inserter per process if they run
> sequentially and non-threaded?
>
> Thanks,
> Todd
>
> On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <nuriatre...@gmail.com> wrote:
>> Hi again Mattias,
>>
>> I have tried to execute my application with the latest version available in
>> the Maven repository and I still have the same problem. After creating and
>> indexing all the nodes, the application calls the "optimize" method and
>> then creates all the edges by calling the "getNodes" method in order to
>> select the tail and head node of each edge, but it doesn't work because
>> many nodes are not found.
>>
>> I have tried to create only 30 nodes and 15 edges and it works properly,
>> but if I try to create a big graph (180 million edges + 20 million nodes)
>> it doesn't.
>>
>> I have also tried to call the "optimize" method every time the application
>> has created 1 million nodes, but it doesn't work.
>>
>> Have you tried to create as many nodes as I have said with the newer
>> index-util version?
>>
>> Thank you,
>>
>> Núria.
>>
>> 2009/12/4 Núria Trench <nuriatre...@gmail.com>
>>
>>> Hi Mattias,
>>>
>>> Thank you very much for fixing the problem so fast. I will try it as soon
>>> as the new changes are available in the Maven repository.
>>>
>>> Núria.
>>>
>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>
>>>
>>>> I fixed the problem and also added a cache per key for faster
>>>> getNodes/getSingleNode lookups during the insert process. However, the
>>>> cache assumes that there's nothing in the index when the process
>>>> starts (which will almost always be true) to speed things up even
>>>> further.
>>>>
>>>> You can control the cache size, and whether it is used at all, by
>>>> overriding the following methods (this is also documented in the
>>>> Javadoc):
>>>>
>>>> boolean useCache()
>>>> int getMaxCacheSizePerKey()
>>>>
>>>> in your LuceneIndexBatchInserterImpl instance. The new changes
>>>> should be available in the Maven repository within an hour.
>>>>
>>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>:
>>>> > I think I found the problem... it's indexing as it should, but it
>>>> > isn't reflected in getNodes/getSingleNode properly until you
>>>> > flush/optimize/shutdown the index. I'll try to fix it today!
>>>> >
>>>> > 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>>>> >> Thank you very much for your response.
>>>> >> If you need more information, you only have to send an e-mail and I
>>>> >> will try to explain it better.
>>>> >>
>>>> >> Núria.
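Mattias's overriding advice above can be sketched as an anonymous subclass. The class name and the two method names come from his message; the constructor argument, method visibility, and the concrete values here are assumptions for illustration, not a verified drop-in.

```java
// Sketch only: assumes the index-util 0.9 API discussed in this thread.
LuceneIndexBatchInserter index = new LuceneIndexBatchInserterImpl( inserter )
{
    @Override
    public boolean useCache()
    {
        // Return false instead if the index already contains data when the
        // batch insertion starts -- the cache assumes an empty index.
        return true;
    }

    @Override
    public int getMaxCacheSizePerKey()
    {
        // Upper bound on cached entries per indexed key; tune to your heap.
        return 100000;
    }
};
```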
>>>> >>
>>>> >> 2009/12/3 Mattias Persson <matt...@neotechnology.com>
>>>> >>
>>>> >>> This is something I'd like to reproduce, and I'll do some testing on
>>>> >>> this tomorrow.
>>>> >>>
>>>> >>> 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>>>> >>> > Hello,
>>>> >>> >
>>>> >>> > Last week, I decided to download your graph database core in order
>>>> >>> > to use it. First, I created a new project to parse my CSV files and
>>>> >>> > create a new graph database with Neo4j. These CSV files contain 150
>>>> >>> > milion edges and 20 milion nodes.
>>>> >>> >
>>>> >>> > When I finished writing the code that creates the graph database, I
>>>> >>> > executed it and, after six hours of execution, the program crashed
>>>> >>> > because of a Lucene exception. The exception is related to index
>>>> >>> > merging and it has the following message:
>>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but
>>>> >>> > fdx file size is 3082259028; now aborting this merge to prevent
>>>> >>> > index corruption"
>>>> >>> >
>>>> >>> > I have searched on the net and found that it is a Lucene bug. The
>>>> >>> > libraries used for executing my project were:
>>>> >>> > neo-1.0-b10
>>>> >>> > index-util-0.7
>>>> >>> > lucene-core-2.4.0
>>>> >>> >
>>>> >>> > So, I decided to use a newer Lucene version. I found that you have
>>>> >>> > a newer index-util version, so I updated the libraries:
>>>> >>> > neo-1.0-b10
>>>> >>> > index-util-0.9
>>>> >>> > lucene-core-2.9.1
>>>> >>> >
>>>> >>> > When I had updated those libraries, I tried to execute my project
>>>> >>> > again and I found that, on many occasions, it was not indexing
>>>> >>> > properly. So, I tried to optimize the index after every time I
>>>> >>> > indexed something. This was a solution because, after that, it was
>>>> >>> > indexing properly, but the execution time increased a lot.
>>>> >>> >
>>>> >>> > I am not using transactions; instead, I am using the Batch Inserter
>>>> >>> > with the LuceneIndexBatchInserter.
>>>> >>> >
>>>> >>> > So, my question is: what can I do to solve this problem? If I use
>>>> >>> > index-util-0.7, I cannot finish creating the graph database, and if
>>>> >>> > I use index-util-0.9, I have to optimize the index on every
>>>> >>> > insertion and the execution never ends.
>>>> >>> >
>>>> >>> > Thank you very much in advance,
>>>> >>> >
>>>> >>> > Núria.
>>>> >>> > _______________________________________________
>>>> >>> > Neo mailing list
>>>> >>> > User@lists.neo4j.org
>>>> >>> > https://lists.neo4j.org/mailman/listinfo/user
>>>> >>>
>>>> >>> --
>>>> >>> Mattias Persson, [matt...@neotechnology.com]
>>>> >>> Neo Technology, www.neotechnology.com

--
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
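Taken together, the thread suggests one overall shape for the load: a single batch inserter per process, index nodes as they are created, one optimize() after the node phase so that getNodes/getSingleNode see everything, then resolve edge endpoints, then shut down. A rough sketch against the b10-era API follows; the package and class names are taken from the stack trace above, while the constructor arguments, the parseNodes/parseEdges helpers, the relationship type, and the -1 "not found" convention are assumptions for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

import org.neo4j.api.core.RelationshipType;
import org.neo4j.impl.batchinsert.BatchInserterImpl;
import org.neo4j.util.index.LuceneIndexBatchInserter;
import org.neo4j.util.index.LuceneIndexBatchInserterImpl;

public class BatchLoad
{
    // Hypothetical relationship type for the example.
    enum RelTypes implements RelationshipType { CONNECTS }

    public static void main( String[] args )
    {
        // One inserter (and one index inserter) per process: they own the
        // store files, and the question of running several sequentially is
        // left open in the thread above, so this keeps a single instance.
        BatchInserterImpl inserter = new BatchInserterImpl( "data/graph" );
        LuceneIndexBatchInserter index =
                new LuceneIndexBatchInserterImpl( inserter );

        // Phase 1: create and index every node.
        for ( Map<String, Object> row : parseNodes() )
        {
            long node = inserter.createNode( row );
            index.index( node, "name", row.get( "name" ) );
        }

        // One flush/optimize between the phases: as noted above, recent
        // inserts may not be visible to getNodes/getSingleNode before this.
        index.optimize();

        // Phase 2: resolve edge endpoints through the index.
        for ( String[] edge : parseEdges() )
        {
            long from = index.getSingleNode( "name", edge[0] );
            long to = index.getSingleNode( "name", edge[1] );
            if ( from != -1 && to != -1 )  // -1 assumed to mean "not found"
            {
                inserter.createRelationship( from, to, RelTypes.CONNECTS, null );
            }
        }

        index.shutdown();
        inserter.shutdown();
    }

    // Hypothetical CSV readers, stubbed so the sketch stands alone.
    static List<Map<String, Object>> parseNodes() { return Collections.emptyList(); }
    static List<String[]> parseEdges() { return Collections.emptyList(); }
}
```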