Hi again, Núria (it was I, Mattias, who asked for the sample code).
Well... the fact that you parse 4 CSV files doesn't really help me
set up a test for this. I mean, how can I know that my test will be
similar to yours? Would it be OK to attach your code and CSV files as
well?

/ Mattias

2009/12/9 Núria Trench <nuriatre...@gmail.com>:
> Hi Todd,
>
> The sample code creates nodes and relationships by parsing 4 CSV files.
> Thank you for trying to trigger this behaviour with this sample.
>
> Núria
>
> 2009/12/9 Mattias Persson <matt...@neotechnology.com>
>
>> Could you provide me with some sample code which can trigger this
>> behaviour with the latest index-util-0.9-SNAPSHOT, Núria?
>>
>> 2009/12/9 Núria Trench <nuriatre...@gmail.com>:
>> > Todd,
>> >
>> > I don't have the same problem. In my case, after indexing all the
>> > attributes/properties of each node, the application creates all the edges
>> > by looking up the tail node and the head node. So, it calls the method
>> > "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which
>> > returns -1 (node not found) on many occasions.
>> >
>> > Does anyone have an alternative way to get a node by its indexed
>> > attributes/properties?
>> >
>> > Thank you,
>> >
>> > Núria.
>> >
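Regarding the question above about an alternative to finding nodes through the index: one option is to avoid index lookups entirely while creating edges, by remembering each node's id as it is created in the first pass and resolving edge endpoints from that in-memory map in the second pass. This is a minimal sketch under assumptions, not Neo4j API usage: the class TwoPassImport and the NodeCreator/EdgeCreator interfaces are hypothetical stand-ins for whatever actually creates nodes and relationships (for example a batch inserter), and the CSV identifier is assumed to be the value you would otherwise pass to getSingleNode.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a two-pass import that resolves edge endpoints from
// an in-memory map instead of querying the index with getSingleNode.
class TwoPassImport {

    // Hypothetical stand-in for node creation; returns the new node's id.
    interface NodeCreator {
        long createNode(String externalId);
    }

    // Hypothetical stand-in for creating a relationship between two node ids.
    interface EdgeCreator {
        void createEdge(long tailId, long headId);
    }

    // CSV identifier -> node id, filled during the node pass.
    private final Map<String, Long> idToNode = new HashMap<>();

    // Pass 1: create every node and remember its id under its CSV identifier.
    void importNodes(List<String> externalIds, NodeCreator creator) {
        for (String externalId : externalIds) {
            idToNode.put(externalId, creator.createNode(externalId));
        }
    }

    // Pass 2: each edge is {tailExternalId, headExternalId}. Edges whose
    // endpoints are unknown are returned instead of being silently dropped.
    List<String[]> importEdges(List<String[]> edges, EdgeCreator creator) {
        List<String[]> unresolved = new ArrayList<>();
        for (String[] edge : edges) {
            Long tail = idToNode.get(edge[0]);
            Long head = idToNode.get(edge[1]);
            if (tail == null || head == null) {
                unresolved.add(edge);
            } else {
                creator.createEdge(tail, head);
            }
        }
        return unresolved;
    }

    public static void main(String[] args) {
        long[] nextId = {0};
        List<String> created = new ArrayList<>();
        TwoPassImport importer = new TwoPassImport();
        importer.importNodes(List.of("a", "b", "c"), externalId -> nextId[0]++);
        List<String[]> missing = importer.importEdges(
                List.of(new String[] {"a", "b"}, new String[] {"b", "x"}),
                (tail, head) -> created.add(tail + "->" + head));
        System.out.println(created);        // [0->1]
        System.out.println(missing.size()); // 1
    }
}
```

At 20 million nodes, a HashMap<String, Long> like this would need a lot of heap (likely gigabytes), so a more compact structure may be needed at that scale; the sketch only shows the shape of the two-pass approach.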
>> >
>> > 2009/12/7 Mattias Persson <matt...@neotechnology.com>
>> >
>> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
>> >> is a bug that we fixed yesterday... (assuming it's the same bug).
>> >>
>> >> 2009/12/7 Todd Stavish <toddstav...@gmail.com>:
>> >> > Hi Mattias, Núria.
>> >> >
>> >> > I am also running into scalability problems with the Lucene batch
>> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
>> >> > calling optimize more. Increasing ulimit didn't help.
>> >> >
>> >> > [INFO] Exception in thread "main" java.lang.RuntimeException:
>> >> > java.io.FileNotFoundException:
>> >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> >> > (Too many open files)
>> >> > [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
>> >> > [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
>> >> > [INFO]  at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
>> >> > [INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
>> >> > [INFO] Caused by: java.io.FileNotFoundException:
>> >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> >> > (Too many open files)
>> >> >
>> >> > I tried breaking it up into separate batch inserter instances, and now it
>> >> > hangs. Can I create more than one batch inserter per process if they run
>> >> > sequentially and non-threaded?
>> >> >
>> >> > Thanks,
>> >> > Todd
>> >> >
>> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <nuriatre...@gmail.com> wrote:
>> >> >> Hi again Mattias,
>> >> >>
>> >> >> I have tried to execute my application with the latest version available
>> >> >> in the Maven repository and I still have the same problem. After creating
>> >> >> and indexing all the nodes, the application calls the "optimize" method
>> >> >> and then creates all the edges by calling the "getNodes" method to
>> >> >> select the tail and head node of each edge, but it doesn't work because
>> >> >> many nodes are not found.
>> >> >>
>> >> >> I have tried creating only 30 nodes and 15 edges and it works properly,
>> >> >> but if I try to create a big graph (180 million edges + 20 million nodes)
>> >> >> it doesn't.
>> >> >>
>> >> >> I have also tried calling the "optimize" method every time the
>> >> >> application has created 1 million nodes, but it doesn't work.
>> >> >>
>> >> >> Have you tried creating as many nodes as I described with the newer
>> >> >> index-util version?
>> >> >>
>> >> >> Thank you,
>> >> >>
>> >> >> Núria.
>> >> >>
>> >> >> 2009/12/4 Núria Trench <nuriatre...@gmail.com>
>> >> >>
>> >> >>> Hi Mattias,
>> >> >>>
>> >> >>> Thank you very much for fixing the problem so fast. I will try it as soon
>> >> >>> as the new changes are available in the Maven repository.
>> >> >>>
>> >> >>> Núria.
>> >> >>>
>> >> >>>
>> >> >>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>
>> >> >>>
>> >> >>>> I fixed the problem and also added a cache per key for faster
>> >> >>>> getNodes/getSingleNode lookup during the insert process. However, the
>> >> >>>> cache assumes that there's nothing in the index when the process
>> >> >>>> starts (which will almost always be true) to speed things up even
>> >> >>>> further.
>> >> >>>>
>> >> >>>> You can control whether the cache is used, and its size, by overriding
>> >> >>>> the following methods in your LuceneIndexBatchInserterImpl instance
>> >> >>>> (this is also documented in the Javadoc):
>> >> >>>>
>> >> >>>> boolean useCache()
>> >> >>>> int getMaxCacheSizePerKey()
>> >> >>>>
>> >> >>>> The new changes should be available in the Maven repository within an
>> >> >>>> hour.
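The per-key cache described above can be illustrated with a minimal, self-contained Java sketch. It is not the index-util code: the class PerKeyLookupCache and its put/lookup methods are hypothetical, and only the hook names useCache() and getMaxCacheSizePerKey() come from the message. It keeps one bounded LRU map per index key and, following the stated assumption that the index is empty when the process starts, treats a miss as "not found" (-1, like getSingleNode) until that key's cache has had to evict an entry; after that, a miss returns null and the caller would have to query Lucene.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a per-key lookup cache; NOT the index-util implementation.
class PerKeyLookupCache {

    // One access-ordered (LRU) map per index key, e.g. "name" -> (value -> node id).
    private final Map<String, LinkedHashMap<Object, Long>> caches = new HashMap<>();

    // Keys whose cache has evicted at least one entry; a miss there is no longer conclusive.
    private final Set<String> evictedKeys = new HashSet<>();

    // Mirrors the useCache() hook named in the thread; override to disable caching.
    protected boolean useCache() {
        return true;
    }

    // Mirrors getMaxCacheSizePerKey(); the default of 100,000 is an arbitrary assumption.
    protected int getMaxCacheSizePerKey() {
        return 100_000;
    }

    // Records that nodeId was indexed under key=value.
    public void put(String key, Object value, long nodeId) {
        if (!useCache()) {
            return;
        }
        caches.computeIfAbsent(key, this::newLruMap).put(value, nodeId);
    }

    private LinkedHashMap<Object, Long> newLruMap(String key) {
        final int max = getMaxCacheSizePerKey();
        return new LinkedHashMap<Object, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Object, Long> eldest) {
                if (size() > max) {
                    evictedKeys.add(key); // this key's cache is now incomplete
                    return true;
                }
                return false;
            }
        };
    }

    // Returns the cached node id, -1 if the value was never indexed since startup
    // (valid only under the "empty index at startup" assumption), or null if the
    // caller must fall back to querying the Lucene index.
    public Long lookup(String key, Object value) {
        if (!useCache()) {
            return null;
        }
        LinkedHashMap<Object, Long> cache = caches.get(key);
        if (cache != null) {
            Long id = cache.get(value);
            if (id != null) {
                return id;
            }
        }
        if (evictedKeys.contains(key)) {
            return null;
        }
        return -1L;
    }

    public static void main(String[] args) {
        PerKeyLookupCache cache = new PerKeyLookupCache() {
            @Override
            protected int getMaxCacheSizePerKey() {
                return 2; // tiny limit so eviction is easy to see
            }
        };
        cache.put("name", "a", 1L);
        cache.put("name", "b", 2L);
        System.out.println(cache.lookup("name", "a"));   // 1
        System.out.println(cache.lookup("name", "zzz")); // -1: never indexed
        cache.put("name", "c", 3L);                      // evicts "b", the least recently used
        System.out.println(cache.lookup("name", "zzz")); // null: fall back to the index
    }
}
```

Overriding getMaxCacheSizePerKey() in an anonymous subclass, as main does, mirrors the way the message suggests tuning the real inserter. The LRU bound is one possible eviction policy, chosen here for illustration; the thread doesn't say which policy index-util uses.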
>> >> >>>>
>> >> >>>> 2009/12/4 Mattias Persson <matt...@neotechnology.com>:
>> >> >>>> > I think I found the problem... it's indexing as it should, but it
>> >> >>>> > isn't reflected in getNodes/getSingleNode properly until you
>> >> >>>> > flush/optimize/shutdown the index. I'll try to fix it today!
>> >> >>>> >
>> >> >>>> > 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>> >> >>>> >> Thank you very much for your response.
>> >> >>>> >> If you need more information, just send an e-mail and I will try
>> >> >>>> >> to explain it better.
>> >> >>>> >>
>> >> >>>> >> Núria.
>> >> >>>> >>
>> >> >>>> >> 2009/12/3 Mattias Persson <matt...@neotechnology.com>
>> >> >>>> >>
>> >> >>>> >>> This is something I'd like to reproduce, and I'll do some testing on
>> >> >>>> >>> this tomorrow.
>> >> >>>> >>>
>> >> >>>> >>> 2009/12/3 Núria Trench <nuriatre...@gmail.com>:
>> >> >>>> >>> > Hello,
>> >> >>>> >>> >
>> >> >>>> >>> > Last week, I decided to download your graph database core in order to
>> >> >>>> >>> > use it. First, I created a new project to parse my CSV files and create
>> >> >>>> >>> > a new graph database with Neo4j. These CSV files contain 150 million
>> >> >>>> >>> > edges and 20 million nodes.
>> >> >>>> >>> >
>> >> >>>> >>> > When I finished writing the code that creates the graph database, I
>> >> >>>> >>> > executed it and, after six hours of execution, the program crashed
>> >> >>>> >>> > because of a Lucene exception. The exception is related to index
>> >> >>>> >>> > merging and has the following message:
>> >> >>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but fdx
>> >> >>>> >>> > file size is 3082259028; now aborting this merge to prevent index
>> >> >>>> >>> > corruption"
>> >> >>>> >>> >
>> >> >>>> >>> > I searched the net and found that it is a Lucene bug. The libraries
>> >> >>>> >>> > used for executing my project were:
>> >> >>>> >>> > neo-1.0-b10
>> >> >>>> >>> > index-util-0.7
>> >> >>>> >>> > lucene-core-2.4.0
>> >> >>>> >>> >
>> >> >>>> >>> > So, I decided to use a newer Lucene version. I found that you have a
>> >> >>>> >>> > newer index-util version, so I updated the libraries:
>> >> >>>> >>> > neo-1.0-b10
>> >> >>>> >>> > index-util-0.9
>> >> >>>> >>> > lucene-core-2.9.1
>> >> >>>> >>> >
>> >> >>>> >>> > When I had updated those libraries, I tried to execute my project again
>> >> >>>> >>> > and found that, on many occasions, it was not indexing properly. So, I
>> >> >>>> >>> > tried optimizing the index every time I indexed something. That worked,
>> >> >>>> >>> > in the sense that it then indexed properly, but the execution time
>> >> >>>> >>> > increased a lot.
>> >> >>>> >>> >
>> >> >>>> >>> > I am not using transactions; instead, I am using the Batch Inserter
>> >> >>>> >>> > with the LuceneIndexBatchInserter.
>> >> >>>> >>> >
>> >> >>>> >>> > So, my question is: what can I do to solve this problem? If I use
>> >> >>>> >>> > index-util-0.7, I cannot finish creating the graph database, and if I
>> >> >>>> >>> > use index-util-0.9, I have to optimize the index after every insertion
>> >> >>>> >>> > and the execution never ends.
>> >> >>>> >>> >
>> >> >>>> >>> > Thank you very much in advance,
>> >> >>>> >>> >
>> >> >>>> >>> > Núria.
>> >> >>>> >>> > _______________________________________________
>> >> >>>> >>> > Neo mailing list
>> >> >>>> >>> > User@lists.neo4j.org
>> >> >>>> >>> > https://lists.neo4j.org/mailman/listinfo/user
>> >> >>>> >>> >
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>>
>> >> >>>> >>> --
>> >> >>>> >>> Mattias Persson, [matt...@neotechnology.com]
>> >> >>>> >>> Neo Technology, www.neotechnology.com
>> >> >>>> >>>
>> >> >>>> >>
>> >> >>>> >
>> >> >>>> >
>> >> >>>> >
>> >> >>>> > --
>> >> >>>> > Mattias Persson, [matt...@neotechnology.com]
>> >> >>>> > Neo Technology, www.neotechnology.com
>> >> >>>> >
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> --
>> >> >>>> Mattias Persson, [matt...@neotechnology.com]
>> >> >>>> Neo Technology, www.neotechnology.com
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Mattias Persson, [matt...@neotechnology.com]
>> >> Neo Technology, www.neotechnology.com
>> >>
>> >
>>
>>
>>
>> --
>> Mattias Persson, [matt...@neotechnology.com]
>> Neo Technology, www.neotechnology.com
>>
>
>
>



-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com