Re: [Neo4j] Neo4J Batch Inserter is slow with big ids

2015-07-16 Thread Alberto Jesús Rubio Sánchez
Michael, sorry for answering so late. After following your instructions, the performance is now acceptable. Thank you very much for your help. Regards, Alberto. On Monday, March 30, 2015 at 8:13:36 AM UTC+2, Michael Hunger wrote: > > That's what I said. > > Use an effective cache (i.e. one of th

Re: [Neo4j] Neo4J Batch Inserter is slow with big ids

2015-03-29 Thread Michael Hunger
That's what I said. Use an effective cache (i.e. one of the primitive collection libraries with a map from long -> long); that is the most memory-efficient and performant way. Alternatively, what you do is a dual pass: create an array of the expected size, add the key entries to the array, sort the
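
A minimal sketch of the primitive long -> long cache suggested above, assuming GNU Trove's TLongLongHashMap as the collection library (the IdCache class and the getOrCreate helper are illustrative, not from the thread):

import gnu.trove.map.hash.TLongLongHashMap;

import org.neo4j.graphdb.Label;
import org.neo4j.unsafe.batchinsert.BatchInserter;

import java.util.Map;

public class IdCache {
    // -1 is reserved as the "no entry" marker, so it must never be a real node id
    private static final long MISSING = -1L;

    // primitive long -> long map: external/domain id -> Neo4j node id, no boxing
    private final TLongLongHashMap domainToNodeId =
            new TLongLongHashMap(10_000_000, 0.5f, MISSING, MISSING);

    // Returns the Neo4j node id for a domain id, creating the node only once.
    public long getOrCreate(BatchInserter inserter, long domainId,
                            Map<String, Object> props, Label label) {
        long nodeId = domainToNodeId.get(domainId);
        if (nodeId == MISSING) {
            nodeId = inserter.createNode(props, label); // let Neo4j assign the id
            domainToNodeId.put(domainId, nodeId);
        }
        return nodeId;
    }
}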

Re: [Neo4j] Neo4J Batch Inserter is slow with big ids

2015-03-29 Thread Alberto Jesús Rubio Sánchez
Hi Michael, I've been testing, and my problem is that the file is very large and memory fills up. For this reason I thought of using a cache to store the ids. If a node id isn't in the cache, the node is inserted even if it is already in the database. Finally, I look for the duplicate nodes that remai

Re: [Neo4j] Neo4J Batch Inserter is slow with big ids

2015-03-15 Thread Michael Hunger
I would recommend that you check out the Neo4j-Import tool of Neo4j 2.2. Alternatively, what I do is a dual pass: create an array of the expected size, add the key entries to the array, and sort the array. The keys are the entries of the array and the array index is the node id; you can scan the arr
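
A sketch of the dual-pass idea described here, assuming the external keys are numeric longs that fit into a single in-memory array and have already been deduplicated (class and method names are made up for illustration):

import java.util.Arrays;

public class DualPassIndex {
    private final long[] keys;

    // Pass 1: collect every external key, then sort once.
    // After sorting, the position of a key in the array serves as its node id.
    public DualPassIndex(long[] collectedKeys) {
        this.keys = collectedKeys;
        Arrays.sort(this.keys);
    }

    // Pass 2: look up the node id for a key, e.g. while creating relationships.
    public long nodeIdFor(long externalKey) {
        int idx = Arrays.binarySearch(keys, externalKey);
        if (idx < 0) {
            throw new IllegalArgumentException("unknown key: " + externalKey);
        }
        return idx;
    }
}

Sorting also makes duplicate keys adjacent, so dropping them before the second pass is cheap, which ties in with the duplicate-node problem from the original post.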

Re: [Neo4j] Neo4J Batch Inserter is slow with big ids

2015-03-15 Thread Alberto Jesús Rubio Sánchez
Hi Michael, Thanks for the reply :) I used the map to keep the identifiers, but the data files are very large and memory overflowed. I would have to clear the map every X insertions and then make a second pass to delete duplicates. Perhaps the best option is to use the next version. What do

Re: [Neo4j] Neo4J Batch Inserter is slow with big ids

2015-03-12 Thread Michael Hunger
You should let Neo4j assign the IDs and keep a mapping between your domain ids and the Neo4j internal IDs in an efficient dictionary/map. Also, you don't pass any memory configuration when creating the batch inserter, so it will run with the default memory config, see: http://neo4j.com/docs/st
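
Since the docs URL above is cut off, here is a hedged sketch of passing an explicit config map to the batch inserter; the dbms.pagecache.memory key is the 2.2-style setting and is an assumption here, older 2.x releases use the neostore.*.mapped_memory keys from the batch-insertion chapter instead:

import java.util.HashMap;
import java.util.Map;

import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class ConfiguredInserter {
    public static BatchInserter open(String dbPath) {
        Map<String, String> config = new HashMap<String, String>();
        // Size this to the store you expect to build; "4g" is only an example value.
        config.put("dbms.pagecache.memory", "4g");
        return BatchInserters.inserter(dbPath, config);
    }
}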

Re: [Neo4j] Neo4J Batch Inserter is slow with big ids

2015-03-12 Thread Alberto Jesús Rubio Sánchez
Hi Michael! This is an example of the code I use: import org.neo4j.unsafe.batchinsert.BatchInserter; import org.neo4j.unsafe.batchinsert.BatchInserters; BatchInserter batchInserter = BatchInserters.inserter(DB_PATH); batchInserter.createDeferredSchemaIndex(NODE_LABEL).on("id").create(); ba
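
For context, a self-contained version of that pattern might look like the sketch below; the Resource label, the LINKS relationship type, the property values and the store path are placeholders, not taken from Alberto's actual code:

import java.util.Collections;
import java.util.Map;

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class BatchInsertSketch {
    private static final Label NODE_LABEL = DynamicLabel.label("Resource");
    private static final RelationshipType LINKS = DynamicRelationshipType.withName("LINKS");

    public static void main(String[] args) {
        BatchInserter batchInserter = BatchInserters.inserter("target/batch-db");
        try {
            // The deferred index is only built when the store is first opened normally.
            batchInserter.createDeferredSchemaIndex(NODE_LABEL).on("id").create();

            Map<String, Object> p1 = Collections.<String, Object>singletonMap("id", "res-1");
            Map<String, Object> p2 = Collections.<String, Object>singletonMap("id", "res-2");
            long n1 = batchInserter.createNode(p1, NODE_LABEL); // Neo4j picks the node id
            long n2 = batchInserter.createNode(p2, NODE_LABEL);
            batchInserter.createRelationship(n1, n2, LINKS, null);
        } finally {
            batchInserter.shutdown(); // flushes the store files to disk
        }
    }
}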

Re: [Neo4j] Neo4J Batch Inserter is slow with big ids

2015-03-11 Thread Michael Hunger
Which batch inserter are you using? I recommend looking into the new neo4j-import tool, which has good support for external id-linking: neo4j.com/docs/milestone/import-tool.html In the next release it will also be able to handle duplicates well. Michael > On 11.03.2015 at 20:27, Alberto
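
As a rough idea of the CSV convention the import tool links external ids with (file names, property names and the label are invented for this example; check the linked import-tool page for the exact options of your version):

nodes.csv  -- the id:ID column is the external id the tool links on
  id:ID,name,:LABEL
  http://example.org/a,Resource A,Resource
  http://example.org/b,Resource B,Resource

rels.csv   -- :START_ID/:END_ID refer to the id:ID values above
  :START_ID,:END_ID,:TYPE
  http://example.org/a,http://example.org/b,LINKS

neo4j-import --into graph.db --nodes nodes.csv --relationships rels.csv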

[Neo4j] Neo4J Batch Inserter is slow with big ids

2015-03-11 Thread Alberto Jesús Rubio Sánchez
Hi, I'm working on an RDF file importer, but I have a problem: my data files have duplicate nodes. For this reason I use big ids to insert the nodes with the batch inserter, but the process is slow. I have seen this post
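
A hedged sketch of what "big ids" refers to here (the helper and property names are invented): the batch inserter also accepts an explicit node id, and passing the large external id directly is the pattern in question. Node records are addressed by id, so very large, sparse ids force correspondingly large store files, which is why the replies above suggest letting Neo4j assign the ids instead.

import java.util.Collections;
import java.util.Map;

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.Label;
import org.neo4j.unsafe.batchinsert.BatchInserter;

public class BigIdSketch {
    private static final Label RESOURCE = DynamicLabel.label("Resource");

    // The pattern described in this post: the external id doubles as the Neo4j node id.
    static void insertWithExternalId(BatchInserter inserter, long bigExternalId) {
        Map<String, Object> props =
                Collections.<String, Object>singletonMap("id", bigExternalId);
        inserter.createNode(bigExternalId, props, RESOURCE); // explicit node id
    }
}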