Re: [Neo4j] How to create a graph database out of a huge dataset?

2011-07-19 Thread Craig Taverner
I'm not sure it's such a good idea to call tx.success() on every iteration of the loop. I suggest calling it only at the commit and after the loop (i.e. move it two lines down). Also, I think a commit size of 50k is a little large. You're probably not going to see much improvement past 10k. In fact I
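The batching pattern Craig describes can be sketched as follows. This is only an illustration: `Tx` and `Db` are hypothetical stand-ins written for this example (their `success()`/`finish()` methods are modeled on the embedded Neo4j transaction API of that era), `FakeDb` is an in-memory counter rather than a real database, and the batch size is whatever the caller passes in.

```java
import java.util.List;

public class BatchCommitSketch {
    /** Hypothetical stand-in for a transaction (not the real Neo4j interface). */
    public interface Tx { void success(); void finish(); }

    /** Hypothetical stand-in for the database handle. */
    public interface Db { Tx beginTx(); void createNode(String row); }

    /**
     * Imports all rows, committing every batchSize rows.
     * success() is called once per batch and once after the loop,
     * never on every iteration. Returns the number of finished transactions.
     */
    public static int importRows(Db db, List<String> rows, int batchSize) {
        int finished = 0;
        int inBatch = 0;
        Tx tx = db.beginTx();
        try {
            for (String row : rows) {
                db.createNode(row);
                if (++inBatch == batchSize) {
                    tx.success();      // mark this batch as successful
                    tx.finish();       // commit it
                    finished++;
                    tx = db.beginTx(); // start the next batch
                    inBatch = 0;
                }
            }
            tx.success();              // mark the final, possibly partial, batch
        } finally {
            tx.finish();               // commits if marked, rolls back otherwise
        }
        return finished + 1;
    }

    /** In-memory fake used only to exercise the sketch. */
    public static class FakeDb implements Db {
        public int nodes = 0;
        public int successCalls = 0;
        public int finishCalls = 0;

        public Tx beginTx() {
            return new Tx() {
                public void success() { successCalls++; }
                public void finish() { finishCalls++; }
            };
        }

        public void createNode(String row) { nodes++; }
    }
}
```

With 25 rows and a batch size of 10, the sketch finishes three transactions: two full batches and one partial batch after the loop.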

Re: [Neo4j] How to create a graph database out of a huge dataset?

2011-07-19 Thread st3ven
OK, I changed that and will test whether it improves the runtime. By the way, I also changed my timestamp from a String to a long to reduce the size of my database. I hope to get some tips from you guys soon about faster parsing or optimizing my CSV file. Cheers, Stephan
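A String-to-long timestamp conversion of the kind Stephan mentions might look like the sketch below. It assumes the timestamps are ISO 8601 UTC strings such as `2001-01-21T02:12:21Z`; the thread does not show the actual format, and the class and method names are made up for illustration.

```java
import java.time.Instant;

public class TimestampSketch {
    /**
     * Converts an ISO 8601 UTC timestamp (assumed format) to epoch milliseconds,
     * so it can be stored as a single long property instead of a String.
     */
    public static long toEpochMillis(String isoTimestamp) {
        return Instant.parse(isoTimestamp).toEpochMilli();
    }
}
```

A long takes 8 bytes per value, while the String form needs at least one byte per character plus overhead, which is where the size reduction would come from.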

Re: [Neo4j] How to create a graph database out of a huge dataset?

2011-07-18 Thread st3ven
Hello Michael, I got the zip file from here: http://download.wikimedia.org/enwiki/20110526/enwiki-20110526-stub-meta-history.xml.gz . The unzipped file is an XML file, and I extracted the important information

[Neo4j] How to create a graph database out of a huge dataset?

2011-07-17 Thread st3ven
Hi all, I'm new to neo4j and graph databases. I have two questions for you about creating my graph database: 1. I want to create a graph database out of a huge CSV file. The problem is that I need to index the nodes I have already created, so that I don't create duplicate nodes. My CSV file looks
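The duplicate check described in this question can be illustrated with a plain in-memory map from author name to node id. This is not the Lucene index discussed in the replies; it is a simpler stand-in technique, and whether it fits in the heap for millions of authors is an open assumption. The class name and the counter that stands in for real node creation are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class AuthorNodeCache {
    private final Map<String, Long> nodeIdByAuthor = new HashMap<>();
    private long nextId = 0; // stand-in for the id a real database would hand out

    /** Returns the existing node id for the author, or registers a new one. */
    public long getOrCreate(String author) {
        Long existing = nodeIdByAuthor.get(author);
        if (existing != null) {
            return existing;   // author seen before: reuse the node
        }
        long id = nextId++;    // hypothetical: real code would create the node here
        nodeIdByAuthor.put(author, id);
        return id;
    }

    /** Number of distinct authors seen so far. */
    public int size() {
        return nodeIdByAuthor.size();
    }
}
```

Calling `getOrCreate` twice with the same author returns the same id, so each author maps to exactly one node.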

Re: [Neo4j] How to create a graph database out of a huge dataset?

2011-07-17 Thread Michael Hunger
Stephan, This is a common thing when inserting data. You should be able to use lucene in both settings (6M authors is not that much). Please have a look at your heap memory settings (and in transactional mode also your memory-map settings for neo4j). For batch inserter. You can query the

Re: [Neo4j] How to create a graph database out of a huge dataset?

2011-07-17 Thread st3ven
Hi, thanks for your fast answer. Right now I'm using Lucene for 6M authors, but my whole dataset consists of nearly 25M authors. Can I use Lucene there as well? I think checking whether a user already exists is getting really slow. How can I change my heap memory settings and my memory-map

Re: [Neo4j] How to create a graph database out of a huge dataset?

2011-07-17 Thread Michael Hunger
Stephan, can you perhaps share your CSV file, or at least give a few sample lines and a typical distribution (articles per author etc.)? You tested this with 20M articles and 6M authors? What is the current runtime of that import, and on what kind of hardware? (when working on a similar test I