Hello neo4j-community,
I am creating a graph database for a social network. To build it I am using the Batch Inserter, which reads the data from two files:

1. The first file contains the nodes I want to create (about 3.5M nodes). It looks like this:

Author 1
Author 2
Author 2
...

2. The second file contains every relationship between the nodes (about 2.5 billion relationships). It looks like this:

Author1; Author2; timestamp
Author2; Author3; timestamp
Author1; Author3; timestamp
...

The specifications of my computer are:

Intel Core i7, 3.4 GHz
16 GB RAM
GeForce GT 420, 1 GB
2 TB hard drive

My code to create the graph database looks like this:

package wikiOSN;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.index.BatchInserterIndex;
import org.neo4j.graphdb.index.BatchInserterIndexProvider;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class CreateAndConnectNodes {

    public static void main(String[] args) throws IOException {
        BufferedReader bf = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/autoren-der-wikiartikel"));
        BufferedReader bf2 = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/wikipedia-output"));
        CreateAndConnectNodes cacn = new CreateAndConnectNodes();
        cacn.createGraphDatabase(bf, bf2);
    }

    private long relationCounter = 0;

    private void createGraphDatabase(BufferedReader bf, BufferedReader bf2)
            throws IOException {
        BatchInserter inserter = new BatchInserterImpl(
                "target/socialNetwork-batchinsert");
        BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(
                inserter);
        BatchInserterIndex authors = indexProvider.nodeIndex("author",
                MapUtil.stringMap("type", "exact"));
        authors.setCacheCapacity("name", 100000);

        // Pass 1: create one node per author and index it by name.
        String zeile;
        while ((zeile = bf.readLine()) != null) {
            Map<String, Object> properties = MapUtil.map("name", zeile);
            long node = inserter.createNode(properties);
            authors.add(node, properties);
        }
        bf.close();
        System.out.println("Nodes created!");
        authors.flush();

        // Pass 2: look up both endpoints in the index and create a KNOWS
        // relationship carrying the timestamp as a property.
        String node = "";
        long node1 = 0;
        long node2 = 0;
        String zeile2;
        while ((zeile2 = bf2.readLine()) != null) {
            if (relationCounter++ % 100000000 == 0) {
                System.out.println("Edges already created: " + relationCounter);
            }
            String[] relation = zeile2.split("%;% ");

            // Only look up the first author again when it differs from the
            // previous line.
            if (node.isEmpty() || !node.equals(relation[0])) {
                node = relation[0];
                if (authors.get("name", relation[0]).getSingle() != null) {
                    node1 = authors.get("name", relation[0]).getSingle();
                } else {
                    System.out.println("Author 1: " + relation[0]);
                    break;
                }
            }

            if (authors.get("name", relation[1]).getSingle() != null) {
                node2 = authors.get("name", relation[1]).getSingle();
            } else {
                System.out.println("Author 2: " + relation[1]);
                break;
            }

            Map<String, Object> properties = MapUtil.map("timestamp",
                    Long.parseLong(relation[2].trim()));
            inserter.createRelationship(node1, node2,
                    DynamicRelationshipType.withName("KNOWS"), properties);
        }
        System.out.println("Edges created!!!");
        bf2.close();

        indexProvider.shutdown();
        inserter.shutdown();
    }
}

I want to know whether there is a better way to create such a big database, or am I doing it correctly? Can I maybe optimize the import for the traversals I want to do, or is this the standard way to import? The Java heap size for the insert was -Xmx8G.
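One alternative I was wondering about: instead of querying the Lucene index twice for every one of the 2.5 billion lines, I could keep a plain java.util.HashMap from author name to node id in memory during the first pass and resolve both endpoints from that map in the second pass. This is only a rough, untested sketch (the class name CreateWithNameMap is made up, and I have not checked whether a map with 3.5M String keys fits comfortably into the 8G heap):

package wikiOSN;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

// Rough sketch (untested): resolve node ids from an in-memory map instead of
// querying the Lucene index for every relationship line.
public class CreateWithNameMap {

    public static void main(String[] args) throws IOException {
        BatchInserter inserter = new BatchInserterImpl(
                "target/socialNetwork-batchinsert");
        Map<String, Long> idsByName = new HashMap<String, Long>();

        // Pass 1: create the nodes and remember their ids by author name.
        BufferedReader nodes = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/autoren-der-wikiartikel"));
        String name;
        while ((name = nodes.readLine()) != null) {
            long id = inserter.createNode(MapUtil.map("name", name));
            idsByName.put(name, id);
        }
        nodes.close();

        // Pass 2: resolve both endpoints from the map, no index lookups.
        BufferedReader rels = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/wikipedia-output"));
        String line;
        while ((line = rels.readLine()) != null) {
            String[] relation = line.split("%;% ");
            Long from = idsByName.get(relation[0]);
            Long to = idsByName.get(relation[1]);
            if (from == null || to == null) {
                System.out.println("Unknown author in line: " + line);
                continue;
            }
            inserter.createRelationship(from, to,
                    DynamicRelationshipType.withName("KNOWS"),
                    MapUtil.map("timestamp", Long.parseLong(relation[2].trim())));
        }
        rels.close();

        inserter.shutdown();
    }
}

The Lucene index could still be filled during the first pass (as in my current code) so that lookups by name stay possible later; the idea is just not to query it during the import. Would that be a sensible approach, or is there a better pattern for an import of this size?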
After I had created the graph database I wanted to get the node degree of every node. To do that I wrote the following code:

package wikiOSN;

import java.io.IOException;
import java.util.Date;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class NodeDegree {

    public static void main(String[] args) throws IOException {
        NodeDegree nd = new NodeDegree();
        nd.getNodeDegree();
        System.out.println("NodeDegree calculated!!!");
        Date date = new Date();
        date.setTime(System.currentTimeMillis());
        System.out.println(date);
    }

    private GraphDatabaseService db;
    private int counter;

    private void getNodeDegree() throws IOException {
        db = new EmbeddedGraphDatabase("target/socialNetwork-batchinsert");
        for (Node node : db.getAllNodes()) {
            counter = 0;
            if (node.getId() > 0) {
                // Count every relationship attached to this node.
                for (Relationship rel : node.getRelationships()) {
                    counter++;
                }
                System.out.println(node.getProperty("name").toString() + ": "
                        + counter);
            }
        }
        db.shutdown();
    }
}

The problem is that after 3 days I only had the node degree for 80,000 nodes, which is a huge amount of time for only 80,000 nodes. What am I doing wrong here? I also tried to tune my traversal, but it is still very slow. How can I optimize this so that I get the node degree for all 3.5M nodes within one day? Do I have to change something in the import, or is there a better way to get the node degree?

Thank you very much for your help!

Greetings,
Stephan
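P.S. One more idea I had for the node degree: since the degree of an author is just the number of lines in the relationship file in which that author appears, I could count the occurrences while streaming the file and never touch the database at all. Again only a rough, untested sketch (the class name NodeDegreeFromFile is made up, and it assumes the file and the database stay in sync and that a map with 3.5M counters fits into the heap):

package wikiOSN;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Rough sketch (untested): derive the degree of every author by counting how
// often the name occurs as an endpoint in the relationship file.
public class NodeDegreeFromFile {

    public static void main(String[] args) throws IOException {
        Map<String, Integer> degrees = new HashMap<String, Integer>();

        BufferedReader rels = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/wikipedia-output"));
        String line;
        while ((line = rels.readLine()) != null) {
            String[] relation = line.split("%;% ");
            // Each relationship contributes one to the degree of both endpoints.
            increment(degrees, relation[0]);
            increment(degrees, relation[1]);
        }
        rels.close();

        for (Map.Entry<String, Integer> entry : degrees.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    private static void increment(Map<String, Integer> degrees, String name) {
        Integer current = degrees.get(name);
        degrees.put(name, current == null ? 1 : current + 1);
    }
}

That would of course only help as long as the relationship file and the database really contain the same data; if the degree has to come out of the database itself, I would still like to know how to make the traversal faster.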