Hello neo4j-community,
I am creating a graph database for a social network. To build it I am using the Batch Inserter, which reads the data from two files:

1. The first file contains the nodes I want to create (about 3.5M nodes). It looks like this:

Author 1
Author 2
Author 2
...

2. The second file contains every relationship between the nodes (about 2.5 billion relationships). It looks like this:

Author1; Author2; timestamp
Author2; Author3; timestamp
Author1; Author3; timestamp
...

The specifications of my computer are:

Intel Core i7, 3.4 GHz
16 GB RAM
GeForce GT 420, 1 GB
2 TB hard drive

My code to create the graph database looks like this:

package wikiOSN;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.index.BatchInserterIndex;
import org.neo4j.graphdb.index.BatchInserterIndexProvider;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class CreateAndConnectNodes {

    public static void main(String[] args) throws IOException {
        BufferedReader bf = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/autoren-der-wikiartikel"));
        BufferedReader bf2 = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/wikipedia-output"));
        CreateAndConnectNodes cacn = new CreateAndConnectNodes();
        cacn.createGraphDatabase(bf, bf2);
    }

    private long relationCounter = 0;

    private void createGraphDatabase(BufferedReader bf, BufferedReader bf2)
            throws IOException {
        BatchInserter inserter = new BatchInserterImpl(
                "target/socialNetwork-batchinsert");
        BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(
                inserter);
        BatchInserterIndex authors = indexProvider.nodeIndex("author",
                MapUtil.stringMap("type", "exact"));
        authors.setCacheCapacity("name", 100000);

        // Pass 1: create one node per author and index it by name.
        String zeile;
        while ((zeile = bf.readLine()) != null) {
            Map<String, Object> properties = MapUtil.map("name", zeile);
            long node = inserter.createNode(properties);
            authors.add(node, properties);
        }
        bf.close();
        System.out.println("Nodes created!");
        authors.flush();

        // Pass 2: look up both endpoints in the index and create a KNOWS
        // relationship carrying the timestamp as a property.
        String node = "";
        long node1 = 0;
        long node2 = 0;
        String zeile2;
        while ((zeile2 = bf2.readLine()) != null) {
            if (relationCounter++ % 100000000 == 0) {
                System.out.println("Edges already created: " + relationCounter);
            }
            String[] relation = zeile2.split("%;% ");

            // Only look up the first author again when it differs from the
            // previous line.
            if (node.isEmpty() || !node.equals(relation[0])) {
                node = relation[0];
                if (authors.get("name", relation[0]).getSingle() != null) {
                    node1 = authors.get("name", relation[0]).getSingle();
                } else {
                    System.out.println("Author 1: " + relation[0]);
                    break;
                }
            }

            if (authors.get("name", relation[1]).getSingle() != null) {
                node2 = authors.get("name", relation[1]).getSingle();
            } else {
                System.out.println("Author 2: " + relation[1]);
                break;
            }

            Map<String, Object> properties = MapUtil.map("timestamp",
                    Long.parseLong(relation[2].trim()));
            inserter.createRelationship(node1, node2,
                    DynamicRelationshipType.withName("KNOWS"), properties);
        }
        System.out.println("Edges created!!!");
        bf2.close();

        indexProvider.shutdown();
        inserter.shutdown();
    }
}

I want to know whether there is a better way to create such a big database, or am I doing it correctly? Can I maybe optimize the import for the traversals I want to do, or is this the standard way to import? The Java heap size for the insert was -Xmx8G.
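One alternative I was wondering about: instead of querying the Lucene index twice for every one of the 2.5 billion lines, I could keep a plain java.util.HashMap from author name to node id in memory during the first pass and resolve both endpoints from that map in the second pass. This is only a rough, untested sketch (the class name CreateWithNameMap is made up, and I have not checked whether a map with 3.5M String keys fits comfortably into the 8G heap):

package wikiOSN;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

// Rough sketch (untested): resolve node ids from an in-memory map instead of
// querying the Lucene index for every relationship line.
public class CreateWithNameMap {

    public static void main(String[] args) throws IOException {
        BatchInserter inserter = new BatchInserterImpl(
                "target/socialNetwork-batchinsert");
        Map<String, Long> idsByName = new HashMap<String, Long>();

        // Pass 1: create the nodes and remember their ids by author name.
        BufferedReader nodes = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/autoren-der-wikiartikel"));
        String name;
        while ((name = nodes.readLine()) != null) {
            long id = inserter.createNode(MapUtil.map("name", name));
            idsByName.put(name, id);
        }
        nodes.close();

        // Pass 2: resolve both endpoints from the map, no index lookups.
        BufferedReader rels = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/wikipedia-output"));
        String line;
        while ((line = rels.readLine()) != null) {
            String[] relation = line.split("%;% ");
            Long from = idsByName.get(relation[0]);
            Long to = idsByName.get(relation[1]);
            if (from == null || to == null) {
                System.out.println("Unknown author in line: " + line);
                continue;
            }
            inserter.createRelationship(from, to,
                    DynamicRelationshipType.withName("KNOWS"),
                    MapUtil.map("timestamp", Long.parseLong(relation[2].trim())));
        }
        rels.close();

        inserter.shutdown();
    }
}

The Lucene index could still be filled during the first pass (as in my current code) so that lookups by name stay possible later; the idea is just not to query it during the import. Would that be a sensible approach, or is there a better pattern for an import of this size?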
After I had created the graph database I wanted to get the node degree of every node. To do that I wrote the following code:

package wikiOSN;

import java.io.IOException;
import java.util.Date;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class NodeDegree {

    public static void main(String[] args) throws IOException {
        NodeDegree nd = new NodeDegree();
        nd.getNodeDegree();
        System.out.println("NodeDegree calculated!!!");
        Date date = new Date();
        date.setTime(System.currentTimeMillis());
        System.out.println(date);
    }

    private GraphDatabaseService db;
    private int counter;

    private void getNodeDegree() throws IOException {
        db = new EmbeddedGraphDatabase("target/socialNetwork-batchinsert");
        for (Node node : db.getAllNodes()) {
            counter = 0;
            if (node.getId() > 0) {
                // Count every relationship attached to this node.
                for (Relationship rel : node.getRelationships()) {
                    counter++;
                }
                System.out.println(node.getProperty("name").toString() + ": "
                        + counter);
            }
        }
        db.shutdown();
    }
}

The problem is that after 3 days I only had the node degree for 80,000 nodes, which is a huge amount of time for only 80,000 nodes. What am I doing wrong here? I also tried to tune my traversal, but it is still very slow. How can I optimize this so that I get the node degree for all 3.5M nodes within one day? Do I have to change something in the import, or is there a better way to get the node degree?

Thank you very much for your help!

Greetings,
Stephan
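P.S. One more idea I had for the node degree: since the degree of an author is just the number of lines in the relationship file in which that author appears, I could count the occurrences while streaming the file and never touch the database at all. Again only a rough, untested sketch (the class name NodeDegreeFromFile is made up, and it assumes the file and the database stay in sync and that a map with 3.5M counters fits into the heap):

package wikiOSN;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Rough sketch (untested): derive the degree of every author by counting how
// often the name occurs as an endpoint in the relationship file.
public class NodeDegreeFromFile {

    public static void main(String[] args) throws IOException {
        Map<String, Integer> degrees = new HashMap<String, Integer>();

        BufferedReader rels = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/wikipedia-output"));
        String line;
        while ((line = rels.readLine()) != null) {
            String[] relation = line.split("%;% ");
            // Each relationship contributes one to the degree of both endpoints.
            increment(degrees, relation[0]);
            increment(degrees, relation[1]);
        }
        rels.close();

        for (Map.Entry<String, Integer> entry : degrees.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    private static void increment(Map<String, Integer> degrees, String name) {
        Integer current = degrees.get(name);
        degrees.put(name, current == null ? 1 : current + 1);
    }
}

That would of course only help as long as the relationship file and the database really contain the same data; if the degree has to come out of the database itself, I would still like to know how to make the traversal faster.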