[Neo4j] Neo4j index performance using Lucene

Erik Fäßler Thu, 28 Jul 2011 23:38:55 -0700

Hi all,

I've been doing preliminary evaluations of some Neo4j operations. One of them 
arises from a specific need in my application:
My method gets a List of node ids (stored in the nodes' properties) and 
needs to retrieve exactly these nodes from the GraphDB. This should happen as 
fast as possible, of course. I used an index for the ids. My code is as follows:


private static final int SAMPLE_SIZE = 100000;

...

GraphDatabaseService graphDb = new EmbeddedGraphDatabase("tmp/graphdb");

Transaction t = graphDb.beginTx();
IndexManager im = graphDb.index();
Index<Node> ni = im.forNodes("nodes");
( (LuceneIndex<Node>) ni ).setCacheCapacity( "nodes", 500000 );
for (int i = 0; i < SAMPLE_SIZE; ++i) {
    Node n = graphDb.createNode();
    n.setProperty("id", i);
    ni.add(n, "id", n.getProperty("id"));
}
t.success();
t.finish();

long time = System.currentTimeMillis();
for (int i = 0; i < SAMPLE_SIZE; ++i) {
    Node n = ni.get("id", i).getSingle();
}
System.out.println(System.currentTimeMillis() - time);


It works, but is rather slow. If I run the last loop a second time, the Lucene 
cache kicks in and cuts the required time in half. But it still takes a while 
(2000ms on my machine).
When I do the exact same thing with a HashMap, for example, the same loop (with 
a map lookup in place of the call Node n = ni.get("id", i).getSingle();) takes 
about 10ms.
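
For illustration, the HashMap comparison described above might look roughly like the following. This is only a minimal sketch: it is not the code that was actually measured, FakeNode is a hypothetical stand-in for a Neo4j Node so that the example runs without a database, and the timing it prints will differ from machine to machine.

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapLookupBaseline {
    static final int SAMPLE_SIZE = 100000;

    /** Hypothetical stand-in for a graph node; it only carries the "id" property. */
    static final class FakeNode {
        final int id;

        FakeNode(int id) {
            this.id = id;
        }
    }

    /** Builds the id -> node map, mirroring the loop that fills the index. */
    static Map<Integer, FakeNode> buildMap(int size) {
        Map<Integer, FakeNode> byId = new HashMap<>(size * 2);
        for (int i = 0; i < size; ++i) {
            byId.put(i, new FakeNode(i));
        }
        return byId;
    }

    /** Looks up every id once and returns how many nodes were found. */
    static int lookupAll(Map<Integer, FakeNode> byId, int size) {
        int found = 0;
        for (int i = 0; i < size; ++i) {
            FakeNode n = byId.get(i);
            if (n != null && n.id == i) {
                ++found;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<Integer, FakeNode> byId = buildMap(SAMPLE_SIZE);

        // Time only the lookup loop, as in the index measurement.
        long start = System.nanoTime();
        int found = lookupAll(byId, SAMPLE_SIZE);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(found + " lookups in " + elapsedMs + " ms");
    }
}
```

Note that a single timed pass like this includes JIT warm-up, so the number is only a rough indication.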

I know HashMaps have other drawbacks such as memory consumption. For my use 
case this wouldn't be a problem, however, as I would only have to cache about 
1M nodes, which is perfectly possible in a HashMap. My main question is: Have I 
done something wrong in my usage of the Lucene index? Can it be sped up 
somehow? Or will I always get better performance from a HashMap in such cases, 
where I have a large number of single queries?
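
One way to picture the HashMap option is as a read-through cache in front of the slower lookup. The sketch below assumes a hypothetical ReadThroughCache class (not part of Neo4j) that takes the slow lookup as a function. In the setting above, that function could be something like id -> ni.get("id", id).getSingle(), but here a plain lambda is used so the example runs on its own. It is not thread-safe and never evicts entries, which is only acceptable if all ids fit in memory, as assumed for about 1M nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> slowLookup;
    private int misses = 0;

    /** slowLookup is consulted only when a key is not cached yet. */
    public ReadThroughCache(Function<K, V> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Returns the cached value, falling back to the slow lookup on a miss. */
    public V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            ++misses;
            value = slowLookup.apply(key);
            if (value != null) {
                cache.put(key, value); // remember the result for later calls
            }
        }
        return value;
    }

    /** Number of times the slow lookup had to be used. */
    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        // Stand-in for an index query; purely illustrative.
        ReadThroughCache<Integer, String> nodes =
                new ReadThroughCache<>(id -> "node-" + id);

        nodes.get(42); // miss: goes to the slow lookup
        nodes.get(42); // hit: served from the HashMap
        System.out.println(nodes.misses()); // prints 1
    }
}
```

With this shape, only the first request for each id pays the index cost. Whether that helps depends on how often the same ids are requested again.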

Thank you and best regards,

        Erik


_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
