I continued thinking about test data. For a while now I've been thinking about 
better tools for navigating source code, especially in big projects. Even if 
they're designed well, they will have tons of classes (IntelliJ IDEA has 
30,000). The problem is that most developers are not that proficient with the 
navigation mechanisms of their IDEs.
They rarely approach a codebase top-down, from where the OO (or even 
composite) structures are clearly visible and the separation of concerns 
explainable. More often, coming from the bottom up or from the sidelines 
leaves them totally lost.

So I've been thinking about a tool that combines fast searching with Google 
Earth-like display/zooming techniques, plus full-text retrieval as well as 
structural search (think Freebase Parallax) over the whole AST of big (Java) 
projects.

So when I learned more about neo from Emil last weekend, I first looked to the 
filesystem for test data. There I have plenty of nodes (1 million files), even 
forming a graph (count symbolic links), with lots of semi-structured 
information. So the first approach was slurping the filesystem into neo. I had 
some problems with that, as building the graph was quite slow. I'll look into 
that later (perhaps parallelizing the traversal helps).
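Not from the original post; a minimal sketch of what parallelizing the traversal could look like with java.nio, with a per-depth tally standing in for the actual graph writes (the class name `FsScan` and the counting are my own illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

public class FsScan {

    // Walk the tree once and tally entries per depth; in a real importer
    // each path would become a node and each parent/child a relationship.
    static Map<Integer, Long> countByDepth(Path root) throws IOException {
        Map<Integer, Long> perDepth = new ConcurrentHashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            // parallel() spreads the visit over the common fork-join pool
            paths.parallel().forEach(p ->
                perDepth.merge(p.getNameCount() - root.getNameCount(), 1L, Long::sum));
        }
        return perDepth;
    }

    public static void main(String[] args) throws IOException {
        countByDepth(Path.of(args.length > 0 ? args[0] : "."))
            .forEach((depth, n) -> System.out.println(depth + ": " + n));
    }
}
```

The walk itself parallelizes easily; the harder part is that the graph writes on the other end usually want a single transaction per thread, so the writer side may still be the bottleneck.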

Two days ago I was struck by the idea of combining neo with the code 
navigation project I have in mind. Using asm's visitor for analysis (you don't 
need to load the classes into (perm-gen) memory as with reflection), it was 
easy to load the JDK's (classes.jar) 20,000 classes and their methods and 
fields into memory objects (about 20 MB of data).
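The visitor fires an event per class, method, and field; collecting those into plain in-memory value objects might look roughly like this (the type and field names are my own, not from the post):

```java
import java.util.ArrayList;
import java.util.List;

// Plain in-memory model filled by the bytecode visitor; nothing here
// touches neo yet, so 20,000 classes fit comfortably in the heap.
class ClassInfo {
    final String name;           // internal name, e.g. "java/lang/String"
    final String superName;      // direct superclass, null for Object
    final List<String> fields = new ArrayList<>();
    final List<MethodInfo> methods = new ArrayList<>();

    ClassInfo(String name, String superName) {
        this.name = name;
        this.superName = superName;
    }
}

class MethodInfo {
    final String name;
    final String descriptor;     // JVM descriptor, e.g. "(I)V"

    MethodInfo(String name, String descriptor) {
        this.name = name;
        this.descriptor = descriptor;
    }
}
```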

Then I added a second visitor which is able to create neo-node-based objects 
from that. At first it was awfully slow, taking 8 seconds for the first 500 
classes and rising to 300 seconds for the last 500 classes. Then I added more 
caching and removed some of the neo lookups (which are fast at 2 ms each, but 
way too slow for constructing the graph).
(I also committed after every 500 classes and gave the JVM 1 GB of memory; 
neither was needed with the in-memory approach.)
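This is not neo's actual API; a generic sketch of the caching-plus-batched-commit pattern described above, with node creation and the commit abstracted behind a creator function and a callback (both names are mine):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Every node is created exactly once and kept in a map keyed by its
// identifier, so building a relationship never needs a (comparatively
// slow) traversal or index lookup.
class NodeCache<N> {
    private final Map<String, N> byName = new HashMap<>();
    private final Function<String, N> creator;   // would wrap neo's node creation
    private final int batchSize;
    private final Runnable commit;               // would wrap the transaction commit
    private int pending;

    NodeCache(Function<String, N> creator, int batchSize, Runnable commit) {
        this.creator = creator;
        this.batchSize = batchSize;
        this.commit = commit;
    }

    N getOrCreate(String name) {
        N node = byName.get(name);
        if (node == null) {
            node = creator.apply(name);
            byName.put(name, node);
            if (++pending % batchSize == 0) commit.run();  // commit every batchSize creates
        }
        return node;
    }
}
```

The trade-off is heap: the cache holds a reference to every node created so far, which is fine for 20,000 classes but worth rethinking for a million files.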

In the end I dropped all the traversal/lookup logic for finding the nodes to 
attach to and just cached all of them. So that's the point I'd like to 
discuss:
What's the best way of building up a reasonably large database?

What I ended up with is caching everything by name/identifier in Java maps and 
not looking up nodes (by traversal) when building relationships. This can get 
a bit problematic with concurrent threads (say I want to use many threads for 
reading the file system or parsing Java class files to increase throughput).
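None of this is from the post either; a sketch of making such a name cache safe for multiple parser threads, with an AtomicLong standing in for actual node creation (which in neo would additionally have to happen inside a transaction):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Thread-safe variant of the name cache: computeIfAbsent guarantees at
// most one creation per key, even when several parser threads race on
// the same class name.
class ConcurrentNodeCache {
    private final Map<String, Long> idsByName = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();  // stand-in for real node creation

    long getOrCreate(String name) {
        // the mapping function runs at most once per key
        return idsByName.computeIfAbsent(name, n -> nextId.getAndIncrement());
    }

    int size() {
        return idsByName.size();
    }
}
```

The remaining question is whether the store itself tolerates concurrent writers; if not, the parsing threads can stay parallel while a single writer thread drains their output.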

Another thing I noted was that creating nodes and relationships in neo is not 
as fast as it should be.

And one more: I tried using a ramdisk (on a Mac) and it actually slowed things 
down (compared to my solid-state drive).

Michael
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
