I continued thinking about test data. I've been thinking for a while about better tools for navigating source code, especially in big projects. Even well-designed ones have tons of classes (IntelliJ IDEA has 30,000). The problem is that most developers are not that proficient with the navigation mechanisms of their IDEs. They rarely approach a codebase from the top-down perspective from which the OO (or even composite) structures are clearly visible and the separation of concerns explainable. More often, coming at it bottom-up or from the sidelines leaves them totally lost.
So I've been thinking about a tool that combines fast searching with Google-Earth-like display/zooming techniques, full-text retrieval, and structural search (think Freebase Parallax) over the whole AST of big (Java) projects. When I learned more about Neo from Emil last weekend, I first looked for test data on the filesystem. There I have plenty of nodes (1 million files) that even form a graph (counting symbolic links), with lots of semi-structured information. So the first approach was slurping the filesystem into Neo. I had some problems with that, as building the graph was quite slow; I'll look into that later (perhaps parallelizing the traversal helps).

Two days ago I was struck by the idea of combining Neo with the code navigation project I have in mind. Using ASM's visitor for analysis (you don't need to load the classes into (perm-gen) memory as with reflection), it was easy to load the JDK's (classes.jar) 20,000 classes and their methods and fields into memory objects (about 20 MB of data). Then I added a second visitor which creates Neo-node-based objects from that. At first it was awfully slow, taking 8 seconds for the first 500 classes and rising to 300 seconds for the last 500. Then I added more caching and removed some of the Neo lookups (which are fast at 2 ms each, but way too slow for constructing the graph). (I also committed after every 500 classes and gave the JVM 1 GB of memory; neither was needed with the in-memory approach.) In the end I dropped all the traversal/lookup code for finding the nodes to connect and just cached all of them.

So that's the point I'd like to discuss: what's the best way of building up a reasonably large database? What I ended up with is caching everything by name/identifier in Java maps and not looking up nodes (by traversal) when building relationships.
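To make the cache-by-name idea concrete, here is a minimal sketch. The `Node` and `ClassNodeCache` types are hypothetical stand-ins I made up for illustration; in the real code the node creation would be Neo API calls (creating a node and setting a name property inside a transaction) rather than plain Java objects:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the cache-by-name approach: nodes are found via an in-memory
// map keyed by class name instead of traversing the graph. All names here
// (Node, ClassNodeCache, relateTo) are illustrative, not Neo's API.
public class ClassNodeCache {
    // Stand-in for a graph node.
    static class Node {
        final String name;
        final Map<String, Node> relationships = new HashMap<>();
        Node(String name) { this.name = name; }
        void relateTo(Node target, String type) {
            relationships.put(type + ":" + target.name, target);
        }
    }

    private final Map<String, Node> cache = new HashMap<>();

    // Look the node up in the map; create (and cache) it only on a miss.
    // In the real code the miss branch would create a Neo node instead.
    Node nodeFor(String className) {
        Node n = cache.get(className);
        if (n == null) {
            n = new Node(className);
            cache.put(className, n);
        }
        return n;
    }

    void addExtends(String subclass, String superclass) {
        nodeFor(subclass).relateTo(nodeFor(superclass), "EXTENDS");
    }

    public static void main(String[] args) {
        ClassNodeCache c = new ClassNodeCache();
        c.addExtends("java/util/ArrayList", "java/util/AbstractList");
        c.addExtends("java/util/AbstractList", "java/util/AbstractCollection");
        // Three distinct class names, each node created exactly once.
        System.out.println(c.cache.size()); // prints 3
    }
}
```

The point of the sketch is that each class name costs one map lookup (microseconds) instead of a graph lookup at ~2 ms, which is what made the difference between 300 seconds and a usable build time per 500 classes.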
This can get a bit problematic with concurrent threads (say I want to use many threads for reading the filesystem or parsing Java class files to increase throughput). Another thing I noticed was that creating nodes and relationships in Neo is not as fast as it should be. And one more: I tried using a ramdisk (on a Mac), and it actually slowed things down compared to my solid-state drive.

Michael

_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
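For the multi-threaded case, one way to keep the name cache safe is a `ConcurrentHashMap` with `computeIfAbsent`, which guarantees the creation function runs at most once per key. This is a sketch in modern Java (the method did not exist at the time of this post), and it only makes the cache itself thread-safe; the actual node creation in Neo would still need to happen inside transactions:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: many parser threads share one name-keyed cache.
// computeIfAbsent dedupes node creation even under contention.
public class ConcurrentNodeCache {
    static final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    static Object nodeFor(String className) {
        // In real code the mapping function would create a graph node
        // inside a transaction; a plain Object stands in for the node here.
        return cache.computeIfAbsent(className, name -> new Object());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<String> names = List.of("java/lang/Object", "java/util/List", "java/util/Map");
        // Hammer the cache from many threads with repeated names.
        for (int i = 0; i < 1000; i++) {
            for (String n : names) {
                pool.submit(() -> nodeFor(n));
            }
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Despite 3000 submissions, each distinct name is cached exactly once.
        System.out.println(cache.size()); // prints 3
    }
}
```

A caveat on the design: deduplicating the cache is the easy part; if several threads write to the graph concurrently, transaction scope and commit batching (e.g. the every-500-classes commits mentioned above) become the real coordination problem.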