A typical traversal while teaching looks like this:
- grab 1 text node and all its token nodes (these can be cached in memory, but the initial DB lookup is still needed)
- do the calculations
- update the token nodes

A text node usually has 50-500 token nodes, so one teaching iteration needs to access that many nodes. I had planned to cache all required properties of each node in memory because REST communication is slow. However, if I go for the embedded version, I can use on-the-fly lookup, since Neo4j will do the caching for me.

Using the embedded version solves nearly all my troubles. I just need to find a good way to provide concurrent access to the graph.
I like the idea of stateful usage. Where can I find more information about it?
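The iteration described above (fetch a text node with its tokens, calculate, update the tokens) can be sketched roughly as follows. This is only a minimal in-memory model in Python, not Neo4j API: `InMemoryGraph`, `TextNode`, `TokenNode`, the `weight` property and the `rate` parameter are all hypothetical names chosen for illustration, and the single lock is just one simple way to make concurrent access safe.

```python
from dataclasses import dataclass, field
from threading import Lock
from typing import Dict, List


@dataclass
class TokenNode:
    phrase: str
    weight: float = 0.0  # hypothetical property updated during teaching


@dataclass
class TextNode:
    text_id: int
    tokens: List[TokenNode] = field(default_factory=list)


class InMemoryGraph:
    """Stand-in for the database; counts lookups to show what caching saves."""

    def __init__(self, texts: List[TextNode]) -> None:
        self._texts: Dict[int, TextNode] = {t.text_id: t for t in texts}
        self._cache: Dict[int, TextNode] = {}
        self._lock = Lock()  # shared by all teaching threads
        self.db_lookups = 0

    def text_with_tokens(self, text_id: int) -> TextNode:
        with self._lock:
            if text_id not in self._cache:
                self.db_lookups += 1  # the initial lookup is still needed
                self._cache[text_id] = self._texts[text_id]
            return self._cache[text_id]

    def teach(self, text_id: int, rate: float = 0.5) -> int:
        """Run one teaching iteration and return the number of tokens touched."""
        text = self.text_with_tokens(text_id)  # 1. grab text node and tokens
        with self._lock:
            for token in text.tokens:
                # 2. calculate (here: a toy score based on phrase length)
                # 3. update the token node
                token.weight += rate * len(token.phrase.split())
        return len(text.tokens)
```

Calling `teach(1)` twice on the same graph performs only one lookup, because the second call is served from the cache. With an embedded database the cache would presumably be the database's own, and the lock would be replaced by its transaction handling.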
Thanks,
Miklós

On 2011.02.28 at 11:47, Michael Hunger wrote:
> Miklós,
>
> perhaps there is some categorization you can do to the text nodes and
> introduce intermediary nodes from the root node. (Or at least shard them by
> some number.)
>
> Then you could also do traversals from those intermediary nodes in parallel,
> if that helps you to speed things up.
>
> What are you querying for in your traversals? You can put information on the
> relationships and evaluate that (and also all other nodes and relationships
> so far on the current path) using the evaluators (new framework) or
> pruning/filtering (stable framework).
>
> Can you explain a typical traversal so that it becomes clearer what kind of
> queries you run against the graph?
>
> Yes, I'd really recommend developing against embedded and then exposing this
> custom, higher-level API to the clients (as you obviously have multiple
> clients accessing the data).
> Your custom REST API can also do paging or streaming (or if you go for a
> stateful (e.g. session-based) one, then you can also keep the traversal state
> in the session and continuously pull from that).
>
> HTH
>
> Michael
>
> On 28.02.2011 at 11:36, Kiss Miklós wrote:
>
>> Thanks for the ideas!
>>
>> OK, I'll tell a bit more about my current scenario. I use the graph to
>> store medium-sized texts (various sizes, 1 - 30kb) with some properties,
>> their tokens (1-2-3 word phrases) with many properties, and some other
>> stuff too which is not relevant now.
>> Since I use the graph for teaching AI algorithms, I need to query (and
>> modify) large parts of the graph from time to time and I cannot do any
>> pruning here (at least I didn't find a way to do that). The
>> relationships I use are mainly of the same type and I cannot really
>> distinguish them in the domain. The only thing I can think of is to
>> introduce different types of relationships based on word count for
>> tokens (e.g.: TOKEN_1, TOKEN_2, TOKEN_3 for 1-2-3 length phrases).
>> Otherwise they are considered the same. And all the text nodes are
>> connected to the reference node (I have *many* text nodes). The most
>> problematic part is to query (a part of) the text nodes.
>> Currently I access the DB through the REST API, mainly because I need to
>> have concurrent access and it seemed a good solution at first when I
>> started the project. I use traversing to query the nodes/relationships I
>> need and I also use the indexing service for various tasks (duplicate
>> check, node lookup). Since the relationships I use have the same type, I
>> cannot write efficient prune evaluators. It would be good if I could
>> query the results in several groups, each returning only ~100 results, but I
>> have no way to tell the traverser where to start and where to stop (the
>> graph itself doesn't have such information).
>>
>> If I understand your advice correctly, I should implement complex server
>> plugins and call them using REST? So I would actually put parts of the program
>> logic into the DB server? This will have better performance, but won't it
>> make development harder (program logic is distributed on different levels)?
>>
>> Yes, unfortunately I need concurrent write access. However, sharing the
>> GraphDatabaseService object might be usable for me. I'll investigate
>> this approach too.
>>
>> Thanks again,
>> Miklós Kiss
>>
>> On 2011.02.28 at 10:38, Michael Hunger wrote:
>>> Miklós,
>>>
>>> you should actually do both :)
>>>
>>> So go for the embedded version to create domain-specific service calls for
>>> interacting with your database.
>>>
>>> And then expose those as your own REST endpoints using either a
>>> Server-Plugin or an unmanaged extension.
>>>
>>> The current REST server API is too noisy to transfer that many
>>> relationships per node.
>>>
>>> How do you currently use the REST server API? Which calls?
>>> If you don't already do traversals then you should definitely look into
>>> them.
>>> (Rather than traversing node relationships one by one.)
>>> With the traversal you can already specify JavaScript code that is executed
>>> in the server context for pruning and filtering.
>>>
>>> Do you need concurrent write access? Otherwise the other stores can work in
>>> read-only mode against the database.
>>> And concurrent access from inside the same VM is no problem; you can easily
>>> share the GraphDatabaseService object.
>>>
>>> Perhaps you can also tell us a bit about your domain and how it is modelled
>>> to support you there.
>>>
>>> Cheers
>>>
>>> Michael
>>>
>>> On 28.02.2011 at 10:19, Kiss Miklós wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm wondering if I'm using the Neo4j graph database right. My current
>>>> graph structure contains many relations for every single node. Some of
>>>> the nodes have >10000 relations, which is hard to traverse using the REST
>>>> server (collecting nodes is heavy on memory and transmitting is heavy on
>>>> bandwidth).
>>>>
>>>> My first question is: is this structure a usable one or should I
>>>> restructure my graph so that the number of direct relationships becomes
>>>> much lower (I don't really know how I could do that, and it would obfuscate
>>>> the domain model)?
>>>>
>>>> My second question is: if I leave the structure that way, I could solve
>>>> the performance issues if I used the embedded version of Neo4j. However, I
>>>> need to have concurrent access to the graph. Is this possible?
>>>>
>>>> Thanks,
>>>> Miklós Kiss
>>>> _______________________________________________
>>>> Neo4j mailing list
>>>> User@lists.neo4j.org
>>>> https://lists.neo4j.org/mailman/listinfo/user