Re: [Neo4j] Concurrent graph usage, design guidelines

Kiss Miklós Mon, 28 Feb 2011 03:28:25 -0800

A typical traversal during teaching looks like this:
- grab 1 text node and all its token nodes (these can be cached in memory, 
but still need an initial DB lookup)
- do calculations
- update token nodes.
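
A rough sketch of the three steps above, written against a hypothetical in-memory stand-in rather than Neo4j itself. The class TeachingIteration, its methods and the single "weight" property per token are assumptions for illustration, and the update rule is a toy one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the graph (not a Neo4j API):
// text id -> token ids, and token id -> one numeric "weight" property.
final class TeachingIteration {
    private final Map<Long, List<Long>> tokensOfText = new HashMap<>();
    private final Map<Long, Double> tokenWeight = new HashMap<>();

    void addText(long textId, List<Long> tokenIds) {
        tokensOfText.put(textId, tokenIds);
        for (long tokenId : tokenIds) {
            tokenWeight.putIfAbsent(tokenId, 0.0);
        }
    }

    double weightOf(long tokenId) {
        return tokenWeight.getOrDefault(tokenId, 0.0);
    }

    // One teaching step for a single text node.
    void teach(long textId, double learningRate, double target) {
        // 1. grab the text node's token nodes
        List<Long> tokens = tokensOfText.getOrDefault(textId, List.of());
        // 2. do the calculations (toy rule: move each weight towards 'target')
        Map<Long, Double> updates = new HashMap<>();
        for (long tokenId : tokens) {
            double w = tokenWeight.get(tokenId);
            updates.put(tokenId, w + learningRate * (target - w));
        }
        // 3. update the token nodes (against a real graph: one write transaction)
        tokenWeight.putAll(updates);
    }
}
```

Against the embedded database, step 3 would normally be wrapped in one write transaction per iteration, so the 50-500 token updates of a text are committed together.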
There are usually 50-500 token nodes per text node, so one teaching 
iteration has to access that many nodes. I planned to cache all 
required properties of each node in memory because REST communication is 
slow. However, if I switch to the embedded version, I can look them up on 
the fly, since Neo4j does the caching for me.
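
The in-memory caching considered for the REST setup can be sketched as a read-through cache keyed by node id. NodePropertyCache and its loader function are hypothetical; the loader stands in for a remote property fetch. With the embedded database this layer is probably unnecessary, since Neo4j keeps its own caches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongFunction;

// Hypothetical read-through cache keyed by node id. 'loader' stands in for
// whatever fetches the properties remotely (e.g. a REST call).
final class NodePropertyCache<V> {
    private final Map<Long, V> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();
    private final LongFunction<V> loader;

    NodePropertyCache(LongFunction<V> loader) {
        this.loader = loader;
    }

    // The first lookup per node goes to the loader; later ones come from memory.
    V get(long nodeId) {
        return cache.computeIfAbsent(nodeId, id -> {
            loads.incrementAndGet();
            return loader.apply(id);
        });
    }

    // Must be called after a token node is updated, or the cache goes stale.
    void invalidate(long nodeId) {
        cache.remove(nodeId);
    }

    int loadCount() {
        return loads.get();
    }
}
```

The invalidate call matters here: the teaching step updates token nodes, so a cached copy would otherwise drift from the database.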
Using the embedded version solves nearly all my problems. I just need to 
find a good way to provide concurrent access to the graph. I like the 
idea of stateful (session-based) usage; where can I find more information about it?


Thanks
Miklós

On 2011.02.28. 11:47, Michael Hunger wrote:
> Miklós,
>
> perhaps there is some categorization you can apply to the text nodes, so that 
> you can introduce intermediary nodes below the root node (or at least shard 
> them by some number).
>
> Then you could also run traversals from those intermediary nodes in parallel, 
> if that helps you speed things up.
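
A minimal sketch of the sharding idea, assuming text ids are numeric and the bucket is simply id mod N. ShardedTraversal and the "work" callback are hypothetical; in the graph, each bucket would be an intermediary node between the root node and its text nodes, and "work" would be the traversal started from that node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical sketch: group text nodes under 'buckets' intermediary nodes
// (bucket = textId mod buckets) and process each bucket on its own thread.
final class ShardedTraversal {
    static int bucketOf(long textId, int buckets) {
        return (int) Math.floorMod(textId, (long) buckets);
    }

    // Which text ids would hang off which intermediary node.
    static List<List<Long>> shard(List<Long> textIds, int buckets) {
        List<List<Long>> result = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            result.add(new ArrayList<>());
        }
        for (long id : textIds) {
            result.get(bucketOf(id, buckets)).add(id);
        }
        return result;
    }

    // Applies 'work' to every text id, one task per bucket, and sums the results.
    static long processInParallel(List<Long> textIds, int buckets, ToLongFunction<Long> work) {
        ExecutorService pool = Executors.newFixedThreadPool(buckets);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<Long> bucket : shard(textIds, buckets)) {
                futures.add(pool.submit(() -> {
                    long sum = 0;
                    for (Long id : bucket) {
                        sum += work.applyAsLong(id);
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

One thread per bucket keeps the example short; with many buckets, a smaller fixed pool would be the more usual choice.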
>
> What are you querying for in your traversals? You can put information on the 
> relationships and evaluate it (as well as all other nodes and relationships 
> on the current path so far) using evaluators (new framework) or 
> pruning/filtering (stable framework).
>
> Can you describe a typical traversal, so that it becomes clearer what kind of 
> queries you run against the graph?
>
> Yes, I'd really recommend developing against the embedded version and then 
> exposing this custom, higher-level API to the clients (as you obviously have 
> multiple clients accessing the data).
> Your custom REST API can also do paging or streaming (or, if you go for a 
> stateful, e.g. session-based, one, you can also keep the traversal state 
> in the session and continuously pull from it).
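
The stateful variant could look roughly like this on the server side: the traversal result is kept as an iterator per session, and each request pulls the next page. TraversalSessions, open and nextPage are hypothetical names, and the iterator is assumed to be lazy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side session store: one traversal iterator per session.
final class TraversalSessions<T> {
    private final Map<String, Iterator<T>> sessions = new ConcurrentHashMap<>();

    // Starts a session for an (assumed lazy) traversal result; returns its id.
    String open(Iterator<T> traversal) {
        String id = UUID.randomUUID().toString();
        sessions.put(id, traversal);
        return id;
    }

    // Returns up to pageSize further results; an empty page means "done".
    List<T> nextPage(String sessionId, int pageSize) {
        List<T> page = new ArrayList<>();
        Iterator<T> it = sessions.get(sessionId);
        if (it == null) {
            return page; // unknown or already exhausted session
        }
        synchronized (it) { // one request at a time per session
            while (page.size() < pageSize && it.hasNext()) {
                page.add(it.next());
            }
            if (!it.hasNext()) {
                sessions.remove(sessionId); // release the state once exhausted
            }
        }
        return page;
    }
}
```

A real implementation would also need to expire abandoned sessions, since each one keeps its traversal state alive on the server.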
>
> HTH
>
> Michael
>
> On 28.02.2011 at 11:36, Kiss Miklós wrote:
>
>> Thanks for the ideas!
>>
>> OK, let me tell you a bit more about my current scenario. I use the graph to
>> store medium-sized texts (of various sizes, 1-30 kB) with some properties,
>> their tokens (1-2-3 word phrases) with many properties, and some other
>> stuff which is not relevant now.
>> Since I use the graph for teaching AI algorithms, I need to query (and
>> modify) large parts of the graph from time to time, and I cannot do any
>> pruning here (at least I didn't find a way to do that). The
>> relationships I use are mainly of the same type and I cannot really
>> distinguish them in the domain. The only thing I can think of is to
>> introduce different relationship types based on the word count of the
>> tokens (e.g.: TOKEN_1, TOKEN_2, TOKEN_3 for phrases of length 1, 2 and 3).
>> Otherwise they are considered the same. All the text nodes are
>> connected to the reference node (I have *many* text nodes). The most
>> problematic part is querying (a part of) the text nodes.
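
The TOKEN_1/TOKEN_2/TOKEN_3 idea can be sketched as deriving the relationship type from the phrase's word count, so a traversal can follow only the types it needs instead of inspecting every token. TokenRel and TokenFilter are hypothetical; in embedded Neo4j such an enum would typically implement RelationshipType, and a node would be asked only for relationships of that type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical: the phrase length is encoded in the relationship type.
enum TokenRel {
    TOKEN_1, TOKEN_2, TOKEN_3;

    static TokenRel forPhrase(String phrase) {
        String trimmed = phrase.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty phrase");
        }
        int words = trimmed.split("\\s+").length;
        if (words > 3) {
            throw new IllegalArgumentException("expected a 1-3 word phrase: " + phrase);
        }
        return values()[words - 1];
    }
}

final class TokenFilter {
    // Keeps only the tokens reached through the requested relationship type,
    // as a type-restricted expansion would in the graph.
    static List<String> tokensOfType(Map<String, TokenRel> tokenEdges, TokenRel wanted) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, TokenRel> e : tokenEdges.entrySet()) {
            if (e.getValue() == wanted) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```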
>> Currently I access the DB through the REST API, mainly because I need
>> concurrent access and it seemed a good solution when I started the
>> project. I use traversals to query the nodes/relationships I need, and I
>> also use the indexing service for various tasks (duplicate check, node
>> lookup). Since the relationships I use have the same type, I cannot
>> write efficient prune evaluators. It would be good if I could fetch the
>> results in several groups, each returning only ~100 results, but I have
>> no way to tell the traverser where to start and where to stop (the
>> graph itself doesn't have such information).
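
One way to get ~100 results at a time is cursor-style paging: each request passes the last id it received, and the next batch starts after it. This assumes the text ids are available in sorted order (for example from an index), which the graph structure described above does not provide by itself; KeysetPager and nextBatch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical cursor-style paging over sorted text ids.
final class KeysetPager {
    private final NavigableSet<Long> textIds = new TreeSet<>();

    void add(long textId) {
        textIds.add(textId);
    }

    // Returns up to 'limit' ids strictly greater than 'afterId'
    // (pass Long.MIN_VALUE for the first batch).
    List<Long> nextBatch(long afterId, int limit) {
        List<Long> batch = new ArrayList<>(limit);
        for (long id : textIds.tailSet(afterId, false)) {
            if (batch.size() == limit) {
                break;
            }
            batch.add(id);
        }
        return batch;
    }
}
```

Here the server keeps no state between requests; the last id the client saw is the only bookmark.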
>>
>> If I understand your advice correctly, I should implement complex server
>> plugins and call them via REST? So I would actually move parts of the
>> program logic into the DB server? This will perform better, but won't it
>> make development harder (program logic is distributed across different levels)?
>>
>> Yes, unfortunately I need concurrent write access. However, sharing the
>> GraphDatabaseService object might work for me. I'll investigate
>> this approach too.
>>
>> Thanks again,
>> Miklós Kiss
>>
>> On 2011.02.28. 10:38, Michael Hunger wrote:
>>> Miklós,
>>>
>>> you should actually do both :)
>>>
>>> So go for the embedded version to create domain-specific service calls for 
>>> interacting with your database.
>>>
>>> And then expose those as your own REST endpoints, using either a 
>>> server plugin or an unmanaged extension.
>>>
>>> The current REST server API is too noisy to transfer that many 
>>> relationships per node.
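
To illustrate why a domain-specific endpoint helps, here is a toy count of round trips. RoundTrips.Remote is a hypothetical stand-in in which every method call models one HTTP request; it does not reproduce the actual REST API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical comparison of a chatty client with one domain-level call.
final class RoundTrips {
    // Stand-in for remote data; each method call models one round trip.
    static final class Remote {
        final Map<Long, List<Long>> tokensOfText;
        final Map<Long, String> tokenText;
        int calls = 0;

        Remote(Map<Long, List<Long>> tokensOfText, Map<Long, String> tokenText) {
            this.tokensOfText = tokensOfText;
            this.tokenText = tokenText;
        }

        List<Long> tokenIds(long textId) { calls++; return tokensOfText.get(textId); }

        String token(long tokenId) { calls++; return tokenText.get(tokenId); }

        // Domain-specific call: everything needed for one text in one round trip.
        List<String> tokensOf(long textId) {
            calls++;
            List<String> out = new ArrayList<>();
            for (long id : tokensOfText.get(textId)) {
                out.add(tokenText.get(id));
            }
            return out;
        }
    }

    // Fine-grained access: one call for the ids, then one call per token.
    static List<String> chatty(Remote r, long textId) {
        List<String> out = new ArrayList<>();
        for (long id : r.tokenIds(textId)) {
            out.add(r.token(id));
        }
        return out;
    }
}
```

With 50-500 token nodes per text node, the chatty pattern costs 51-501 requests per text, against one for the domain-level call.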
>>>
>>> How do you currently use the REST server API? Which calls?
>>> If you don't use traversals yet, you should definitely look into 
>>> them (rather than following node relationships one by one).
>>> With a traversal you can already specify JavaScript code that is executed 
>>> in the server context for pruning and filtering.
>>>
>>> Do you need concurrent write access? If not, the other stores can work in 
>>> read-only mode against the database.
>>> Concurrent access from inside the same VM is no problem either; you can 
>>> easily share the GraphDatabaseService object.
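
Sharing one instance between threads can be sketched without Neo4j itself: SharedStore is a hypothetical stand-in for the single shared GraphDatabaseService, and a ConcurrentHashMap merge replaces the transactional write. The point is only that all workers use the same object rather than opening their own.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for one shared database handle used by all threads.
final class SharedStore {
    private final ConcurrentHashMap<String, Long> tokenCounts = new ConcurrentHashMap<>();

    // Atomic read-modify-write on the shared instance.
    void increment(String token) {
        tokenCounts.merge(token, 1L, Long::sum);
    }

    long count(String token) {
        return tokenCounts.getOrDefault(token, 0L);
    }

    static SharedStore runWorkers(int threads, int incrementsPerThread) {
        SharedStore shared = new SharedStore(); // one instance for all workers
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    shared.increment("graph");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;
    }
}
```

In embedded Neo4j each write would instead run inside a transaction, and concurrent writers to the same node would be serialized by the database's locking rather than by the map.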
>>>
>>> Perhaps you can also tell us a bit about your domain and how it is modelled, 
>>> so that we can support you there.
>>>
>>> Cheers
>>>
>>> Michael
>>>
>>> On 28.02.2011 at 10:19, Kiss Miklós wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm wondering if I'm using the Neo4j graph database correctly. My current
>>>> graph structure contains many relationships for every single node. Some of
>>>> the nodes have >10000 relationships, which are hard to traverse using the
>>>> REST server (collecting the nodes is heavy on memory and transmitting them
>>>> is heavy on bandwidth).
>>>>
>>>> My first question is: is this structure a usable one, or should I
>>>> restructure my graph so that the number of direct relationships becomes
>>>> much lower (I don't really know how I could do that, and it would obfuscate
>>>> the domain model)?
>>>>
>>>> My second question is: if I leave the structure as it is, I could solve
>>>> the performance issues by using the embedded version of Neo4j. However, I
>>>> need concurrent access to the graph. Is this possible?
>>>>
>>>> Thanks,
>>>> Miklós Kiss
>>>> _______________________________________________
>>>> Neo4j mailing list
>>>> User@lists.neo4j.org
>>>> https://lists.neo4j.org/mailman/listinfo/user