Hi Ville,

We ran into a similar problem: we wanted to search only part of the graph 
using Lucene. We use a traversal to determine the nodes to search from, and 
then use Lucene to search the nodes connected to the nodes in the traversal 
result.

We solved it as follows:
- define a TransactionEventHandler that auto-updates the indexes with node 
properties and also adds relationships to the same index. We use 
relationship.name() as the Lucene field name, with the id of the 'other node' 
as the value.
- traverse to get the set of nodes to search from. We apply the ACL here so 
that only nodes the user is allowed to see are returned.
- create a BooleanQuery for Lucene with the relationship.name() field names and 
ids. So if the relationship is 'IS_FRIEND_OF' and we want to do a full-text 
search for 'trinity' on friends of the people with ids 1, 2 and 3, we create a 
query that contains: +(name:trinity) +(isfriendof:1 isfriendof:2 isfriendof:3)
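
The steps above can be sketched roughly as follows. This is a minimal 
illustration in plain Python, not our actual code and not the Neo4j or Lucene 
API: the helper names (field_name, index_fields, scope_clause) are made up, and 
the sketch only shows how a node is flattened into field/value pairs and how 
the matching query string is put together.

```python
def field_name(name):
    """Lowercase a name and drop underscores: IS_FRIEND_OF -> isfriendof."""
    return name.replace("_", "").lower()


def index_fields(properties, relationships):
    """Flatten one node into (field, value) pairs for a single index entry.

    properties    -- dict of node properties
    relationships -- list of (relationship type, other node id) tuples
    """
    fields = [(field_name(key), str(value)) for key, value in properties.items()]
    fields += [(field_name(rel), str(other_id)) for rel, other_id in relationships]
    return fields


def scope_clause(rel_type, node_ids):
    """Required clause matching nodes related to at least one of node_ids."""
    field = field_name(rel_type)
    return "+(" + " ".join(f"{field}:{node_id}" for node_id in node_ids) + ")"


# A node named Trinity that is a friend of nodes 1 and 7 (hypothetical data).
fields = index_fields({"name": "Trinity"}, [("IS_FRIEND_OF", 1), ("IS_FRIEND_OF", 7)])

# Full-text search for 'trinity' among friends of the people with ids 1, 2 and 3.
query = "+(name:trinity) " + scope_clause("IS_FRIEND_OF", [1, 2, 3])
print(fields)
print(query)
```

Because every relationship ends up as an ordinary field on the node's index 
entry, the traversal result only has to be turned into a list of ids; Lucene 
does the rest.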

To make sure we only get back 'person' nodes we also indexed the node type (in 
our case 'emtype'), so the complete query is:
+emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3)

This way you can easily traverse to define the 'edges' of where to search and 
let Lucene handle the search within that region.

Optionally we add the ACL to the Lucene query as well, using the same 
technique: we add the ids of all groups the current user is a member of, so 
that only nodes with a 'CAN_ACCESS' relationship to one of those groups match:
+emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3) 
+(canaccess:233 canaccess:254 canaccess:324)
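
Putting the clauses together could look like the sketch below. Again this is 
plain Python string building under assumptions: build_query and any_of are 
hypothetical helper names, the field names are the ones from the examples 
above, and the search term is not escaped, so it only works for simple terms 
without Lucene special characters.

```python
def any_of(field, values):
    """Required clause: the field must match at least one of the values."""
    values = list(values)
    if not values:
        # An empty group would produce "+()", which is not a usable clause.
        raise ValueError(f"no values given for field {field!r}")
    return "+(" + " ".join(f"{field}:{value}" for value in values) + ")"


def build_query(node_type, name, friend_ids, group_ids=None):
    """Combine type, text, traversal scope and (optionally) ACL clauses."""
    clauses = [
        f"+emtype:{node_type}",            # only nodes of this type
        f"+name:{name}",                   # the actual text search
        any_of("isfriendof", friend_ids),  # region defined by the traversal
    ]
    if group_ids:
        clauses.append(any_of("canaccess", group_ids))  # ACL restriction
    return " ".join(clauses)


print(build_query("person", "trinity", [1, 2, 3]))
print(build_query("person", "trinity", [1, 2, 3], group_ids=[233, 254, 324]))
```

The ACL clause is just one more required group, so leaving it out gives the 
query without access control and adding it narrows the same search.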

It works for us because in our case we know the traversal will return a 
reasonably small set of nodes (not thousands or more). Lucene can return 
thousands of nodes, but that's not a problem of course. And we can still use 
the fun stuff like sorting, paging and result scoring.

Hope this helps.

Cheers

Paul


PS: we always use lowercase field names without underscores because somehow it 
makes Lucene happier.


On 18 apr 2011, at 11:19, Mattias Persson wrote:

> 2011/4/18 Michael Hunger <michael.hun...@neotechnology.com>:
>> Would it be also possible to go the other way round?
>> 
>> E.g. have the index-results (name:Vil*) as starting point and traverse 
>> backwards the two steps to your start node? (Either using a traversal or the 
>> shortest path graph algo with a maximum way-length)?
> 
> That's what I suggested, but it doesn't exist yet :)
> 
> To do it that way today (do a traversal from each and every index
> result) would probably be slower than doing one traversal with
> filtering.
> 
>> 
>> Cheers
>> 
>> Michael
>> 
>> Am 18.04.2011 um 11:03 schrieb Mattias Persson:
>> 
>>> Hi Ville,
>>> 
>>> 2011/4/14 Ville Mattila <vi...@mattila.fi>:
>>>> Hi there,
>>>> 
>>>> I am somehow stuck with a problem of combining traversing and queries
>>>> to indices efficiently - something like finding all people with a name
>>>> starting with "Vil*" two steps away from a reference node.
>>>> 
>>>> Traversing all friends within two steps from the reference node is
>>>> trivial, but I find it a bit inefficient to apply a return evaluator
>>>> in each of the nodes visited during traversal. Or is it so? How about
>>>> more complex criteria which may involve more than one property or even
>>>> more complex (Lucene) queries?
>>>> 
>>> 
>>> The best solution IMHO (one that isn't available yet) would be to let
>>> a traversal have multiple starting points, that is have the index
>>> result as starting point.
>>> 
>>> I think that doing a traversal and filtering with an evaluator is the
>>> way to go. Have you tried this and seen bad performance?
>>> 
>>>> I was thinking to spice up my Neo4j setup with Elasticsearch
>>>> (www.elasticsearch.org) to dedicate Neo4j to keep track of the
>>>> relationships and ES to index all the data in them, however it makes
>>>> me feel very uncomfortable to keep up the consistency when data gets
>>>> updated. However, now I also need to keep the Neo4j indices updated. Not
>>>> to mention that combining traversal and an external index is even more
>>>> complicated. Still, I like the idea that I don't need to index each
>>>> property separately (as seems to be the case with Neo4j indices now).
>>>> 
>>> 
>>> Well, that's the way things are a.t.m... you have to index/keep index
>>> synced yourself. There are plans to make indexing more automatic and
>>> painless, though.
>>> 
>>> So, do you still need indexing even if you try the solution with
>>> evaluator instead of doing index lookup w/ wildcard?
>>> 
>>>> Just to clarify, I use REST API with Neo4j.
>>>> 
>>>> Maybe I am completely lost and somehow fixed to only one viewpoint in
>>>> this whole case... So, any comments, they are appreciated. =)
>>> 
>>> As always you can create plugins which wrap your logic and keep your
>>> indexes in sync when changes happen, so instead of going through the
>>> generic REST API, you expose your own API.
>>> 
>>>> 
>>>> Thanks,
>>>> Ville
>>>> _______________________________________________
>>>> Neo4j mailing list
>>>> User@lists.neo4j.org
>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Mattias Persson, [matt...@neotechnology.com]
>>> Hacker, Neo Technology
>>> www.neotechnology.com
>> 
>> 
> 
> 
> 
> -- 
> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technology
> www.neotechnology.com
