Re: [Neo] basic questions

Lincoln Wed, 17 Mar 2010 16:39:38 -0700

I really like that syntax, however I'm concerned that although the intent is
expressed concisely, behind the scenes it's still going to have to traverse
the entire relationship graph.  The guards in the for comprehension will
translate to filter() calls but it still has to examine everything in the
Iterable, right?


I've seen some very cool scala wrappers for SQL and MongoDB that
re-implement map(), filter(), and flatMap() so that you can use for
comprehensions even though it's doing custom work behind the scenes.  It
would be really cool to do that for Neo4j and have it generate efficient
traversals instead of doing naive iteration.

Thanks,
Lincoln

On Wed, Mar 17, 2010 at 6:32 PM, Niels Hoogeveen
<pd_aficion...@hotmail.com>wrote:

>
> The original questioner works in Scala where the pipe concept can easily be
> expressed using for comprehensions.
>
> val node = index.getSingleNode("id")
> for(
>  i <- node.getRelationships(FOLLOWS, OUTGOING) //binds the iteration over
> the relations to i
>  if(i.getProperty("follow_code") != "not important"); // filters the
> properties of the relations bound to i
>  j <- i.getEndNode; // binds the end nodes of the relations to j
>  if(j.getProperty("name") == "something"); // filters the properties of the
> node bound to j
>  k <- j.getRelationships(RefersTo, OUTGOING) //binds the iteration over the
> relations of j
>  if(k.getProperty("CREATED") <= "2010-03-17"); // filters the properties of
> the relation bound to k
> )yield{
>  println(k.getProperty("name"))
> }
>
> Kind regards,
> Niels Hoogeveen
>
> > From: okramma...@gmail.com
> > Date: Wed, 17 Mar 2010 14:37:32 -0700
> > To: user@lists.neo4j.org
> > Subject: Re: [Neo] basic questions
> >
> > Hey,
> >
> > You might want to consider Blueprints Pipes for a more controlled
> traverser framework that doesn't require the use of for-loops and allows you
> to specify arbitrary paths through a graph.
> >
> > http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework
> >
> > For the example viewer-->FOLLOWS-->user-->CREATED-->message do,
> >
> > //////////////////////////////////////////
> > Pipe<Vertex,Edge> pipe1 = new
> VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> > Pipe<Edge,Edge> pipe2 = new LabelFilterPipe(Arrays.asList("FOLLOWS"),
> false);
> > Pipe<Edge,Vertex> pipe3 = new
> EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> > Pipe<Vertex,Edge> pipe4 = new
> VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> > Pipe<Edge,Edge> pipe5 = new LabelFilterPipe(Arrays.asList("CREATED"),
> false);
> > Pipe<Edge,Vertex> pipe6 = new
> EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> > Pipeline<Vertex,Vertex> pipeline = new
> Pipeline<Vertex,Vertex>(Arrays.asList(pipe1,pipe2,pipe3,pipe4,pipe5,pipe6));
> >
> > Neo4jGraph graph = new Neo4jGraph('/dir/neo')
> > graph.startTransaction();
> > pipeline.setStarts(Arrays.asList(viewer).iterator());
> > while(pipeline.hasNext()) {
> >       System.out.println("message: " + pipeline.next());
> > }
> > graph.stopTransaction(true);
> > //////////////////////////////////////////
> >
> > NOTE: I hand typed this from memory so there might be some errors here or
> there....
> >
> > A Pipe/Pipeline implements Iterator so you can just page out as many
> items as you want that can legally flow through pipeline...
> >
> > If this is interesting to you and if you use MVN and Git, you may want to
> build the latest and greatest of Blueprints [
> http://blueprints.tinkerpop.com ] as I continually add new Pipes to do new
> things [ see
> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework#pipes_api].
> >
> > Also, a non-iterator based mechanism is provided by Gremlin [
> http://gremlin.tinkerpop.com ] which would express the same thing as:
> >
> > $messages := ./ou...@label='FOLLOWS']/inV/ou...@label='CREATED']/inV
> >
> > Take care,
> > Marko.
> >
> > http://markorodriguez.com
> > http://gremlin.tinkerpop.com
> >
> >
> > On Mar 17, 2010, at 2:19 PM, Lincoln wrote:
> >
> > > Wow dude, this is blowing my mind just a little.
> > >
> > > Ok, sticking with the twitter example, I'm concerned about the edge
> cases.
> > > I'd say it's easy to optimize with a relational db or any other storage
> for
> > > that matter if I make the assumption that people only follow a few
> hundred
> > > people and only want recent messages.  However some people follow
> hundreds
> > > of thousands of people.  If Guy Kawasaki uses my app, I'd run into a
> problem
> > > quickly.
> > >
> > > However I see your point that I don't have to limit myself to just the
> > > obvious relationships, but can create relationships that serve specific
> > > purposes and use-cases such as your day example.  I'm not sure how I
> would
> > > want to model my use-case to allow for Guy Kawaski, I'll have to think
> more
> > > about it.  Is there a threshold beyond which adding relationships
> between
> > > nodes causes problems?  If not, or if it's high, you could create
> custom
> > > relationships for every type of query you'd want to do.
> > >
> > > However, a secondary question comes up.  If we continue with the
> twitter
> > > example, and I want to be able to page through results, is that
> directly
> > > supported through Neo4j's API?  Coming from a more traditional storage
> > > background I tend to think of what I'd want as a sort by time and then
> a
> > > skip and limit on the results (so I could say give me messages 1-100
> sorted
> > > by time descending).  Is there anything equivalent in Neo4j or is the
> > > approach totally different?
> > >
> > > Thanks,
> > > Lincoln
> > >
> > >
> > > On Wed, Mar 17, 2010 at 12:41 PM, Craig Taverner <cr...@amanzi.com>
> wrote:
> > >
> > >> Hi Lincoln,
> > >>
> > >> So it sounds like you don't need the IS_VISIBLE relations after all.
> The
> > >> traverser works by following all relationships of the specified types
> and
> > >> directions from each current node (as you traverse, or walk the
> graph). You
> > >> can have a complex graph and traverse to high depth very fast
> (thousands of
> > >> relationships per second). The traverser will also automatically check
> that
> > >> the same node is not returned twice. The test for the relationship
> type is
> > >> efficient. Still reasonable, but less efficient is the custom test you
> > >> might
> > >> put in the returnable evaluator, but if the limiting factor is usually
> the
> > >> number of relationships traversed, and if that is kept managable, the
> > >> evaluator test is no concern.
> > >>
> > >> I think twitter is a good case in point, even with many millions of
> users,
> > >> you will still only follow perhaps a hundred and they will tweet
> perhaps a
> > >> hundred, or a thousand times, so your traverser will find the 10k-100k
> > >> messages quite quickly. This can be speeded up further, but the right
> > >> approach depends again on your use case. The idea with using a graph
> > >> database is that the actual usage probably maps very well to the graph
> > >> structure, so when deciding how to speed up your search, consider how
> it
> > >> will be used. In twitter one normally only cares about recent
> messages, so
> > >> how about not linking directly from the user to the message, but link
> to an
> > >> intermediate node representing time, for example, a day-node. Then
> each new
> > >> message is added to the day node for that day, and that will
> automatically
> > >> become yesterday the next day. Then your traversal can have a stop
> > >> evaluator
> > >> to not follow old messages (unless your query is looking for old
> messages,
> > >> of course). So the 100k messages might drop to only a few hundred, or
> even
> > >> just a few dozen. Certainly that will be a query of the order of
> > >> milliseconds!
> > >>
> > >> Moving away from the traverser, you also have the option to call
> directly
> > >> the getRelationships() methods from the node. If you structure is
> > >> predictable, like viewer-->FOLLOWS-->user-->CREATED-->message, then
> two
> > >> nested for loops would work, the outer iterating over the followers
> and the
> > >> inner iterating over the messages. If you changed to add a time-based
> > >> interim node (which is a kind of graph-index), then you need to have
> three
> > >> loops. If you made your time index a deeper tree (months->days->hours,
> > >> etc.), then you would need to further refactor the code. However, if
> you
> > >> stuck with a traverser, you might not need to change the traverser
> even of
> > >> the graph structure changed, as long as the same relationship types
> were
> > >> maintained. Does that make sense?
> > >>
> > >> Cheers, Craig
> > >>
> > >> On Wed, Mar 17, 2010 at 4:00 PM, Lincoln <linxbet...@gmail.com>
> wrote:
> > >>
> > >>> Thanks Craig,
> > >>>
> > >>> I'd like to clarify my question (I don't think it changes your answer
> > >>> though).
> > >>>
> > >>> I wanted all messages visible to me created by users I follow.  Thus,
> the
> > >>> FOLLOWS relationship is not enough.  I'd need to see messages that
> are
> > >>> visible to me and then check if they were created by users I follow,
> or
> > >> I'd
> > >>> need to see messages created by users I follow and then see if
> they're
> > >>> visible to me.
> > >>>
> > >>> I assume your last example still yields the result I'm looking for.
> > >> Could
> > >>> you describe what actually happens here though?  I'm unclear on what
> the
> > >>> traversal looks like.  Would it first traverse every outgoing FOLLOWS
> > >>> relationship from the viewer, yielding other users, and then traverse
> all
> > >>> the CREATED relationships to get to messages?
> > >>>
> > >>> Also, given very large numbers of FOLLOWS and CREATED relationships
> (with
> > >>> say, a twitter graph), how is this made efficient?
> > >>>
> > >>> Sorry for all the basic questions but I couldn't find this
> information in
> > >>> the docs.  If there's something I should be reading before posting
> these
> > >>> questions, please point me to it.
> > >>>
> > >>> Thanks!
> > >>>
> > >>> Lincoln
> > >>>
> > >>> On Wed, Mar 17, 2010 at 7:06 AM, Craig Taverner <cr...@amanzi.com>
> > >> wrote:
> > >>>
> > >>>> I'm uncertain about one ambiguity in your model, you are able to
> find
> > >>>> messages through FOLLOWS and IS_VISIBLE_BY. These will give two
> > >> different
> > >>>> sets, and my first impression was that FOLLOWS gives you the right
> > >>> answer.
> > >>>> In other words you want to query for 'all messages by users I
> follow'?
> > >> In
> > >>>> that case you do not need IS_VISIBLE_BY. However, if there are
> messages
> > >>> by
> > >>>> people you follow, but are not allowed to see, then you also need
> the
> > >>>> IS_VISIBLE_BY. But I would still reconsider linking directly from
> the
> > >>>> viewer
> > >>>> to the message for that case. I'd rather have the messages linked to
> > >> some
> > >>>> categorization structure for things like 'public', 'private', etc.
> > >>>>
> > >>>> Anyway, here are some suggestions for the various approaches above:
> > >>>> *'all messages by users I follow'*
> > >>>> val msgs = viewer.traverse(
> > >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> > >>>> (tp: TraversalPosition) => IsMessage(tp.currentNode()),
> > >>>> Rels.FOLLOWS, Direction.OUTGOING,
> > >>>> Rels.CREATED, Direction.OUTGOING)
> > >>>>
> > >>>> *'all messages visible to me'*
> > >>>> val msgs = viewer.traverse(
> > >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> > >>>> ReturnableEvaluator.ALL_BUT_START_NODE,
> > >>>> Rels.IS_VISIBLE_BY, Direction.INCOMING)
> > >>>>
> > >>>> *'all messages, visible to me, by people I follow'*
> > >>>> val msgs = viewer.traverse(
> > >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> > >>>> (tp: TraversalPosition) => {
> > >>>> val msg = tp.currentNode()
> > >>>> IsMessage(msg) && IsVisibleBy(msg,viewer)
> > >>>> },
> > >>>> Rels.FOLLOWS, Direction.OUTGOING,
> > >>>> Rels.CREATED, Direction.OUTGOING)
> > >>>>
> > >>>> Of course I assume you make the utility functions IsMessage(node:
> Node)
> > >>> and
> > >>>> IsVisibleBy(msg: Node, user: Node), and these will test the
> existance
> > >> of
> > >>>> properties and relations as appropriate to make the decision.
> > >>>>
> > >>>>
> > >>>> On Wed, Mar 17, 2010 at 6:32 AM, Lincoln <linxbet...@gmail.com>
> wrote:
> > >>>>
> > >>>>> Hi, I've just started looking at Neo4j and I'm quite intrigued.
> > >>> However,
> > >>>>> the cognitive dissonance that I've grown so used to in modeling
> > >> storage
> > >>>> is
> > >>>>> proving to be a bit difficult to let go at this early stage :)
> > >>>>>
> > >>>>> I was hoping that if someone could help me through an example I
> would
> > >>> be
> > >>>>> able to grok how to properly structure my data and query it in
> Neo4j.
> > >>>>>
> > >>>>> Nodes:
> > >>>>> Message( text: String )
> > >>>>> User( id: Long )
> > >>>>>
> > >>>>> Relationships:
> > >>>>> CREATED
> > >>>>> FOLLOWS
> > >>>>> IS_VISIBLE_BY
> > >>>>>
> > >>>>> So I might have a graph with entries like so:
> > >>>>>
> > >>>>> User(1) --> CREATED --> Message("i woke up late today")
> > >>>>> User(2) --> CREATED --> Message("hello")
> > >>>>> User(3) --> CREATED --> Message("ugh, i hate mondays")
> > >>>>>
> > >>>>> User(1) --> FOLLOWS --> User(2)
> > >>>>>
> > >>>>> Let's also say all messages are visible to User 1.
> > >>>>>
> > >>>>> Message("i woke up late today") --> IS_VISIBLE_BY --> User(1)
> > >>>>> Message("hello") --> IS_VISIBLE_BY --> User(1)
> > >>>>> Message("ugh, i hate mondays") --> IS_VISIBLE_BY --> User(1)
> > >>>>>
> > >>>>> So, I can do a simple traversal for visible:
> > >>>>>
> > >>>>> val graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" )
> > >>>>> val index = new LuceneIndexService( graphDb )
> > >>>>> val viewer = index.getSingleNode("id", 1)
> > >>>>> val msgs = viewer.traverse( Order.BREADTH_FIRST,
> > >>>>> StopEvaluator.END_OF_GRAPH,
> > >>>>> ReturnableEvaluator.ALL_BUT_START_NODE, Rels.IS_VISIBLE_BY,
> > >>>>> Direction.INCOMING)
> > >>>>> msgs.toList.map(_.toJson).mkString("{ msgs : [", ",", "] }")  //
> > >>> assuming
> > >>>> i
> > >>>>> have the relevant functions
> > >>>>>
> > >>>>> But let's say that this is going to return too many messages.  Just
> > >>>> because
> > >>>>> all the messages are possibly visible to me, doesn't mean I want to
> > >> see
> > >>>>> them
> > >>>>> all.  So, I'd like to additionally filter by the FOLLOWS
> > >> relationship.
> > >>>> I'd
> > >>>>> like to express "get all messages that are visible and were created
> > >> by
> > >>> a
> > >>>>> user that I follow."  Can someone show me an example of how to do
> > >> that?
> > >>>>>
> > >>>>> I'm guessing that you need to implement a custom
> ReturnableEvaluator,
> > >>> but
> > >>>> I
> > >>>>> don't understand how you traverse multiple relationships at the
> same
> > >>>> time.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Lincoln
> > >>>>> _______________________________________________
> > >>>>> Neo mailing list
> > >>>>> User@lists.neo4j.org
> > >>>>> https://lists.neo4j.org/mailman/listinfo/user
> > >>>>>
> > >>>> _______________________________________________
> > >>>> Neo mailing list
> > >>>> User@lists.neo4j.org
> > >>>> https://lists.neo4j.org/mailman/listinfo/user
> > >>>>
> > >>> _______________________________________________
> > >>> Neo mailing list
> > >>> User@lists.neo4j.org
> > >>> https://lists.neo4j.org/mailman/listinfo/user
> > >>>
> > >> _______________________________________________
> > >> Neo mailing list
> > >> User@lists.neo4j.org
> > >> https://lists.neo4j.org/mailman/listinfo/user
> > >>
> > > _______________________________________________
> > > Neo mailing list
> > > User@lists.neo4j.org
> > > https://lists.neo4j.org/mailman/listinfo/user
> >
> > _______________________________________________
> > Neo mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
>
> _________________________________________________________________
> Express yourself instantly with MSN Messenger! Download today it's FREE!
> http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/
> _______________________________________________
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo] basic questions

Reply via email to