Re: [Neo4j] User Digest, Vol 51, Issue 96

Peter Neubauer Wed, 15 Jun 2011 13:16:15 -0700

Agelos,
we are just testing help.neo4j.org; you might try starting a
discussion there if that is more convenient. It's web-based and should
read more like a forum, if you like that style.


Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Wed, Jun 15, 2011 at 8:53 PM, Agelos Pikoulas
<agelos.pikou...@gmail.com> wrote:
> Dear all
>
> /****  I'm sorry if I can't use the user@lists properly, I am indeed lost :-(
> Neo4j would be so much better as a forum or a stackOverNeo4J  :-)***/
>
> Allow me to say that the 50K magic number is not very useful for real,
> practical, modern social network apps.
>
> What if there are simply a couple of million "Person" nodes that may "LIKE"
> the "Movie" nodes?
> And what if I have a few million Movies and many millions of Persons?
> It's a typical case for a "Movie" to have a few hundred thousand
> ratings/votes. And imagine if I have Song, Book & Product nodes!
>
> I think this issue is *MAJOR* and needs to be given high priority
> by the neo4j team.
>
> The proxy solution sounds wonderful, but it can be quite a hassle if it's not
> properly encapsulated & transparent.
> I think all Traversals will become quite hacked & I can't even think what
> will happen to Object mapping etc.
> I imagine it COULD be part of an upcoming version of the new & amazing
> Spring Data Graph framework (check it out!),
> where a simple annotation such as @NodeWithProxy, along with information about
> which *RelationshipTypes / Directions* should go to the real or the proxy
> Node, could do all of the proxy magic!!!
>
> But the *RelationshipType/Direction indexing* I proposed, I dare say, could
> be a more generic and cleaner idea, and also a quicker hack!
>
> All we need is a method TraversalDescription.*index("myIndex");* where we
> can declare which "index" should be used for looking up
> the (few) RelationshipTypes/Directions among the millions on the Node.
> The best thing is that we have already declared those on
> TraversalDescription.*relationships(*MyRelationshipType.hasPart,
> Direction.OUTGOING).
>
> The *Traversal* would then follow (only) those found on the index! Bingo!!!!
>
> We could also have a *.followIndexedOnly(false)* and even
> *recreateFollowedIndexes(true)* to save us next time!
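
A minimal sketch of the indexing idea proposed above, written in plain Java rather than against the Neo4j API. The names `Node`, `Rel`, `scan`, `lookup` and `buildPart` are hypothetical and exist only for this illustration; the `index(...)` and `followIndexedOnly(...)` methods mentioned in the message are proposals, not existing calls. The sketch assumes a node keeps its relationships in one flat list, so filtering by type and direction has to touch every entry, and compares that with a per-node map keyed by direction and type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the idea; none of these types are Neo4j classes.
class RelationshipIndexSketch {
    enum Direction { OUTGOING, INCOMING }

    // One relationship as seen from a node: type, direction and the other end.
    static final class Rel {
        final String type;
        final Direction direction;
        final long otherNodeId;

        Rel(String type, Direction direction, long otherNodeId) {
            this.type = type;
            this.direction = direction;
            this.otherNodeId = otherNodeId;
        }
    }

    static final class Node {
        // Flat list: every filtered lookup has to walk all relationships.
        private final List<Rel> all = new ArrayList<>();
        // Hypothetical per-node index keyed by direction, then by type.
        private final Map<Direction, Map<String, List<Rel>>> index =
                new EnumMap<>(Direction.class);

        void add(Rel rel) {
            all.add(rel);
            index.computeIfAbsent(rel.direction, d -> new HashMap<>())
                 .computeIfAbsent(rel.type, t -> new ArrayList<>())
                 .add(rel);
        }

        // Linear scan: cost grows with the total number of relationships.
        List<Rel> scan(String type, Direction direction) {
            List<Rel> result = new ArrayList<>();
            for (Rel rel : all) {
                if (rel.type.equals(type) && rel.direction == direction) {
                    result.add(rel);
                }
            }
            return result;
        }

        // Indexed lookup: cost depends only on the number of matches.
        List<Rel> lookup(String type, Direction direction) {
            Map<String, List<Rel>> byType = index.get(direction);
            if (byType == null) {
                return Collections.<Rel>emptyList();
            }
            List<Rel> rels = byType.get(type);
            return rels == null ? Collections.<Rel>emptyList() : rels;
        }
    }

    // Builds a "Part" node with a few outgoing HASPART and many incoming CONTAINS.
    static Node buildPart(int hasPartCount, int containsCount) {
        Node part = new Node();
        for (int i = 0; i < containsCount; i++) {
            part.add(new Rel("CONTAINS", Direction.INCOMING, 1_000_000L + i));
        }
        for (int i = 0; i < hasPartCount; i++) {
            part.add(new Rel("HASPART", Direction.OUTGOING, i));
        }
        return part;
    }

    public static void main(String[] args) {
        Node part = buildPart(3, 100_000);
        System.out.println(part.scan("HASPART", Direction.OUTGOING).size());
        System.out.println(part.lookup("HASPART", Direction.OUTGOING).size());
    }
}
```

Both calls return the same relationships; the difference is that `scan` inspects all 100.003 entries while `lookup` goes straight to the three HASPART ones. Whether Neo4j's store could maintain such a structure cheaply is a separate question that this sketch does not address.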
>
> In any case, something must be implemented!
>
> Without being an expert on neo4j, I think there is a lot of Indexing
> optimization needed yet!
>
> Michael, what do you think? Could you please see that this gets raised with
> the team, and share their views?
>
> Agelos
>
> Date: Wed, 15 Jun 2011 17:57:55 +0200
> From: "Balazs E. Pataki" <pat...@dsd.sztaki.hu>
> Subject: Re: [Neo4j] Slow Traversals on Nodes with too many
>       Relationships
> To: Neo4j user discussions <user@lists.neo4j.org>
> Message-ID: <4df8d683.8010...@dsd.sztaki.hu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi,
>
> when we started to evaluate neo4j we made some measurements, and for us
> it seemed that 50.000 is a magical number: this many relationships and
> properties on one node seemed to be a limit which, once reached, makes
> things slow. But we didn't actually need that many relationships/properties
> in our case, so we could live with it, or could make workarounds (e.g.
> storing things in properties and doing indexed lookups instead of using
> relationships).
>
> An automatic indexed lookup on relationship types and directions would
> be awesome, definitely.
>
> Regards,
> ---
> balazs
>
>
>
> Date: Wed, 15 Jun 2011 23:19:32 +0800
> From: Craig Taverner <cr...@amanzi.com>
> Subject: Re: [Neo4j] Slow Traversals on Nodes with too many
>       Relationships
> To: Neo4j user discussions <user@lists.neo4j.org>
> Message-ID: <banlktine_mk5+9damh07tsrq6nnxifo...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Could this also be related to the possibility that in order to determine
> relationship type and direction, the relationships need to be loaded from
> disk? If so, then having a large number of relationships on the same node
> would decrease performance, if the number was large enough to affect the
> disk io caching.
>
> If this is the case, perhaps adding a proxy node for the incoming
> relationships would work around the problem? Of course then you have doubled
> the number of part nodes (two for each part: one part and one containers
> proxy).
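
The proxy-node workaround described above can be sketched in plain Java (again not the Neo4j API). The relationship type name `PROXY` and the helpers `connect`, `buildChain` and `edgesInspected` are assumptions made for this illustration. The model assumes that expanding a node means looking at every relationship attached to it, and counts how many relationships a HASPART walk inspects with and without the proxy.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the proxy-node workaround; not Neo4j classes.
class ProxyNodeSketch {
    static final class Node {
        final String name;
        final List<Edge> edges = new ArrayList<>(); // incoming and outgoing

        Node(String name) {
            this.name = name;
        }
    }

    static final class Edge {
        final String type;
        final Node from;
        final Node to;

        Edge(String type, Node from, Node to) {
            this.type = type;
            this.from = from;
            this.to = to;
        }
    }

    // Registers the edge on both end nodes, as a graph store would.
    static void connect(Node from, String type, Node to) {
        Edge edge = new Edge(type, from, to);
        from.edges.add(edge);
        to.edges.add(edge);
    }

    // Builds Part0 -HASPART-> Part1 -HASPART-> ... plus containers that CONTAIN
    // every part. With useProxy, CONTAINS edges end on a per-part proxy node
    // (linked by a hypothetical PROXY edge) instead of on the part itself.
    static Node buildChain(int parts, int containers, boolean useProxy) {
        Node[] chain = new Node[parts];
        Node[] containsTargets = new Node[parts];
        for (int i = 0; i < parts; i++) {
            chain[i] = new Node("Part" + i);
            if (useProxy) {
                Node proxy = new Node("PartProxy" + i);
                connect(chain[i], "PROXY", proxy);
                containsTargets[i] = proxy;
            } else {
                containsTargets[i] = chain[i];
            }
        }
        for (int i = 0; i < parts - 1; i++) {
            connect(chain[i], "HASPART", chain[i + 1]);
        }
        for (int c = 0; c < containers; c++) {
            Node container = new Node("Container" + c);
            for (Node target : containsTargets) {
                connect(container, "CONTAINS", target);
            }
        }
        return chain[0];
    }

    // Follows outgoing HASPART edges and counts every edge looked at on the way.
    static int edgesInspected(Node start) {
        int inspected = 0;
        Node current = start;
        while (current != null) {
            Node next = null;
            for (Edge edge : current.edges) {
                inspected++;
                if (edge.type.equals("HASPART") && edge.from == current) {
                    next = edge.to;
                }
            }
            current = next;
        }
        return inspected;
    }

    public static void main(String[] args) {
        System.out.println(edgesInspected(buildChain(3, 1000, false)));
        System.out.println(edgesInspected(buildChain(3, 1000, true)));
    }
}
```

In this model the walk over three parts inspects thousands of relationships without the proxy and only a handful with it, at the cost noted above: one extra node per part, plus an extra hop for any query that needs the containers.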
>
>
> Date: Wed, 15 Jun 2011 18:40:05 +0300
> From: Agelos Pikoulas <agelos.pikou...@gmail.com>
> Subject: Re: [Neo4j] Slow Traversals on Nodes with too many
>       Relationships
> To: user@lists.neo4j.org
> Message-ID: <banlktinsmadbatf1rglaj4wqngkfekj...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> I have to respectfully agree with Rick Bullotta.
>
> I suspected that the big-O is not linear in this case.
>
> To verify, I added 4x the Container nodes (400.000) and their corresponding
> Relationships, and it is now *unbelievably* slow:
> it does not take just 4x longer, it takes more than 30-40 seconds for each
> next(). Mind you, with 100K nodes it was ~2 secs for each next() !!!
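
As a rough check on the "not linear" observation, the figures quoted above (about 2 seconds per next() at 100.000 Container nodes, 30-40 seconds at 400.000) can be turned into an implied growth exponent. This is only a back-of-the-envelope sketch: the helper `impliedExponent` is hypothetical, it assumes time grows as a power of the node count, and two measurements cannot separate an algorithmic cost from a caching or memory effect.

```java
// Back-of-the-envelope estimate from the two timings quoted in the message.
class ScalingEstimate {
    // Exponent k such that time ~ size^k explains the observed slowdown.
    static double impliedExponent(double sizeRatio, double timeRatio) {
        return Math.log(timeRatio) / Math.log(sizeRatio);
    }

    public static void main(String[] args) {
        double sizeRatio = 400_000.0 / 100_000.0;             // 4x more Container nodes
        double low = impliedExponent(sizeRatio, 30.0 / 2.0);  // 30 s vs ~2 s per next()
        double high = impliedExponent(sizeRatio, 40.0 / 2.0); // 40 s vs ~2 s per next()
        System.out.println("implied exponent between " + low + " and " + high);
    }
}
```

A 15x to 20x slowdown for 4x the data corresponds to an exponent of roughly 2, which is consistent with the suspicion that the cost is worse than linear for this setup.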
>
> And only to make matters worse, the subsequent runs weren't fast either -
> they actually took more time than the first
> (1st TotalTraversalTime= 389936ms, 2nd TotalTraversalTime= 443948ms)
>
> The whole setup is running on
> Eclipse 3.6, with -Xmx512m on JavaVM,
> Windows2003 VMWare machine with 4GB, running on a fast 2nd gen SSD (OCZ
> Vertex 2). The neo4J data resides on this SSD.
> The 100.000 nodes data files were ~250MB, the 400.000 one is ~1GB.
>
> I wonder what would happen if the Container nodes were a few million (which
> will be my case) - it will run forever.
>
> Could you please look into my suggestion, i.e. "Using a 'smart' behind
> the scenes Indexing on both *RelationshipType* and *Direction* that
> Traversals actually use to boost things up"?
>
> On another topic, how does one use this mailing list? I use it through
> gmail and I am utterly lost. Is there a better client/UI to actually
> post/reply into threads?
>
>
> ------------------------------
>
> Message: 1
> Date: Wed, 15 Jun 2011 07:27:26 -0700
> From: Rick Bullotta <rick.bullo...@thingworx.com>
> Subject: Re: [Neo4j] Slow Traversals on Nodes with too many
>       Relationships
> To: Neo4j user discussions <user@lists.neo4j.org>
> Message-ID:
>       <09df3402c845ec489a3323a06208f20d0a9d4...@p3pw5ex1mb14.ex1.secureserver.net>
>
> Content-Type: text/plain; charset="us-ascii"
>
> I would respectfully disagree that it doesn't represent
> production usage: in some cases, each query/traversal will be unique
> and isolated to a part of a subgraph, so a "cold" query may
> be the norm....
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On
> Behalf Of Michael Hunger
> Sent: Wednesday, June 15, 2011 10:25 AM
> To: Neo4j user discussions
> Subject: Re: [Neo4j] Slow Traversals on Nodes with too many Relationships
>
> That is rather a case of warming up your caches.
>
> Determining the traversal speed from the first run is not a good benchmark
> as it doesn't represent production usage :)
> The same (warming up) is true for all kinds of benchmarks (except for
> startup performance benchmarks).
>
> Cheers
>
> Michael
>
> On 15.06.2011 at 14:48, Agelos Pikoulas wrote:
>
>> I have a few "Part" nodes related to each other via "HASPART"
>> relationships/edges
>> (e.g. Part1---HASPART--->Part2---HASPART--->Part3 etc.).
>> TraversalDescription works fine, following each Part's outgoing HASPART
>> relationship.
>>
>> Then I add a large number (say 100.000) of "Container" nodes, where each
>> "Container" has a "CONTAINS" relation to almost *every* "Part" node.
>> Hence each Part node now has 100.000 incoming CONTAINS relationships from
>> Container nodes,
>> but only a few outgoing HASPART relationships to other Part nodes.
>>
>> Now my previous TraversalDescription runs extremely slowly (several seconds
>> inside each Iterator<Path>.next() call).
>> Note that I do define relationships(RT.HASPART, Direction.OUTGOING) on the
>> TraversalDescription,
>> but it seems it's not used by neo4j as a hint. Note that on a subsequent run
>> of the same Traversal, it's very quick indeed.
>>
>> Is there any way to use Indexing on relationships for such a scenario, to
>> boost things up?
>>
>> Ideally, the Traversal framework could use automatic/declarative indexing on
>> Node Relationship types and/or direction to perform such traversals quicker.
>>
>> Regards
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
