Re: [Neo4j] Algorithms Best Practices

Marko Rodriguez Wed, 09 Jun 2010 08:25:27 -0700

Hi,

> -          Is it reasonable to add these metrics as properties of the nodes?


        Yes. However, depending on the algorithm, you will have to recompute 
these values every time the graph changes.
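
As a rough sketch of what that recomputation bookkeeping could look like (plain 
Python over an in-memory graph, not the Neo4j API; the Graph class, its version 
counter, and the store_metric/read_metric names are all hypothetical), you can 
stamp each stored metric with the graph version it was computed against, so a 
reader can tell whether the value is still valid:

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory undirected graph with per-node properties (illustrative only)."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.props = defaultdict(dict)
        self.version = 0  # bumped on every structural change

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)
        self.version += 1

    def store_metric(self, node, name, value):
        # Keep the graph version next to the value so staleness is detectable.
        self.props[node][name] = (value, self.version)

    def read_metric(self, node, name):
        """Return (value, is_fresh) for a previously stored metric."""
        value, computed_at = self.props[node][name]
        return value, computed_at == self.version


g = Graph()
g.add_edge("a", "b")
g.store_metric("a", "degree", len(g.adj["a"]))
fresh_before = g.read_metric("a", "degree")  # stored value is still current

g.add_edge("a", "c")                          # the graph changes
stale_after = g.read_metric("a", "degree")   # same value, now flagged stale
```

A single global version counter is the bluntest possible choice: any change 
marks every stored metric as stale, which is only strictly necessary for global 
metrics.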

> -          Can these metrics be maintained in the database over time, or 
> should they be calculated as needed?

        Depends on what you want to do. In certain domains, metrics are 
calculated yearly (e.g. in the scholarly community, the impact factor is 
calculated once a year). See slides 24+ in 
http://www.slideshare.net/slidarko/a-practical-ontology-for-the-largescale-modeling-of-scholarly-artifacts-and-their-usage-3879791

> -          Does the calculation of a metric for a single node require 
> traversing the entire graph (or at least the sub-graph it is connected to)? 
> Does it depend on the metric being calculated?

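        Depends on your algorithm. There is a distinction between global and 
local (aka priors/constrained) algorithms. A global algorithm needs the whole 
graph (or at least the connected component) to score a single node; a local 
one only touches a bounded neighborhood around it.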
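
To make the distinction concrete, here is a small sketch in plain Python (the 
adjacency dict and function names are made up for illustration). Degree is 
local: it reads only the node's own edges. Closeness is global: it needs a 
breadth-first search over everything reachable from the node, so an edge added 
far away changes it.

```python
from collections import deque


def degree(adj, node):
    # Local: reads only the node's own adjacency set.
    return len(adj[node])


def closeness(adj, node):
    # Global: breadth-first search over everything reachable from the node.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in adj[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    # (number of other reachable nodes) / (sum of distances to them)
    return (len(dist) - 1) / total if total else 0.0


# Path graph: a - b - c - d
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
degree_before, closeness_before = degree(adj, "a"), closeness(adj, "a")

# Add an edge three hops away from "a": d - e
adj["d"].add("e")
adj["e"] = {"d"}
degree_after, closeness_after = degree(adj, "a"), closeness(adj, "a")
```

Here the degree of "a" is unchanged by the new edge, while its closeness drops 
from 0.5 to 0.4, so a stored closeness value would have to be recomputed.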

> -          If the answer is yes - it takes a long time to update a set of 
> metrics, what are the typical solutions? Do we go down a path like we do with 
> data warehousing where the graph is loaded from the operational store 
> periodically in batches, and then becomes stale over time? What might be some 
> solutions for graphs that are constantly updated - or is the tradeoff simply 
> that to have metrics your entire graph must be updated after any update for 
> the metrics to be valid? (For example - can a node be time stamped or 
> something, or is it the case that any change to the graph can change the 
> metrics for every other node?)

        I believe the trick to large-scale graph processing is to not do it at 
all. Classic graph/network algorithms based on global analyses are intractable 
in most situations. Always formulate the solution to your problem in terms of 
local (or constrained) traversals. All classic graph algorithms can be 
formulated in this way (see http://www.datalab.uci.edu/papers/white_smyth.pdf ). 
Given White and Smyth's work, think of all your algorithms in terms of 
generalizations of this model. For multi-relational graphs, you can achieve 
such behavior with the path algebra [ http://arxiv.org/abs/0806.2274 ], and for 
statistical methods, with grammar-based random walkers [ 
http://arxiv.org/abs/0803.4355 ].
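
For a feel of what a local, constrained traversal can look like, here is a 
minimal sketch in plain Python. It is not the algorithm from any of the papers 
above, just an illustration of the general idea of ranking nodes relative to a 
root set with a bounded walk. The local_rank function and its steps and decay 
parameters are assumptions made up for this example.

```python
from collections import defaultdict


def local_rank(adj, roots, steps=3, decay=0.5):
    """Score nodes relative to `roots` using a walk limited to `steps` hops.

    Each root starts with an equal share of energy. At every hop a node
    hands `decay` of its energy to its neighbours in equal parts. Nodes
    more than `steps` hops from every root are never visited.
    """
    score = defaultdict(float)
    frontier = {root: 1.0 / len(roots) for root in roots}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = adj.get(node, ())
            if not neighbours:
                continue
            share = decay * energy / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
                score[neighbour] += share
        frontier = next_frontier
    return dict(score)


# Triangle a-b-c with a tail c - d - e - f
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
scores = local_rank(adj, roots=["a"], steps=2)
```

With steps=2 the walk from "a" never reaches "e" or "f", so the cost depends on 
the size of the neighborhood around the roots, not on the size of the graph. 
The same bound tells you which stored scores a change can affect: only those 
whose roots lie within that many hops of the changed edge.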

Take care,
Marko.

http://markorodriguez.com
http://tinkerpop.com


_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
