The structures you are describing look like the edges of a graph, and you want
to follow the graph to a terminal vertex and then propagate that value back up
the path. On that assumption, it would be simple to build the structures as a
graph in GraphX and implement the algorithm with Pregel.
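
A minimal sketch of that approach, runnable in the Spark shell (the sample
data, the vid hashing helper, and the one-successor-per-vertex assumption are
mine, not from your mail):

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Each (src, dst) pair becomes a directed edge src -> dst.
val pairs: RDD[(String, String)] =
  sc.parallelize(Seq(("a", "b"), ("b", "c"), ("c", "d"), ("x", "y"), ("y", "y")))

// Illustrative id scheme; a real job would want collision-free ids.
def vid(s: String): VertexId = s.hashCode.toLong

val vertices = pairs.flatMap { case (s, d) => Seq((vid(s), s), (vid(d), d)) }.distinct()
val edges    = pairs.map { case (s, d) => Edge(vid(s), vid(d), 0) }

// Every vertex starts out holding its own label as the "terminal" value.
val graph = Graph(vertices, edges)

// Each round a vertex adopts whatever its successor currently holds, so the
// terminal label flows backwards along each chain until nothing changes.
// The self-loop y -> y sends no message and keeps its own label.
val resolved = graph.pregel("", activeDirection = EdgeDirection.In)(
  vprog    = (_, attr, msg) => if (msg.isEmpty) attr else msg,
  sendMsg  = t => if (t.srcAttr != t.dstAttr) Iterator((t.srcId, t.dstAttr))
                  else Iterator.empty,
  mergeMsg = (a, _) => a  // fine while each vertex has at most one out-edge
)

resolved.vertices.join(vertices).values
  .foreach { case (terminal, label) => println(s"$label -> $terminal") }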
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action, by Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action 





> On 25 Feb 2016, at 10:52, Guillermo Ortiz <konstt2...@gmail.com> wrote:
> 
> Oh, the letters were just an example, it could be:
> a , t
> b, o
> t, k
> k, c
> 
> So a -> t -> k -> c, and the result is: a,c; t,c; k,c; and b,o.
> I don't know if you were thinking about sortBy because in the other example 
> the letters were consecutive.
> 
> 
> 2016-02-25 9:42 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
> I don't see how sorting the data helps. 
> The answer has to include all the associations. In this case the answer has to be:
> a , b --> it was an error in the question, sorry. 
> b , d
> c , d
> x , y
> y , y
>  
> I feel like all the data that is associated should be in the same executor.
> In this case, if I order the inputs:
> a , b
> x , y
> b , c
> y , y
> c , d
> --> after sorting:
> a , b
> b , c
> c , d
> x , y
> y , y
> 
> Now "a,b" and "b,c" end up in one partition, for example, "c,d" and "x,y" in 
> another one, and so on. 
> I could get the relation between "a,b,c", but not between "d" and "a,b,c". Am 
> I wrong? I hope to be wrong! 
> 
> It seems it could be done with GraphX but, as you said, that looks like a 
> little bit of overhead. 
> 
> 
> 2016-02-25 5:43 GMT+01:00 James Barney <jamesbarne...@gmail.com>:
> Guillermo, 
> I think you're after an associative algorithm where A is ultimately 
> associated with D, correct? Jakob would be correct if that were a typo: a sort 
> would be all that is necessary in that case. 
> 
> I believe you're looking for something else though, if I understand correctly.
> 
> This seems like a similar algorithm to PageRank, no? 
> https://github.com/amplab/graphx/blob/master/python/examples/pagerank.py 
> Except you'd return the "neighbor" itself, not necessarily the rank of the page.
> 
> If you wanted to, you could use Scala and GraphX for this problem. It might be 
> a bit of overhead though: construct a node for each member of each tuple with 
> an edge between them, then traverse the graph for all sets of nodes that are 
> connected. That result set would quickly explode in size, but you could 
> restrict results to a minimum of N connections. I'm not super familiar with 
> GraphX myself, however; my intuition is saying 'graph problem' though.
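> 
> A rough sketch of that idea (assuming "pairs" is the RDD[(String, String)] of 
> the tuples; the hash-based ids and the one-terminal-per-component assumption 
> are mine):
> 
> import org.apache.spark.graphx._
> 
> val edges = pairs.map { case (s, d) =>
>   Edge(s.hashCode.toLong, d.hashCode.toLong, 0) }
> val graph = Graph.fromEdges(edges, defaultValue = 0)
> 
> // Label every vertex with the id of its connected component.
> val components = graph.connectedComponents().vertices   // (vertex, component)
> 
> // A terminal never appears as the source of a non-self-loop edge.
> val nonTerminals = edges.filter(e => e.srcId != e.dstId).map(e => (e.srcId, ()))
> val terminals = components.subtractByKey(nonTerminals).map(_.swap) // (component, terminal)
> 
> // Pair each vertex with its component's terminal.
> val result = components.map(_.swap).join(terminals).values // (vertex, terminal)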
> 
> Thoughts?
> 
> 
> On Wed, Feb 24, 2016 at 6:43 PM, Jakob Odersky <ja...@odersky.com> wrote:
> Hi Guillermo,
> assuming that the first "a,b" is a typo and you actually meant "a,d",
> this is a sorting problem.
> 
> You could easily model your data as an RDD of tuples (or as a
> DataFrame/Dataset) and use the sortBy (or orderBy for DataFrames/Datasets)
> methods.
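> 
> A minimal sketch of both routes (assuming a shell where sqlContext is
> available; the sample data and column names are mine):
> 
> val pairs = sc.parallelize(Seq(("a", "d"), ("x", "y"), ("b", "c")))
> val sortedRdd = pairs.sortBy { case (src, _) => src }
> 
> import sqlContext.implicits._
> val sortedDf = pairs.toDF("src", "dst").orderBy("src")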
> 
> best,
> --Jakob
> 
> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz <konstt2...@gmail.com> wrote:
> > I want to implement an algorithm in Spark. I know how to do it on a single
> > machine where all the data is together, but I don't know a good way to do it
> > in Spark.
> >
> > If someone has an idea...
> > I have some data like this:
> > a , b
> > x , y
> > b , c
> > y , y
> > c , d
> >
> > I want something like:
> > a , d
> > b , d
> > c , d
> > x , y
> > y , y
> >
> > I need to know that a->b->c->d, so a->d, b->d, and c->d.
> > I don't want the code, just an idea of how I could deal with it.
> >
> > Any idea?
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
