Re: Two joins in GraphX Pregel implementation

2015-07-28 Thread Ankur Dave
On 27 Jul 2015, at 16:42, Ulanov, Alexander wrote:
> It seems that the mentioned two joins can be rewritten as one outer join

You're right. In fact, the outer join can be streamlined further using a method from GraphOps:

    g = g.joinVertices(messages)(vprog).cache()

Then, instead of passing new
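A self-contained sketch of that step, using concrete Int vertex attributes and messages for readability (Pregel itself is generic over these types):

    import org.apache.spark.graphx._

    // Sketch of the streamlined Pregel step; `g`, `messages`, and `vprog`
    // mirror the names in the snippet above.
    def step(
        g: Graph[Int, Int],
        messages: VertexRDD[Int],
        vprog: (VertexId, Int, Int) => Int): Graph[Int, Int] = {
      // joinVertices runs vprog only on vertices that received a message;
      // all other vertices keep their old attribute (and attribute type).
      g.joinVertices(messages)(vprog).cache()
    }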

Re: GraphX: New graph operator

2015-06-01 Thread Ankur Dave
I think it would be good to have more basic operators like union or difference, as long as they have an efficient distributed implementation and are plausibly useful. If they can be written in terms of the existing GraphX API, it would be best to put them into GraphOps to keep the core GraphX implementation small.
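For instance, a hedged sketch of what a GraphOps-style union could look like using only the existing public API; the conflict policy (the first graph's attribute wins) and the multigraph edge semantics are assumptions, not a settled design:

    import scala.reflect.ClassTag
    import org.apache.spark.graphx._

    def union[VD: ClassTag, ED: ClassTag](
        g1: Graph[VD, ED], g2: Graph[VD, ED]): Graph[VD, ED] = {
      // Vertices present in both graphs keep g1's attribute (assumed policy).
      val vs = (g1.vertices ++ g2.vertices).reduceByKey((a, _) => a)
      // Edge multiset union; duplicates survive, matching GraphX's multigraph model.
      val es = g1.edges ++ g2.edges
      Graph(vs, es)
    }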

Re: GraphX implementation of ALS?

2015-05-26 Thread Ankur Dave
This is the latest GraphX-based ALS implementation that I'm aware of: https://github.com/ankurdave/spark/blob/GraphXALS/graphx/src/main/scala/org/apache/spark/graphx/lib/ALS.scala When I benchmarked it last year, it was about twice as slow as MLlib's ALS, and I think the latter has gotten faster since then.

Re: GraphX vertex partition/location strategy

2015-01-19 Thread Ankur Dave
No - the vertices are hash-partitioned onto workers independently of the edges. It would be nice for each vertex to be on the worker with the most adjacent edges, but we haven't done this yet, since it would add a lot of complexity to avoid load imbalance while reducing the overall communication.
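As an illustration of the current layout (a sketch of the observable behavior, not the internal code): vertex placement depends only on a hash of the vertex ID, while edge placement is chosen separately via a PartitionStrategy:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.graphx._

    // A vertex's partition is a function of its ID alone:
    val numParts = 8
    val vertexPartitioner = new HashPartitioner(numParts)
    val partitionOfVertex42 = vertexPartitioner.getPartition(42L)

    // Edges, by contrast, are placed by an explicit strategy, independently
    // of where their endpoint vertices live:
    // graph.partitionBy(PartitionStrategy.EdgePartition2D)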

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Ankur Dave
+1 (binding)

Ankur

On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia wrote:
> I'd like to formally call a [VOTE] on this model, to last 72 hours. The
> [VOTE] will end on Nov 8, 2014 at 6 PM PST.

Re: GraphX: some vertex with specific edge

2014-09-16 Thread Ankur Dave
At 2014-09-16 00:07:34 -0700, sochi wrote:
> so, above example is like a ---(e1)---> b ---(e1)---> c ---(e1)---> d
>
> In this case, can I find b, c and d when I have just src vertex, a, and edge, e1?

First, to clarify: the three edges in your example are all distinct, since they have different endpoints.
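A sketch of the one-hop lookup the question describes, with graph, a, and e1 as placeholders from the example:

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // All destinations reachable from `a` over an edge with attribute `e1`.
    def oneHop[VD, ED](graph: Graph[VD, ED], a: VertexId, e1: ED): RDD[VertexId] =
      graph.edges
        .filter(e => e.srcId == a && e.attr == e1)
        .map(_.dstId)

Finding c and d as well means repeating this step from each new frontier (or using Pregel), since they are separate hops from b and c respectively.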

Re: PARSING_ERROR from kryo

2014-09-15 Thread Ankur Dave
At 2014-09-15 08:59:48 -0700, Andrew Ash wrote:
> I'm seeing the same exception now on the Spark 1.1.0 release. Did you ever
> get this figured out?
>
> [...]
>
> On Thu, Aug 21, 2014 at 2:14 PM, npanj wrote:
>> I am getting PARSING_ERROR while running my job on the code checked out up
>> to commit

Re: Graphx seems to be broken while Creating a large graph(6B nodes in my case)

2014-08-25 Thread Ankur Dave
I posted the fix on the JIRA ticket (https://issues.apache.org/jira/browse/SPARK-3190). To update the user list, this is indeed an integer overflow problem when summing up the partition sizes. The fix is to use Longs for the sum: https://github.com/apache/spark/pull/2106. Ankur
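The failure mode, reduced to a standalone illustration (hypothetical sizes, not the actual patched code):

    // Summing ~6B partition sizes as Int wraps past Int.MaxValue:
    val sizes = Array(2000000000, 2000000000, 2000000000)
    val overflowed: Int = sizes.sum                // negative: integer overflow
    val correct: Long = sizes.map(_.toLong).sum    // the fix: accumulate as Long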

Re: VertexPartition and ShippableVertexPartition

2014-07-28 Thread Ankur Dave
On Mon, Jul 28, 2014 at 4:29 AM, Larry Xiao wrote:
> On 7/28/14, 3:41 PM, shijiaxin wrote:
>> There is a VertexPartition in the EdgePartition, which is created by
>> EdgePartitionBuilder.toEdgePartition.
>>
>> And there is also a ShippableVertexPartition in the VertexRDD.
>>
>> These two Part

Re: GraphX graph partitioning strategy

2014-07-25 Thread Ankur Dave
Oops, the code should be:

    val unpartitionedGraph: Graph[Int, Int] = ...
    val numPartitions: Int = 128
    def getTripletPartition(e: EdgeTriplet[Int, Int]): PartitionID = ...

    // Get the triplets using GraphX, then use Spark to repartition them
    val partitionedEdges = unpartitionedGraph.triplets
      .map(e =
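The excerpt is cut off mid-expression; the following completion is an assumption about where it was going: key each triplet by its computed partition, shuffle with a HashPartitioner (integer keys in [0, numPartitions) land on the matching partition), and rebuild the graph:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.graphx._

    // Assumed continuation: key each triplet by its target partition,
    // shuffle, then strip the keys.
    val partitionedEdges = unpartitionedGraph.triplets
      .map(e => (getTripletPartition(e), Edge(e.srcId, e.dstId, e.attr)))
      .partitionBy(new HashPartitioner(numPartitions))
      .map(_._2)

    // The repartitioned edges can then be reassembled into a graph:
    val partitionedGraph = Graph.fromEdges(partitionedEdges, defaultValue = 0)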

Re: GraphX graph partitioning strategy

2014-07-25 Thread Ankur Dave
Hi Larry,

GraphX's graph constructor leaves the edges in their original partitions by default. To support arbitrary multipass graph partitioning, one idea is to take advantage of that by partitioning the graph externally to GraphX (though possibly using information from GraphX, such as the degrees).
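A sketch of that workflow, where externalPartitioner is a placeholder for whatever multipass partitioner is used (METIS-style, degree-aware, etc.):

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    def buildPrePartitioned(
        edges: RDD[Edge[Int]],
        externalPartitioner: RDD[Edge[Int]] => RDD[Edge[Int]]): Graph[Int, Int] = {
      val placedEdges = externalPartitioner(edges)   // custom placement happens here
      Graph.fromEdges(placedEdges, defaultValue = 0) // constructor keeps partitions as-is
    }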

Re: GraphX can not unpersist edges of old graph?

2014-06-12 Thread Ankur Dave
We didn't provide an unpersist API for Graph because the internal dependency structure of a graph can make it hard to unpersist correctly in a way that avoids recomputation. However, you can directly unpersist a graph's vertices and edges RDDs using graph.vertices.unpersist() and graph.edges.unpersist().
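A minimal sketch of that pattern, with oldGraph and newGraph as placeholder names for the graph being released and the graph derived from it:

    import org.apache.spark.graphx._

    def releaseOld(oldGraph: Graph[Int, Int], newGraph: Graph[Int, Int]): Unit = {
      // Materialize the new graph first so dropping the old one
      // doesn't trigger recomputation.
      newGraph.cache()
      newGraph.vertices.count()
      newGraph.edges.count()

      // Now release the old graph's storage directly:
      oldGraph.vertices.unpersist(blocking = false)
      oldGraph.edges.unpersist(blocking = false)
    }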

Re: Suggestion: rdd.compute()

2014-06-10 Thread Ankur Dave
You can achieve an equivalent effect by calling rdd.foreach(x => {}), which is the lightest possible action that forces materialization of the whole RDD. Ankur
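For example (a self-contained sketch):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("materialize").setMaster("local[*]"))
    val rdd = sc.parallelize(1 to 1000000).map(_ * 2).cache()
    rdd.foreach(x => {})  // computes and caches every partition; sends nothing to the driver
    // Later actions such as rdd.count() now read from the cache.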

Re: Removing spark-debugger.md file from master?

2014-06-03 Thread Ankur Dave
I agree, let's go ahead and remove it. Ankur

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-05-27 Thread Ankur Dave
0

OK, I withdraw my downvote. Ankur

Re: Spark 1.0: outerJoinVertices seems to return null for vertex attributes when input was partitioned and vertex attribute type is changed

2014-05-26 Thread Ankur Dave
This is probably due to SPARK-1931, which I just fixed in PR #885. Is the problem resolved if you use the current Spark master? Ankur