Hi all, I am working with spark 1.0.0. mainly for the usage of GraphX and wished to apply some custom partitioning strategies on the edge list of the graph. I have generated an edge list file which has the partition number after the source and destination id in each line. Initially I am loading the unannotated graph using GraphLoader and then loading the annotated file and applying
val unpartitionedGraph = GraphLoader.edgeListFile(sc, fname, minEdgePartitions = numEPart).cache() val graph = Graph(unpartitionedGraph.vertices, partitionCustom(unpartitionedGraph.edges)) // The above method is workaround for spark 1.0.0 def partitionCustom[ED](edges: RDD[Edge[ED]]): RDD[Edge[ED]] = { edges.map(e => (customPartition(e.srcId, e.dstId), e)) .partitionBy(new HashPartitioner(numPartitions)) .mapPartitions(_.map(_._2), preservesPartitioning = true) } def customPartition(src: VertexId, dst: VertexId): PartitionID = { // search for the src and dest line in the loaded annotated file // read the third element of that line and return it } But this method is inefficient as it requires to load the same data multiple times and also slow as I am performing a large number of searches on really huge edge list files. Please suggest some efficient ways of doing this. Also please note that I am stuck with spark 1.0.0 as I am only a user of the cluster available. Regards, Arpit Kumar -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Custom-edge-partitioning-in-graphX-tp22269.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org