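groupByKey takes an optional numPartitions parameter that sets the number of
partitions in the resulting RDD. If you do not set it, the result normally has
the same number of partitions as the parent RDD (unless
spark.default.parallelism is configured, in which case that value is used).
So you do not get one partition per key: keys are hash-partitioned, each
partition usually holds many keys, and all values for a given key end up in
the same partition.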
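
To make the key-to-partition mapping concrete, here is a rough pure-Python
model of hash partitioning. It is only a sketch and not Spark's actual
implementation: partition_for and group_by_key are hypothetical helpers, and
zlib.crc32 stands in for whatever hash function Spark uses internally.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Pick a partition index from a deterministic hash of the key.

    Assumption: crc32 is a stand-in for Spark's own hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def group_by_key(pairs, num_partitions):
    """Return a list of partitions, each a dict mapping key -> list of values."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        # Every pair with the same key is routed to the same partition.
        partitions[partition_for(key, num_partitions)][key].append(value)
    return [dict(p) for p in partitions]


# 100 records spread over 10 distinct keys, grouped into 4 partitions.
pairs = [("user%d" % (i % 10), i) for i in range(100)]
grouped = group_by_key(pairs, num_partitions=4)

for index, partition in enumerate(grouped):
    print(index, sorted(partition))
```

With 10 keys and 4 partitions, at least one partition has to hold several
keys, and no key is ever split across partitions. The practical consequence
is that the values for any single key are collected together, so a key with a
very large number of values can become a memory problem regardless of how many
partitions you ask for.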


Joe L wrote
> I want to apply the following transformations to 60 GB of data on 7 nodes
> with 10 GB of memory. I am wondering whether the groupByKey() function
> returns an RDD with a single partition for each key. If so, what will happen
> if the size of a partition doesn't fit into that particular node?
> 
> rdd = sc.textFile("hdfs//.....").map(parserFunc).groupByKey()




