RE: RDD.join vs spark SQL join

2015-08-15 Thread Xiao JIANG
Thank you Akhil!

Date: Fri, 14 Aug 2015 14:51:56 +0530
Subject: Re: RDD.join vs spark SQL join
From: ak...@sigmoidanalytics.com
To: jiangxia...@outlook.com
CC: user@spark.apache.org

Both work the same way, but with Spark SQL you get the optimizations done by the Catalyst optimizer. One important

Re: RDD.join vs spark SQL join

2015-08-14 Thread Akhil Das
Both work the same way, but with Spark SQL you get the optimizations done by the Catalyst optimizer. One important thing to consider is the number of partitions and the key distribution (when you are doing RDD.join). If the keys are not evenly distributed across machines then you can see the process
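
To illustrate the point about key distribution, here is a rough sketch in plain Python (not Spark code) of how hash partitioning assigns records to partitions. The helper name partition_sizes and the sample data are made up for illustration, and the hash-modulo rule is only an approximation of what a hash partitioner does; integer keys are used so the result is the same on every run.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition under hash-modulo partitioning.

    Hypothetical helper for illustration only; it simulates the idea of a
    hash partitioner and does not call Spark.
    """
    counts = Counter(hash(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Evenly distributed keys: 1000 distinct keys, one record each.
even_keys = list(range(1000))

# Skewed keys: 900 of the 1000 records share the single key 7.
skewed_keys = [7] * 900 + list(range(100))

even = partition_sizes(even_keys, 4)
skewed = partition_sizes(skewed_keys, 4)

print("even:  ", even)    # every partition gets the same number of records
print("skewed:", skewed)  # the partition that owns key 7 gets most of the records
```

With the skewed input, one partition holds 925 of the 1000 records, so in a real join the task handling that partition would have far more work than the others. Adding more partitions does not help here, because all records with the same key still hash to the same partition.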