Hi all, is there any performance impact when I use collectAsMap on my RDD instead of rdd.collect().toMap?
I have a key-value RDD that I want to convert to a HashMap. As far as I know, collect() is not efficient on large data sets because it runs on the driver. Can I use collectAsMap instead, and is there any performance impact?

Original:

    val QuoteHashMap = QuoteRDD.collect().toMap
    val QuoteRDDData = QuoteHashMap.values.toSeq
    val QuoteRDDSet = sc.parallelize(QuoteRDDData.map(x => x.toString.replace("(", "").replace(")", "")))
    QuoteRDDSet.saveAsTextFile(Quotepath)

Change:

    val QuoteHashMap = QuoteRDD.collectAsMap()
    val QuoteRDDData = QuoteHashMap.values.toSeq
    val QuoteRDDSet = sc.parallelize(QuoteRDDData.map(x => x.toString.replace("(", "").replace(")", "")))
    QuoteRDDSet.saveAsTextFile(Quotepath)

Thanks,
Sri
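For comparison, here is a minimal sketch of a third variant that never materializes the map on the driver. Both collect().toMap and collectAsMap() return every record to the driver, so the only step that seems to need the map is collapsing duplicate keys. The object name SaveQuoteValues, the helpers stripParens and saveValues, and the sample records are all hypothetical; the real element types of QuoteRDD are not shown above, so the sketch is generic over them.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical names throughout; only the RDD calls are Spark's own API.
object SaveQuoteValues {

  // Same clean-up as in the snippets above: drop the parentheses
  // from a value's string form.
  def stripParens(value: Any): String =
    value.toString.replace("(", "").replace(")", "")

  // Writes one cleaned line per key, staying on the executors the whole time.
  def saveValues[K: ClassTag, V: ClassTag](quoteRDD: RDD[(K, V)], quotePath: String): Unit =
    quoteRDD
      // A Map keeps one value per key; reduceByKey mimics that here.
      // Assumption: it does not matter which duplicate survives, because the
      // merge order across partitions is not guaranteed.
      .reduceByKey((_, later) => later)
      .values
      .map(stripParens)
      .saveAsTextFile(quotePath)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SaveQuoteValues").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data standing in for QuoteRDD.
      val quoteRDD = sc.parallelize(Seq(
        ("AAPL", ("Apple", 111.5)),
        ("MSFT", ("Microsoft", 47.2)),
        ("AAPL", ("Apple", 112.0))
      ))
      saveValues(quoteRDD, args(0)) // args(0): output directory, must not exist yet
    } finally {
      sc.stop()
    }
  }
}
```

If the keys are already unique, the reduceByKey step could be dropped. The order of lines in the output files may differ from what the map-based versions produce.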