Hi All, 

Is there any performance difference between calling collectAsMap on my RDD
and calling rdd.collect().toMap?

I have a key-value RDD that I want to convert to a HashMap. As far as I know,
collect() is not efficient on large data sets because it brings all the data
to the driver. Can I use collectAsMap instead, and does it have any
performance impact?

Original:-
 val QuoteHashMap=QuoteRDD.collect().toMap
 val QuoteRDDData=QuoteHashMap.values.toSeq
 val QuoteRDDSet=sc.parallelize(QuoteRDDData.map(x => x.toString.replace("(","").replace(")","")))
 QuoteRDDSet.saveAsTextFile(Quotepath)

Change:-
 val QuoteHashMap=QuoteRDD.collectAsMap()
 val QuoteRDDData=QuoteHashMap.values.toSeq
 val QuoteRDDSet=sc.parallelize(QuoteRDDData.map(x => x.toString.replace("(","").replace(")","")))
 QuoteRDDSet.saveAsTextFile(Quotepath)
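
For comparison, here is a minimal sketch that runs both variants side by side
on a tiny local RDD. As far as I understand, collectAsMap also collects the
whole RDD to the driver and then builds a map from it, so the two calls should
cost about the same. The sample data, the object name, the local master
setting and the temporary output path are assumptions made up for
illustration; they are not part of my real job. The last part shows a possible
alternative that never collects to the driver, assuming the map is only needed
to keep one value per key; which duplicate survives may differ from the map
versions.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object CollectAsMapSketch {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("collectAsMap-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for QuoteRDD: an RDD of (key, value) pairs.
      val quoteRDD = sc.parallelize(Seq(("a", (1, "x")), ("b", (2, "y"))))

      // Both calls bring every element of the RDD to the driver.
      val viaToMap = quoteRDD.collect().toMap
      val viaCollectAsMap = quoteRDD.collectAsMap()
      assert(viaToMap == viaCollectAsMap)

      // Possible alternative: stay distributed and skip the driver-side map.
      // reduceByKey keeps a single value per key, similar to what a map does.
      // (On Spark versions before 1.3 this needs import org.apache.spark.SparkContext._)
      val cleaned = quoteRDD
        .reduceByKey((_, latest) => latest)
        .values
        .map(_.toString.replace("(", "").replace(")", ""))

      // Hypothetical output location; saveAsTextFile needs a path that does not exist yet.
      val outputPath = Files.createTempDirectory("quotes-sketch").resolve("quotes").toString
      cleaned.saveAsTextFile(outputPath)
    } finally {
      sc.stop()
    }
  }
}
```

If the map itself is needed on the driver later, the first part still applies
and the data has to fit in driver memory either way.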



Thanks
Sri 


