Thanks a lot for your quick help! I have two more questions:
a) My colleagues told me that if my Scala code uses RDDs, I should replace them with Datasets / DataFrames. Why is that?
b) One of the operations, saveAsTextFile, is taking a long time. What is the probable cause, and what are the alternatives to it?

On Fri, Jun 9, 2017 at 9:19 PM, Gerard Maas <gerard.m...@gmail.com> wrote:

> also, read the newest book of Holden on High-Performance Spark:
>
> http://shop.oreilly.com/product/0636920046967.do
>
> On Fri, Jun 9, 2017 at 5:38 PM, Alonso Isidoro Roman <alons...@gmail.com>
> wrote:
>
>> a quick search on google:
>>
>> https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_spark_tuning.html
>>
>> https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
>>
>> http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
>>
>> and of course, Jacek's
>> <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tuning.html>
>>
>> Alonso Isidoro Roman
>> about.me/alonso.isidoro.roman
>>
>> 2017-06-09 14:50 GMT+02:00 Debabrata Ghosh <mailford...@gmail.com>:
>>
>>> Hi,
>>> I need some help / guidance in performance tuning
>>> Spark code written in Scala. Can you please help.
>>>
>>> Thanks