Hi,

I am facing an issue while saving a dataframe back to HDFS. The write is
taking a long time to run.

val results_dataframe = sqlContext.sql(
  """select gt.*, ct.*
    |from PredictTempTable pt, ClusterTempTable ct, GamificationTempTable gt
    |where gt.vin = pt.vin and pt.cluster = ct.cluster""".stripMargin)
results_dataframe.coalesce(numPartitions)
results_dataframe.persist(StorageLevel.MEMORY_AND_DISK)

dataFrame.write.mode(saveMode).format(format)
  .option(Codec, compressCodec) // "org.apache.hadoop.io.compress.SnappyCodec"
  .save(outputPath)
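
One thing that may be relevant here: coalesce returns a new DataFrame and
does not change results_dataframe in place, so the coalesce call above
might have no effect on what is written. Below is a minimal sketch of
capturing the result before writing. The helper name saveCoalesced and its
parameters are hypothetical (not part of the job above), and the codec
option is left out for brevity.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper for illustration only; not taken from the original job.
def saveCoalesced(df: DataFrame,
                  numPartitions: Int,
                  format: String,
                  saveMode: SaveMode,
                  outputPath: String): Unit = {
  // coalesce returns a new DataFrame; df itself keeps its partitioning.
  val reduced: DataFrame = df.coalesce(numPartitions)

  // Write the coalesced DataFrame, not the original one.
  reduced.write
    .mode(saveMode)
    .format(format)
    .save(outputPath)
}
```

It could be called as saveCoalesced(results_dataframe, numPartitions,
format, saveMode, outputPath), assuming those values are already defined
as in the snippet above.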

The write takes a long time, and the total number of records in this
dataframe is 4903764.

I even increased the number of partitions from 10 to 20, but still no luck.
Can anyone help me resolve this performance issue?

Thanks,

Asmath
