Re: Performance Improvement: Collect in spark taking huge time

2021-05-05 Thread Chetan Khatri
Hi All, do you think replacing collect() (used to get a Scala collection for a for loop) with the code block below would have any benefit? cachedColumnsAddTableDF.select("reporting_table").distinct().foreach(r => { r.getAs("reporting_table").asInstanceOf[String] }) On Wed, May 5, 2021 at 10:15 PM Cheta

Performance Improvement: Collect in spark taking huge time

2021-05-05 Thread Chetan Khatri
Hi All, collect() in Spark is taking a huge amount of time. I want to get the list of values of one column into a Scala collection. How can I do this? val newDynamicFieldTablesDF = cachedPhoenixAppMetaDataForCreateTableDF .select(col("reporting_table")).except(clientSchemaDF) logger.info(s"###