Hi All,

Do you think replacing the collect() (used to build the Scala collection for the loop) with the code block below will have any benefit?
cachedColumnsAddTableDF.select("reporting_table").distinct()
  .foreach(r => { r.getAs[String]("reporting_table") })

On Wed, May 5, 2021 at 10:15 PM Chetan Khatri <chetan.opensou...@gmail.com> wrote:

> Hi All, Collect in Spark is taking a huge amount of time. I want to get the values
> of one column into a Scala collection. How can I do this?
>
> val newDynamicFieldTablesDF = cachedPhoenixAppMetaDataForCreateTableDF
>   .select(col("reporting_table")).except(clientSchemaDF)
> logger.info(s"####### except with client-schema done " + LocalDateTime.now())
> // newDynamicFieldTablesDF.cache()
>
> val detailsForCreateTableDF = cachedPhoenixAppMetaDataForCreateTableDF
>   .join(broadcast(newDynamicFieldTablesDF), Seq("reporting_table"), "inner")
> logger.info(s"####### join with newDF done " + LocalDateTime.now())
> // detailsForCreateTableDF.cache()
>
> val newDynamicFieldTablesList = newDynamicFieldTablesDF
>   .map(r => r.getString(0)).collect().toSet
>
> Later, I am iterating this list for one of the use cases, to create a custom
> definition table:
>
> newDynamicFieldTablesList.foreach(table => {
>   // running the CREATE TABLE DDL/SQL query here
> })
>
> Thank you so much
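One caveat worth noting about the snippet above: `foreach` on a DataFrame runs its closure on the executors, so values extracted there never reach the driver; to materialize a driver-side Scala collection you still need `collect()` (or `toLocalIterator()`). What `distinct()` does buy you is shipping far less data to the driver. A minimal sketch of that distinct-then-collect pattern, assuming a local `SparkSession` and the `reporting_table` column name from the thread (`df` is a hypothetical stand-in for `cachedColumnsAddTableDF`):

```scala
import org.apache.spark.sql.SparkSession

object DistinctColumnToSet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-column-to-set")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for cachedColumnsAddTableDF from the thread.
    val df = Seq("t1", "t2", "t1", "t3").toDF("reporting_table")

    // distinct() runs on the cluster and shrinks the result before it is
    // shipped to the driver; collect() is still what brings the values back.
    val tables: Set[String] = df.select("reporting_table")
      .distinct()
      .as[String]
      .collect()
      .toSet

    // Driver-side loop, e.g. to issue CREATE TABLE DDL per table.
    tables.foreach(table => println(s"would run CREATE TABLE DDL for $table"))

    spark.stop()
  }
}
```

If even the distinct set is too large for the driver, `toLocalIterator()` streams one partition at a time instead of materializing everything at once.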