Hi guys, does anyone have a detailed description of the Hive-related parameters in Spark, such as spark.sql.hive.exec.dynamic.partition? I couldn't find any reference to it in my Spark 2.3.2 configuration.
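For reference, here is roughly how I've been passing Hive settings through (a sketch assuming a Hive-enabled session; I'm not certain these exact keys are honored the same way in 2.3.2):

from pyspark.sql import SparkSession

# Sketch: passing Hive parameters when building the session
# (assumption: these keys are forwarded to Hive as-is).
spark = (SparkSession.builder
         .appName("hive-partition-test")
         .config("hive.exec.dynamic.partition", "true")
         .config("hive.exec.dynamic.partition.mode", "nonstrict")
         .enableHiveSupport()
         .getOrCreate())

# Hive session settings can also be applied at runtime via SQL:
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")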
I'm looking into a problem where Spark does not seem to understand Hive partitioning at all. My Hive table is split into 1,000 partitions; however, when I read the same table in Spark, df.rdd.getNumPartitions() reports only 105 RDD partitions. Because Spark creates one task per RDD partition on read, reading is painfully slow: each task reads many Hive folders in sequential order. My goal is to spin up more tasks to increase parallelism during read operations.

Hope this makes sense. Thank you.

Best regards,
Mike
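P.S. For completeness, a minimal sketch of what I'm seeing and the two workarounds I've tried (the table name is made up, and the maxPartitionBytes tweak and the repartition() call are just my guesses, not confirmed fixes):

# Reading the partitioned Hive table (table name is hypothetical).
df = spark.table("mydb.events_partitioned")
print(df.rdd.getNumPartitions())   # reports 105, not 1,000

# Guess 1: lower the max bytes per scan partition so Spark splits the
# input into more read tasks (the default is 128 MB in 2.3.x).
spark.conf.set("spark.sql.files.maxPartitionBytes", 32 * 1024 * 1024)
df = spark.table("mydb.events_partitioned")
print(df.rdd.getNumPartitions())

# Guess 2: repartition after the read. This adds a shuffle, so it does
# not speed up the scan itself, which is the part I want to parallelize.
df_repart = df.repartition(1000)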