Hi guys,

Does anyone have detailed descriptions of the Hive-related parameters in
Spark, like spark.sql.hive.exec.dynamic.partition? I couldn't find any
reference to them in my Spark 2.3.2 configuration.
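
For reference, here's how I'm checking whether the property is set at all
(a minimal PySpark sketch; I may have the key name wrong, which is part of
my question):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Returns the current value, or the fallback if the key isn't set anywhere
    print(spark.conf.get("spark.sql.hive.exec.dynamic.partition", "<not set>"))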

I'm looking into a problem where Spark doesn't seem to understand Hive
partitioning at all. My Hive table is partitioned into 1,000 partitions;
however, when I read the same table in Spark and look at the underlying
RDD, df.rdd.getNumPartitions() reports only 105.
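
To be concrete, this is roughly what I'm running (same session as above;
the database and table names are made up):

    # Hive table that has 1,000 partitions on the Hive side
    df = spark.table("mydb.my_partitioned_table")

    # Reports 105 input partitions instead of the 1,000 I expected
    print(df.rdd.getNumPartitions())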

Because Spark creates one task per RDD partition on read, reading is
painfully slow: a single task ends up scanning many Hive folders
sequentially. My goal is to spin up more tasks to increase parallelism
during the read. Hope this makes sense.
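
What I've tried so far, without much luck (again just a sketch; I'm not
sure these are the right knobs for a Hive table):

    # Ask for smaller input splits so the scan produces more read tasks
    spark.conf.set("spark.sql.files.maxPartitionBytes", 32 * 1024 * 1024)

    # Or force a shuffle after the read; this helps downstream stages but
    # doesn't parallelize the scan itself
    df = spark.table("mydb.my_partitioned_table").repartition(1000)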

Thank you

Best Regards,
Mike
