Hi Spark users,

I've run into an issue: I wrote a filter on a Hive table using DataFrames, and despite setting spark.sql.hive.metastorePartitionPruning=true, no partitions are being pruned.
In short, doing this:

    table.filter("partition=x or partition=y")

results in Spark fetching all partition metadata from the Hive metastore and doing the filtering after the fetch. On the other hand, if my filter is "simple":

    table.filter("partition=x")

Spark makes a metastore call that passes the filter along and fetches only the partitions it needs.

Our problem is that we have a large number of partitions on the table, so the calls that fetch all of them take minutes and also cause us memory issues. Is this a bug, or is there a better way to write the filter?

Thanks,
Patrick

PS: Sorry for cross-posting; I wasn't sure whether the user list was the right place to ask, and I understood I should go via Stack Overflow first, so my question is also there in more detail: https://stackoverflow.com/questions/46152526/how-should-i-configure-spark-to-correctly-prune-hive-metastore-partitions
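To make the observed behavior concrete, here is a minimal, hypothetical sketch in plain Python of the decision being described. This is NOT Spark's actual code; it only models the idea that a filter is pushed down to the metastore when it can be converted into a metastore filter string, and that a predicate the converter can't handle (here, one containing OR, matching the behavior reported above) falls back to fetching every partition and filtering client-side. The function name and the "simple conjunction of col = value terms" rule are assumptions for illustration.

```python
import re

# One "simple" term: an identifier, '=', and a (possibly quoted) value.
SIMPLE_TERM = re.compile(r"^\w+\s*=\s*'?\w+'?$")

def to_metastore_filter(predicate: str):
    """Return a metastore filter string for a conjunction of simple
    `col = value` terms, or None when the predicate can't be converted.
    The None case models the fallback in which all partition metadata
    is fetched and filtered on the Spark side."""
    terms = [t.strip() for t in predicate.split(" and ")]
    if all(SIMPLE_TERM.match(t) for t in terms):
        return " and ".join(terms)
    return None  # unconvertible -> fetch all partitions

print(to_metastore_filter("partition=x"))                  # pushed down
print(to_metastore_filter("partition=x or partition=y"))   # None: full fetch
```

If this model matches what Spark is doing, one possible workaround (untested, and assuming simple equality filters are pushed down individually) would be to issue one equality filter per value and union the resulting DataFrames, instead of a single OR predicate.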