Hi Wojtek,
Thanks for explaining the details. I understand that your data is larger
than `hive.fetch.task.conversion.threshold`.
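
To make that comparison concrete, here is a minimal sketch of the size check as I understand it from the configuration documentation. It is a simplified model, not Hive's actual code: the function name `fits_fetch_threshold` is hypothetical, and treating an input exactly equal to the threshold as eligible is an assumption.

```python
def fits_fetch_threshold(input_bytes: int, threshold_bytes: int) -> bool:
    """Simplified model of the hive.fetch.task.conversion.threshold check.

    Hypothetical helper for illustration only; the real decision is made
    inside Hive's optimizer and involves more conditions than input size.
    """
    # Per the Hive docs, a negative threshold disables the size check.
    if threshold_bytes < 0:
        return True
    # Assumption: an input exactly at the threshold still qualifies.
    return input_bytes <= threshold_bytes


ONE_GIB = 1073741824  # the threshold value quoted in this thread

# A 512 MiB partition fits under the threshold; a 5 GiB partition does not.
print(fits_fetch_threshold(512 * 1024**2, ONE_GIB))
print(fits_fetch_threshold(5 * 1024**3, ONE_GIB))
# With a negative threshold the size is ignored entirely.
print(fits_fetch_threshold(5 * 1024**3, -1))
```
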
At a glance, SimpleFetchOptimizer is likely to respect LIMIT if
`hive.fetch.task.caching` is disabled and all predicates are for partition
pruning. The case
Disabling hive.limit.optimize.enable makes the situation even worse: the Tez
job scans all files in the partition, which is unnecessary. I ran a debugger
and discovered that the partition size of my table, as obtained from the
metastore, exceeds
hive.fetch.task.conversion.threshold. It seems that since hive-1.0.1
Hi Wojtek,
I tried to submit the query with the given configurations on
Hive 4.0.0-alpha-2 on Tez on YARN. In my environment, the query is
converted to a single fetch task.
Could you please give us the precise revision of Hive, your table
definition, the amount of data, and so on? Also, I'm curiou
Hi, after switching to Hive 4.0 and Tez on YARN, I've noticed that simple
fetch queries run much longer. I have the following configuration:
hive.fetch.task.conversion=more
hive.fetch.task.conversion.threshold=1073741824
hive.limit.optimize.enable=true
hive.limit.optimize.fetch.max=5
hiv