Disabling hive.limit.optimize.enable makes the situation even worse: the Tez job
scans all files in the partition, which is unnecessary. I ran a debugger and
discovered that the partition size of my table, as obtained from the metastore,
exceeds hive.fetch.task.conversion.threshold. It seems that the optimization
process has changed a lot since hive-1.0.1: earlier versions used an estimator
to get the data size for the job, whereas now Hive uses stats from the metastore
if they are available. SimpleFetchOptimizer is unaware of GlobalLimitOptimizer
and of the input data size reduction through partition pruning. The solution for
me is to tune the threshold and maybe file an RFE to change SimpleFetchOptimizer's
behaviour for the case where GlobalLimitOptimizer has qualified the query for the
optimization that reduces the input size for limit.

Regards
Wojtek

On 6 July 2023 at 18:07, Okumin <m...@okumin.com> wrote:

Hi Wojtek, I tried to submit the query with the given
configurations on Hive 4.0.0-alpha-2 on Tez on YARN. In my environment, the
query is converted to a single fetch task. Could you please give us the
precise revision of Hive, your table definition, the amount of data, and so on?
I'm also curious what happens if you disable `hive.limit.optimize.enable`.

Regards,
Okumin

On Wed, Jul 5, 2023 at 9:40 PM Wojtek Meler <wme...@wp.pl> wrote:

Hi, after switching to Hive 4.0 and Tez on YARN
I've noticed that simple fetch queries run much longer. I have the following
configuration:

hive.fetch.task.conversion=more
hive.fetch.task.conversion.threshold=1073741824
hive.limit.optimize.enable=true
hive.limit.optimize.fetch.max=50000
hive.limit.optimize.limit.file=10
hive.limit.pushdown.memory.usage=0.1
hive.limit.row.max.size=100000

and the query select * from tbl limit 100 runs on Tez containers instead of
being run internally inside HiveServer2. How do I configure fetch task
conversion properly? Even clicking around in Hue is a bad experience when data
previews run on Tez on YARN...

Regards,
Wojtek
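
For reference, the threshold tuning mentioned in the reply at the top of this
thread could be tried per session roughly as follows. This is only a sketch
under assumptions: the partition spec dt='2023-07-05' and the 10 GiB value are
placeholders that do not come from the thread, and tbl is the table name from
the original query.

```sql
-- Inspect the statistics stored in the metastore for one partition
-- (look at the size-related table parameters in the output).
-- The partition spec is hypothetical; substitute a real one.
DESCRIBE FORMATTED tbl PARTITION (dt='2023-07-05');

-- Keep fetch task conversion enabled, as in the original configuration.
SET hive.fetch.task.conversion=more;

-- Raise the threshold above the partition size reported above.
-- 10737418240 bytes (10 GiB) is an arbitrary illustrative value.
SET hive.fetch.task.conversion.threshold=10737418240;

-- Check the plan before running the query itself.
EXPLAIN SELECT * FROM tbl LIMIT 100;
```

If the conversion applies, the EXPLAIN output should contain only a fetch stage
and no Tez stage. Whether a higher threshold is acceptable depends on how much
data HiveServer2 can reasonably read on its own.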