Disabling hive.limit.optimize.enable make situation even worse - TEZ job scans 
all files in partition which is unnecessary.   I've run debugger and 
discovered that partition size in my table obtained from metastore  exceed   
hive.fetch.task.conversion.threshold.  It seems that since hive-1.0.1 
optimization process changed a lot - earlier version used estimator to get data 
size for the job, now hive use stats from metastore if they are available.  
SimpleFetchOpitimizer is unaware of GlobalLimitOptimizer and input data size 
reduction through partition prunning.   Solution for me is to tune the 
threshold and maybe fill RFE to change SimpleFetchOptimizer behaviour in case 
of GlobalLimitOptimizer qualified the optimize that reduces input size for 
limit.  Regards  Wojtek  Dnia 06 lipca 2023 18:07 Okumin <  m...@okumin.com 
> napisał(a):  Hi Wojtek,  I tried to submit the query with the given 
configurations on Hive 4.0.0-alpha-2 on Tez on YARN. In my environment, the 
query is converted to a single fetch task.  Could you please give us the 
precise revision of Hive, your table definition, the amount of data, and so on? 
Also, I'm curious what if you disable `hive.limit.optimize.enable`.  
Regards,  Okumin  On Wed, Jul 5, 2023 at 9:40 PM Wojtek Meler <  
wme...@wp.pl > wrote:  Hi, after switching to Hive 4.0 and Tez on yarn 
I've noticed that simple fetch queries run much longer.  I have following 
configuration:  hive.fetch.task.conversion=more  
hive.fetch.task.conversion.threshold=1073741824  
hive.limit.optimize.enable=true  hive.limit.optimize.fetch.max=50000  
hive.limit.optimize.limit.file=10  hive.limit.pushdown.memory.usage=0.1  
hive.limit.row.max.size=100000  and query  select * from tbl limit 100   runs 
on Tez containers instead of being run internaly inside hiveserver2.  How to 
configure fetch task conversion properly as even clicking on Hue has bad 
experience with data previews run on Tez on yarn...  Regards,  Wojtek

Reply via email to