Hi team,

I am trying to learn about Hive's cost-based optimizer (CBO) because I need to
do some performance tuning for my ETL job.


I found the Confluence doc below, but I am not sure whether it is the latest
version. Can anyone confirm?
https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive


My other question: we developed a UDTF to help us parse logs, which we call like this:
select my_udtf(log) as (id, name, time)
from tb_log;
Is there a better approach for this scenario?


BTW, we are using a Hive version above 3.0, and our data volume grows at petabyte scale every day.




Thanks in advance,

Samuel





 





 
