Joining this many external tables is a problem for any engine. Your problem is 
not Hive-specific; your data model seems to be the real issue. First of all, 
you should store the data in an appropriate format, such as ORC or Parquet, 
and the tables should not be external. Then you should use the right data types 
for columns, e.g. an int instead of a varchar if a column contains only 
numbers. After that, check whether you can prejoin the data, store it in one 
big flat table, and run your queries on that.
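
As a rough sketch of those steps in HiveQL: all table and column names below 
(claims_ext, claims_orc, providers_orc, claims_flat and their columns) are made 
up for illustration, since I do not know your schema. It assumes the small 
lookup table providers_orc has already been converted the same way.

```sql
-- Hypothetical names throughout; adapt to your own schema.

-- 1. Managed (non-external) ORC table with proper column types.
CREATE TABLE claims_orc (
  claim_id    BIGINT,
  member_id   BIGINT,
  provider_id INT,
  amount      DECIMAL(12,2),
  claim_date  DATE
)
STORED AS ORC;

-- 2. Load it from the existing external table, casting the text columns.
INSERT OVERWRITE TABLE claims_orc
SELECT CAST(claim_id    AS BIGINT),
       CAST(member_id   AS BIGINT),
       CAST(provider_id AS INT),
       CAST(amount      AS DECIMAL(12,2)),
       CAST(claim_date  AS DATE)
FROM claims_ext;

-- 3. Prejoin once into a flat table and query that instead.
CREATE TABLE claims_flat STORED AS ORC AS
SELECT c.claim_id,
       c.member_id,
       c.amount,
       c.claim_date,
       p.provider_name
FROM claims_orc c
LEFT OUTER JOIN providers_orc p
  ON c.provider_id = p.provider_id;
```

Whether a single flat table is practical depends on how often the source 
tables change, so treat step 3 as something to evaluate rather than a given.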

Then you should look at min/max indexes, bloom filters, statistics, 
partitions, etc. 
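
For example, something along these lines, again with invented names 
(claims_part, claim_month, member_id) and assuming a Hive version recent enough 
to support ORC bloom filters. Which columns to partition on and which to put 
bloom filters on depends entirely on your join and filter columns.

```sql
-- Hypothetical table; partition on a column you commonly filter by.
CREATE TABLE claims_part (
  claim_id  BIGINT,
  member_id BIGINT,
  amount    DECIMAL(12,2)
)
PARTITIONED BY (claim_month STRING)
STORED AS ORC
TBLPROPERTIES (
  'orc.create.index'         = 'true',      -- min/max index per stripe and row group
  'orc.bloom.filter.columns' = 'member_id'  -- bloom filter on a lookup/join column
);

-- Let Hive push filters down so the ORC indexes are actually used.
SET hive.optimize.index.filter=true;

-- Gather table and column statistics for the optimizer.
ANALYZE TABLE claims_part PARTITION (claim_month) COMPUTE STATISTICS;
ANALYZE TABLE claims_part PARTITION (claim_month) COMPUTE STATISTICS FOR COLUMNS;
```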

Maybe you can post more details about your data model and queries. 

> On 24 Mar 2016, at 02:49, Sanka, Himabindu <himabindu_sa...@optum.com> wrote:
> 
> Hi Team,
>  
> I need some inputs from you. I have a requirement for my project where I have 
> to join 21 hive external tables.
>  
> Out of which 6 tables are HUGE  having 500 million records of data. Other 15 
> tables are smaller ones around 100 to 1000 records each.
>  
> When I am doing inner joins/ left outer joins its taking hours to run the 
> query.
>  
> Please let me know some optimization techniques or any other eco system 
> components that performs better than HIVE.
>  
>  
> Regards,
> Hima
