Re: HIVE:1.2, Query taking huge time

2015-08-21 Thread Nishant Aggarwal
Thanks All. I will implement the suggested points and share the output. Thanks again for all the help. Thanks and Regards Nishant Aggarwal, PMP Cell No:- +91 99588 94305 On Fri, Aug 21, 2015 at 10:33 AM, Jörn Franke wrote: > Additionally, although it is a PoC you should have a realistic data m

Re: HIVE:1.2, Query taking huge time

2015-08-20 Thread Jörn Franke
Additionally, although it is a PoC, you should have a realistic data model. Furthermore, good data modeling practices should be followed. Joining on a double is not one of them; the join key should be an int. Furthermore, double is a type that is rarely used in most scenarios. In the business
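
One commonly cited reason for avoiding double join keys (not necessarily the one meant in this message) is that floating-point equality is fragile. The following is a minimal Python sketch of that effect, not Hive code; the key values and the scale-to-tenths surrogate key are made up for illustration.

```python
# Illustration only: why equality joins on floating-point keys are fragile.
# All key values here are hypothetical.

left = {0.1 + 0.2: "row A"}   # key produced by arithmetic
right = {0.3: "row B"}        # key written as a literal

# Equality join on the double keys: the two keys differ in the last bits,
# so nothing matches.
float_matches = [k for k in left if k in right]

# Same data keyed by an integer surrogate (assumption: value scaled to tenths).
left_int = {round(k * 10): v for k, v in left.items()}
right_int = {round(k * 10): v for k, v in right.items()}
int_matches = [k for k in left_int if k in right_int]

print(float_matches)  # []
print(int_matches)    # [3]
```

Integer keys compare exactly, which is why they are the usual choice for join columns.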

Re: HIVE:1.2, Query taking huge time

2015-08-20 Thread Xuefu Zhang
Please check out HIVE-11502. For your PoC, you can simply work around it by using other data types instead of double. On Thu, Aug 20, 2015 at 2:08 AM, Nishant Aggarwal wrote: > Thanks for the reply Noam. I have already tried the latter point of > dividing the query. But the challenge comes during the jo

Re: HIVE:1.2, Query taking huge time

2015-08-20 Thread Nishant Aggarwal
Thanks for the reply, Noam. I have already tried the latter point of dividing the query. But the challenge comes during the joining of the table. Thanks and Regards Nishant Aggarwal, PMP Cell No:- +91 99588 94305 On Thu, Aug 20, 2015 at 2:19 PM, Noam Hasson wrote: > Hi, > > Have you looked at cou

Re: HIVE:1.2, Query taking huge time

2015-08-20 Thread Noam Hasson
Hi, Have you looked at the counters on the Hadoop side? It's possible you are dealing with a bad join that multiplies the rows; if you see a huge number of input/output records in the map/reduce phase and it keeps increasing, that's probably the case. Another thing I would try is to divide the job into se
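
To make the "multiplication of items" point concrete, here is a small Python sketch (not Hive code) that counts how many rows an inner equi-join would emit for given key columns. The function name `join_output_rows` and the sample keys are hypothetical and only illustrate the fan-out effect.

```python
from collections import Counter

def join_output_rows(left_keys, right_keys):
    """Count the rows an inner equi-join emits for the given key columns."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each key contributes (occurrences on the left) x (occurrences on the right).
    return sum(n * right_counts[k] for k, n in left_counts.items())

# Unique keys on both sides: one output row per key.
unique = join_output_rows([1, 2, 3, 4], [1, 2, 3, 4])

# The same key repeated on both sides: every left row pairs with every right row.
skewed = join_output_rows([1, 1, 1, 1], [1, 1, 1, 1])

print(unique, skewed)  # 4 16
```

With the same number of input rows, the repeated-key case emits four times as many output rows, which is the kind of growth that would show up in the record counters.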

HIVE:1.2, Query taking huge time

2015-08-20 Thread Nishant Aggarwal
Dear Hive Users, I am in the process of running a PoC for one of my customers, demonstrating the huge performance benefits of Hadoop big data using Hive. Following is the problem statement I am stuck with. I have generated a large table with 28 columns (all are double). Table size on disk is 70GB (i