Thanks All.
I will implement the suggested points and share the output.
Thanks again for all the help.
Thanks and Regards
Nishant Aggarwal, PMP
Cell No:- +91 99588 94305
On Fri, Aug 21, 2015 at 10:33 AM, Jörn Franke wrote:
Additionally, although it is a PoC, you should have a realistic data model.
Furthermore, good data modeling practices should be taken into account, and
joining on a double is not one of them. The join key should be an int.
Double is also a type that is rarely used in most business scenarios.
Please check out HIVE-11502. For your PoC, you can simply work around it by
using other data types instead of double.
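To illustrate why the advice above matters (a sketch not from the thread, in plain Python rather than HiveQL): exact equality on doubles is unreliable, so rows that "should" match in an equality join can silently fail to pair up, while an integer key compares exactly.

```python
# Illustration: exact equality on doubles is fragile, which is why
# joining tables on a double column can drop or mismatch rows.
a = 0.1 + 0.2   # actually 0.30000000000000004
b = 0.3
print(a == b)   # False: the two doubles differ in the last bits

# Workaround in the spirit of the advice above: derive an exact
# integer key (fixed-point scaling) and compare/join on that instead.
key_a = round(a * 100)
key_b = round(b * 100)
print(key_a == key_b)   # True: both keys are the integer 30
```

The same idea applies in Hive: cast or scale the double column to an integer or decimal type and join on that derived key.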
On Thu, Aug 20, 2015 at 2:08 AM, Nishant Aggarwal
wrote:
Thanks for the reply, Noam. I have already tried the latter point of
dividing the query, but the challenge comes during the joining of the tables.
Thanks and Regards
Nishant Aggarwal, PMP
Cell No:- +91 99588 94305
On Thu, Aug 20, 2015 at 2:19 PM, Noam Hasson
wrote:
Hi,
Have you looked at the counters on the Hadoop side? It is possible you are
dealing with a bad join that causes multiplication of records: if you see a
huge number of input/output records in the map/reduce phases that keeps
increasing, that is probably the case.
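The record multiplication described above can be sketched in miniature (an illustration not from the thread, with made-up row counts): when the join key is not unique on either side, every matching pair of rows is emitted, so output grows multiplicatively.

```python
# Sketch: how a bad (non-unique) join key multiplies records --
# the record-count explosion the Hadoop counters would reveal.
left = [("k", i) for i in range(1000)]    # 1,000 rows sharing key "k"
right = [("k", j) for j in range(1000)]   # 1,000 rows sharing key "k"

# A naive equality join emits one output row per matching pair.
joined = [(lv, rv) for (lk, lv) in left for (rk, rv) in right if lk == rk]
print(len(joined))   # 1000000: 1,000 x 1,000 pairs from 2,000 input rows
```

If the map/reduce record counters keep climbing far past the input sizes, this kind of key skew or duplication in the join is a likely culprit.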
Another thing I would try is to divide the job into several smaller queries.
Dear Hive Users,
I am in the process of running a PoC for one of my customers, demonstrating
the huge performance benefits of Hadoop Big Data using Hive.
Following is the problem statement I am stuck with.
I have generated a large table with 28 columns (all doubles). The table size
on disk is 70GB (i