Another dimension to consider:

Try storing the Hive table in ORC format. In my experience, it significantly
improves performance compared to other formats.

Since you mentioned join queries: as a side note, and as a longer-term
goal, you will probably want to explore Hive on Tez.
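
On the join side, the map join idea from the quoted reply below can be
illustrated outside Hive. What follows is only a toy Python sketch of the
concept (hash the small table in memory, stream the large one past it), not
Hive's actual implementation. The function name, column names and sample rows
are all made up for illustration. In Hive itself, the
hive.auto.convert.join setting is what lets the planner attempt this
conversion automatically.

```python
from collections import defaultdict


def map_join(small_rows, large_rows, key):
    """Toy map-side inner join: hash the small table, stream the large one."""
    # Build an in-memory hash table from the small table, keyed on the join column.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Stream the large table once. Each row probes the hash table, so the
    # large table never has to be shuffled or sorted.
    for big in large_rows:
        for small in lookup.get(big[key], []):
            yield {**small, **big}


# Hypothetical sample data, purely for illustration.
countries = [
    {"country_id": 1, "country": "IN"},
    {"country_id": 2, "country": "US"},
]
orders = [
    {"order_id": 10, "country_id": 2},
    {"order_id": 11, "country_id": 1},
    {"order_id": 12, "country_id": 3},  # no matching country, so dropped
]

joined = list(map_join(countries, orders, "country_id"))
```

The trade-off is that the small table has to fit in memory on every mapper,
which is why this only helps when one side of the join is genuinely small.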

--Bala G.


On Fri, May 30, 2014 at 3:59 PM, kulkarni.swar...@gmail.com <
kulkarni.swar...@gmail.com> wrote:

> > It has an innumerable number of joins. Since it's a client-specific query, you
> understand I cannot share it. Sorry about that.
>
> Like I said, joins are slow and, if not done correctly, can have terrible
> performance. Which technique helps depends on how exactly you are trying
> to perform the join. For instance, if you are joining a smaller table to a
> larger one, a map join could work well, since the smaller table is kept
> in memory while the join is performed. Also, if you are able to break your
> tables down into smaller buckets, you may be able to use a bucketed map
> join. The following links should be helpful [1][2].
>
> Hope this helps.
>
> [1]
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
> [2]
> http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables
>
>
> On Fri, May 30, 2014 at 5:38 PM, <shouvanik.hal...@accenture.com> wrote:
>
>> Please find the answers inline below.
>>
>> *From:* kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
>> *Sent:* Friday, May 30, 2014 3:34 PM
>>
>> *To:* user@hive.apache.org
>> *Subject:* Re: Need urgent help on hive query performance
>>
>>
>>
>> I feel it's pretty hard to answer this without understanding the
>> following:
>>
>>
>>
>> 1. What exactly are you trying to query? CSV? Avro? ....
>>
>> Hive table
>>
>> 2. Where is your data? HDFS? HBase? Local filesystem?
>>
>> Data is in S3
>>
>> 3. What version of Hive are you using?
>>
>> Hive 0.12
>>
>> 4. What is an example of a query that is slow? Some queries, such as
>> joins, are inherently slower than simpler ones (though they can be
>> optimized).
>>
>> It has an innumerable number of joins. Since it's a client-specific query, you
>> understand I cannot share it. Sorry about that.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> --
>> Swarnim
>>
>>
>>
>> On Fri, May 30, 2014 at 5:32 PM, <shouvanik.hal...@accenture.com> wrote:
>>
>> Can you please give a specific example or a blog post to refer to? I did
>> not understand.
>>
>>
>>
>> *From:* Ashish Garg [mailto:gargcreation1...@gmail.com]
>> *Sent:* Friday, May 30, 2014 3:31 PM
>> *To:* user@hive.apache.org
>> *Subject:* Re: Need urgent help on hive query performance
>>
>>
>>
>> Try partitioning the table and running queries that are partition-
>> specific. Hope this helps.
>>
>> Thanks and Regards,
>>
>> Ashish Garg.
>>
>>
>>
>> On Fri, May 30, 2014 at 6:05 PM, <shouvanik.hal...@accenture.com> wrote:
>>
>> Hi,
>>
>>
>>
>> Can anybody help urgently with optimizing Hive query performance? I am
>> looking at this more from a Hadoop tuning point of view. Currently, even a
>> small table takes a long time to query.
>>
>>
>>
>> We are running an EMR cluster with 1 master node, 2 core nodes and task
>> nodes.
>>
>>
>>
>> Quick help is much appreciated.
>>
>>
>>
>> Thanks,
>>
>> Shouvanik
>>
>>
>> --
>> Swarnim
>>
>
>
>
> --
> Swarnim
>
