Another dimension: try storing the Hive table in ORC format. In my experience, it significantly improves performance compared to other formats.
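For instance, an existing table can be copied into an ORC-backed one with a CREATE TABLE ... AS SELECT. Below is a minimal Python sketch that only assembles the HiveQL string and does not connect to Hive. The helper name orc_copy_statement and the table names sales_text and sales_orc are hypothetical, and the ZLIB compression default is my assumption, not something taken from this thread.

```python
def orc_copy_statement(source_table: str, target_table: str, compression: str = "ZLIB") -> str:
    """Build a HiveQL CTAS statement that copies source_table into a new ORC table.

    Table names are placeholders; 'orc.compress' is the ORC table property
    that selects the compression codec (for example NONE, ZLIB or SNAPPY).
    """
    return (
        f"CREATE TABLE {target_table} "
        f"STORED AS ORC "
        f"TBLPROPERTIES ('orc.compress'='{compression}') "
        f"AS SELECT * FROM {source_table}"
    )


if __name__ == "__main__":
    # Print the statement so it can be pasted into the Hive CLI.
    print(orc_copy_statement("sales_text", "sales_orc") + ";")
```

The queries would then need to point at the new table; whether this helps in your case depends on the data and the query shapes, so it is worth measuring on a copy first.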
Since you mentioned join queries: on a side note, as a long-term goal, you probably want to explore Hive with Tez.

--Bala G.

On Fri, May 30, 2014 at 3:59 PM, kulkarni.swar...@gmail.com <kulkarni.swar...@gmail.com> wrote:

>> It has innumerable no of joins. Since its client specific query, u
>> understand I cannot share. Sorry about that
>
> Like I said, joins are slow and, if not done correctly, could have terrible
> performance. Which technique is handy depends on how exactly you are
> trying to perform the join. For instance, if you are trying to join a
> smaller table to a larger one, a map join could work well for you: the
> smaller table is kept in memory while the join is performed. Also, if you
> are able to break your tables down into smaller buckets, you may be able
> to use a bucketed map join. The following links should be helpful [1][2].
>
> Hope this helps.
>
> [1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
> [2] http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables
>
> On Fri, May 30, 2014 at 5:38 PM, <shouvanik.hal...@accenture.com> wrote:
>
>> Please find the answers below.
>>
>> *From:* kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
>> *Sent:* Friday, May 30, 2014 3:34 PM
>> *To:* user@hive.apache.org
>> *Subject:* Re: Need urgent help on hive query performance
>>
>> I feel it's pretty hard to answer this without understanding the
>> following:
>>
>> 1. What exactly are you trying to query? CSV? Avro? ....
>>
>> HIVE table
>>
>> 2. Where is your data? HDFS? HBase? Local filesystem?
>>
>> Data is in S3
>>
>> 3. What version of Hive are you using?
>>
>> Hive 0.12
>>
>> 4. What is an example of a query that is slow? Some queries, like
>> joins, would be inherently slower than simpler ones (though they
>> can be optimized).
>>
>> It has an innumerable number of joins. Since it is a client-specific
>> query, you understand I cannot share it. Sorry about that.
>>
>> Thanks,
>>
>> --
>> Swarnim
>>
>> On Fri, May 30, 2014 at 5:32 PM, <shouvanik.hal...@accenture.com> wrote:
>>
>> Can you please give a specific example or a blog to refer to? I did not
>> understand.
>>
>> *From:* Ashish Garg [mailto:gargcreation1...@gmail.com]
>> *Sent:* Friday, May 30, 2014 3:31 PM
>> *To:* user@hive.apache.org
>> *Subject:* Re: Need urgent help on hive query performance
>>
>> Try partitioning the table and running queries that are partition
>> specific. Hope this helps.
>>
>> Thanks and Regards,
>>
>> Ashish Garg.
>>
>> On Fri, May 30, 2014 at 6:05 PM, <shouvanik.hal...@accenture.com> wrote:
>>
>> Hi,
>>
>> Can anybody help urgently with optimizing Hive query performance? I am
>> looking at this more from a Hadoop tuning point of view. Currently, even
>> a small table takes a long time to query.
>>
>> We are running an EMR cluster with 1 master node, 2 core nodes and task
>> nodes.
>>
>> Quick help is much appreciated.
>>
>> Thanks,
>>
>> Shouvanik
>>
>> --
>> Swarnim
>
> --
> Swarnim