Actually, on second thought, even listing directories (which is necessary to launch the job) could take a long time. If there are any client logs, you can take a look at them to see where the time is spent. If you are running under the Hive CLI, the logs would be in /tmp/$USER/hive.log by default.
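One rough way to see where the time goes is to look for the largest gaps between consecutive timestamped lines in that log. The sketch below is only an illustration: it assumes each log line starts with a timestamp of the form "2015-09-18 02:24:01,123" (check your log4j layout, it may differ), and the names `parse_ts` and `largest_gaps` are made up for this example.

```python
import re
import sys
from datetime import datetime

# Assumption: log lines begin with "YYYY-MM-DD HH:MM:SS,mmm".
# Adjust the pattern if your log4j layout is different.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def largest_gaps(lines, top=5):
    """Return up to `top` (gap_seconds, line_before, line_after) tuples,
    largest gap first. Lines without a timestamp (stack traces etc.) are skipped."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            gaps.append((gap, prev_line.rstrip(), line.rstrip()))
        prev_ts, prev_line = ts, line
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python hive_log_gaps.py /tmp/$USER/hive.log
    with open(sys.argv[1], errors="replace") as f:
        for gap, before, after in largest_gaps(f):
            print(f"{gap:10.1f}s between:\n  {before}\n  {after}")
```

The lines on either side of the biggest gaps usually hint at what the client was waiting on (for example file system listing versus metastore calls), though a long gap only tells you where the log went quiet, not necessarily why.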
On 15/9/18, 11:46, "Sergey Shelukhin" <ser...@hortonworks.com> wrote:

>Which version of Hive, and which file format, are you using?
>It could be either reading file footers for ORC - in recent versions
>there’s a way to disable that (set hive.exec.orc.split.strategy=BI); or
>some similar feature for other formats that I’m not immediately familiar
>with.
>It could also be slow metastore calls.
>
>From: Sreenath <sreenaths1...@gmail.com>
>Reply-To: "user@hive.apache.org" <user@hive.apache.org>
>Date: Friday, September 18, 2015 at 02:24
>To: "d...@hive.apache.org" <d...@hive.apache.org>,
>"user@hive.apache.org" <user@hive.apache.org>
>Subject: Hive Start Up Time Manifolds Greater than Execution Time
>
>Hi All,
>
>Something interesting came to my notice yesterday when I was using Hive
>for some queries. The time taken by Hive to launch a MapReduce job was
>many times higher than the time taken by Hadoop to actually execute it.
>These are the details of the table on which the query is being fired.
>
>CREATE EXTERNAL TABLE A
>(
>  user_id string,
>  stage string,
>  url string
>)
>PARTITIONED BY (dt string, id string)
>
>All the data for the table is stored in S3, and each day there will be around
>2000 unique ids, i.e. 2000 partitions being added daily. We can assume
>that each partition has on average 100MB of gzip-compressed data.
>Now when I run a query like "SELECT DISTINCT user_id FROM A WHERE
>dt >= '20150101' and dt <= '20150401'", i.e. over a period of 3 months, approx
>60000 partitions, it takes Hive approximately 2 hrs to launch the
>MapReduce job, and the launched job finishes in just 20 min. So I was wondering
>if someone can help me understand what Hive is doing in these 2 hrs?
>Would really appreciate some help here. Thanks in advance!
>
>
>Best,
>Sreenath