Hi all,

We are using Hive for ad-hoc querying and have a Hive table that is
partitioned on two fields (date, id). For each date there are around 1400
ids, so roughly that many partitions are added every day. The actual
data resides in S3. The issue we are facing is this: if we run a
select count(*) over one month of the table, it takes a very long
time (approx. 1 hr 52 min) just to launch the MapReduce job.
When I ran the query in Hive verbose mode, I could see that it spends this
time deciding how many mappers to spawn (calculating splits). Is there
any way to reduce this lag before the MapReduce job launches?
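
For a sense of scale, here is a rough back-of-the-envelope sketch in
Python. The 31-day month (the log path below points at October 2013) and
the idea that the delay is spread evenly across partitions are
assumptions for illustration only, not measurements.

```python
# Rough estimate of how many partitions a one-month query touches.
IDS_PER_DAY = 1400                          # from the post: ~1400 ids per date
DAYS_IN_MONTH = 31                          # assumption: a 31-day month
LAUNCH_DELAY_SECONDS = 1 * 3600 + 52 * 60   # observed: approx. 1 hr 52 min


def partitions_scanned(ids_per_day: int, days: int) -> int:
    """One partition per (date, id) pair."""
    return ids_per_day * days


total_partitions = partitions_scanned(IDS_PER_DAY, DAYS_IN_MONTH)

# Assumption: the launch delay is spread evenly over all partitions.
seconds_per_partition = LAUNCH_DELAY_SECONDS / total_partitions

print(f"partitions touched: {total_partitions}")
print(f"approx. seconds per partition: {seconds_per_partition:.3f}")
```

So a one-month query has to look at tens of thousands of partition
paths before any mapper starts, which seems consistent with the time
going into split calculation.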

These are some of the log messages written during this lag time:

13/11/19 07:11:06 INFO mapred.FileInputFormat: Total input paths to process : 1
13/11/19 07:11:06 WARN httpclient.RestS3Service: Response '/Analyze%2F2013%2F10%2F03%2F465' - Unexpected response code 404, expected 200

Does anyone have a quick fix for this?

-- 
Sreenath S Kamath
Bangalore
Ph No:+91-9590989106
