We've noticed that our Hive jobs appear to be getting slower and slower
every day even though the data set isn't really growing by much.
Here are the run times from the last day of each month, showing the date
and the duration of the job in minutes:
2010/12/31 -> 19.22
2011/01/31 -> 24.55
2011/02/28 -> 44.62
2011/03/31 -> 49.98
2011/04/30 -> 55.38
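To put a number on the slowdown, here is a small Python sketch that computes the month-over-month change from the run times listed above (entered at full precision). It only does arithmetic on these five data points. The names `runs` and `month_over_month` are made up for this example and are not part of any Hive tooling.

```python
from datetime import date

# Run times from the list above: (run date, duration in minutes).
runs = [
    (date(2010, 12, 31), 19.2166666666667),
    (date(2011, 1, 31), 24.55),
    (date(2011, 2, 28), 44.6166666666667),
    (date(2011, 3, 31), 49.9833333333333),
    (date(2011, 4, 30), 55.3833333333333),
]


def month_over_month(runs):
    """Return (date, minutes added, percent change) relative to the previous run."""
    changes = []
    for (_, prev_minutes), (day, minutes) in zip(runs, runs[1:]):
        delta = minutes - prev_minutes
        changes.append((day, delta, 100.0 * delta / prev_minutes))
    return changes


for day, delta, pct in month_over_month(runs):
    print(f"{day:%Y/%m/%d}: +{delta:.1f} min ({pct:+.0f}%)")

# Ratio of the latest run time to the earliest one.
print(f"overall: {runs[-1][1] / runs[0][1]:.2f}x slower")
```

Apart from the jump into February, each month adds a similar five or so minutes, and the job now takes nearly three times as long as it did in December.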
The only thing that stands out is that we're not deleting older
partitions, so there are probably about two years' worth of partitions in
the system.
The jobs only use the partition for the current month, but I'm not sure
whether having the other partitions around can slow things down even
though they aren't being used.
Any advice and suggestions are welcome.
thanks,
M