We've noticed that our Hive jobs appear to be getting steadily slower even though the data set isn't really growing by much. Here are some run times from recent months, showing the date and the duration of the job in minutes:

2010/12/31 -> 19.2166666666667
2011/01/31 -> 24.55
2011/02/28 -> 44.6166666666667
2011/03/31 -> 49.9833333333333
2011/04/30 -> 55.3833333333333
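
To put numbers on the trend, here is a rough Python sketch that computes the change between consecutive runs and the slowdown relative to the first run. The figures are copied from the list above; the names `run_times` and `slowdown_report` are just made up for this illustration. By this calculation the latest run takes roughly 2.9 times as long as the December one, and the biggest single jump (about 20 minutes) is between the January and February runs.

```python
# Run times copied from the list above: (date, duration in minutes).
run_times = [
    ("2010/12/31", 19.2166666666667),
    ("2011/01/31", 24.55),
    ("2011/02/28", 44.6166666666667),
    ("2011/03/31", 49.9833333333333),
    ("2011/04/30", 55.3833333333333),
]


def slowdown_report(samples):
    """Return (date, minutes, change vs. previous run, ratio vs. first run)."""
    first = samples[0][1]
    report = []
    previous = None
    for date, minutes in samples:
        # The first run has no predecessor, so its change is reported as 0.
        delta = 0.0 if previous is None else minutes - previous
        report.append((date, minutes, delta, minutes / first))
        previous = minutes
    return report


for date, minutes, delta, ratio in slowdown_report(run_times):
    print(f"{date}  {minutes:6.2f} min  {delta:+6.2f} min  {ratio:.2f}x")
```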

The only thing that stands out is that we're not deleting older partitions, so there are probably about two years' worth of partitions in the system. The jobs only use the partition for the current month, but I'm not sure whether the other partitions can somehow slow things down even though they aren't being used.

Any advice or suggestions are welcome.

thanks,
M
