Re: JobCache directory cleanup

2013-01-17 Thread Ivan Tretyakov
Thanks a lot! That was it. There was the following line in our code: jobConf.setKeepTaskFilesPattern(".*"); On Fri, Jan 11, 2013 at 2:20 PM, Hemanth Yamijala yhema...@thoughtworks.com wrote: Hmm. Unfortunately, there is another config variable that may be affecting this: keep.task.files.pattern
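For context, here is a minimal sketch of the offending call and a cleaned-up setup, assuming the classic org.apache.hadoop.mapred.JobConf API; only setKeepTaskFilesPattern(".*") comes from the thread, everything else is illustrative:

    import org.apache.hadoop.mapred.JobConf;

    public class KeepPatternCheck {
        public static void main(String[] args) {
            JobConf jobConf = new JobConf();

            // The problematic call: a pattern of ".*" matches every task attempt,
            // so the TaskTracker keeps every task's working directory under
            // jobcache even after the job completes.
            // jobConf.setKeepTaskFilesPattern(".*");

            // With the pattern left unset, normal jobcache cleanup runs; files
            // for failed tasks can still be retained separately for debugging.
            jobConf.setKeepFailedTaskFiles(false);

            System.out.println("keep.task.files.pattern = " + jobConf.getKeepTaskFilesPattern());
            System.out.println("keep.failed.task.files  = " + jobConf.getKeepFailedTaskFiles());
        }
    }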

Re: JobCache directory cleanup

2013-01-11 Thread Hemanth Yamijala
Hmm. Unfortunately, there is another config variable that may be affecting this: keep.task.files.pattern. This is set to .* in the job.xml file you sent. I suspect this may be causing the problem. Can you please remove it, assuming you have not set it intentionally? Thanks, Hemanth On Fri, Jan

Re: JobCache directory cleanup

2013-01-10 Thread Ivan Tretyakov
Thanks for the replies! Hemanth, I could see the following exception in the TaskTracker log: https://issues.apache.org/jira/browse/MAPREDUCE-5 But I'm not sure if it is related to this issue. Now, when a job completes, the directories under the jobCache must get automatically cleaned up. However, it doesn't

Re: JobCache directory cleanup

2013-01-10 Thread Hemanth Yamijala
Hi, On Thu, Jan 10, 2013 at 5:17 PM, Ivan Tretyakov itretya...@griddynamics.com wrote: Thanks for the replies! Hemanth, I could see the following exception in the TaskTracker log: https://issues.apache.org/jira/browse/MAPREDUCE-5 But I'm not sure if it is related to this issue. Now, when a job

Re: JobCache directory cleanup

2013-01-10 Thread Artem Ervits
New York Presbyterian Hospital From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com] Sent: Thursday, January 10, 2013 07:37 AM To: user@hadoop.apache.org Subject: Re: JobCache directory cleanup Hi, On Thu, Jan 10, 2013 at 5:17 PM, Ivan Tretyakov itretya

Re: JobCache directory cleanup

2013-01-10 Thread Vinod Kumar Vavilapalli
Can you check the job configuration for these ~100 jobs? Do they have keep.failed.task.files set to true? If so, these files won't be deleted. If they don't, it could be a bug. Sharing your configs for these jobs will definitely help. Thanks, +Vinod On Wed, Jan 9, 2013 at 6:41 AM, Ivan
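One rough way to audit many jobs at once is to load each retained job.xml into a Hadoop Configuration and print the relevant keys. This is only a sketch; the class name and default file path below are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class JobXmlInspector {
        public static void main(String[] args) {
            // Hypothetical local copy of a job's job.xml; pass real paths as arguments.
            String jobXml = args.length > 0 ? args[0] : "job_201301090001_0001_conf.xml";

            Configuration conf = new Configuration(false); // skip the cluster defaults
            conf.addResource(new Path(jobXml));

            // Unset (null / "false") means the job should be eligible for normal cleanup.
            System.out.println("keep.failed.task.files  = " + conf.get("keep.failed.task.files", "false"));
            System.out.println("keep.task.files.pattern = " + conf.get("keep.task.files.pattern"));
        }
    }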

Re: JobCache directory cleanup

2013-01-10 Thread Hemanth Yamijala
Good point. Forgot that one :-) On Thu, Jan 10, 2013 at 10:53 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Can you check the job configuration for these ~100 jobs? Do they have keep.failed.task.files set to true? If so, these files won't be deleted. If they don't, it could

JobCache directory cleanup

2013-01-09 Thread Ivan Tretyakov
Hello! I've found that the jobcache directory has become very large on our cluster, e.g.: # du -sh /data?/mapred/local/taskTracker/user/jobcache 465G /data1/mapred/local/taskTracker/user/jobcache 464G /data2/mapred/local/taskTracker/user/jobcache 454G

Re: JobCache directory cleanup

2013-01-09 Thread Alexander Alten-Lorenz
Hi, By default (and not configurably), the logs persist for 30 days. This will be configurable in the future (https://issues.apache.org/jira/browse/MAPREDUCE-4643). - Alex On Jan 9, 2013, at 3:41 PM, Ivan Tretyakov itretya...@griddynamics.com wrote: Hello! I've found that the jobcache

Re: JobCache directory cleanup

2013-01-09 Thread Robert Molina
Hi Ivan, Regarding the mapreduce.jobtracker.retiredjobs.cache.size property: the JobTracker keeps information about a number of completed jobs in memory. There's a time threshold for this, which is a single day by default, as well as a limit on the number of jobs per user. Once these limits are hit, the
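For reference, a small sketch of reading that property from the cluster configuration; the fallback value below is an assumption, and the cache only bounds completed-job metadata kept in JobTracker memory rather than anything on the TaskTracker's local disks:

    import org.apache.hadoop.conf.Configuration;

    public class RetiredJobsCacheSize {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // The fallback value 1000 is assumed here for illustration only.
            int cacheSize = conf.getInt("mapreduce.jobtracker.retiredjobs.cache.size", 1000);
            // This setting limits completed-job metadata held in JobTracker memory;
            // it does not control the TaskTracker's local jobcache directories.
            System.out.println("mapreduce.jobtracker.retiredjobs.cache.size = " + cacheSize);
        }
    }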

Re: JobCache directory cleanup

2013-01-09 Thread Artem Ervits
user@hadoop.apache.org Subject: Re: JobCache directory cleanup Thanks a lot Alexander! What is mapreduce.jobtracker.retiredjobs.cache.size for? Is a cron approach safe for Hadoop? Is that the only way at the moment? On Wed, Jan 9, 2013 at 6:50 PM, Alexander Alten-Lorenz wget.n

Re: JobCache directory cleanup

2013-01-09 Thread Hemanth Yamijala
Hi, The directory name you have provided is /data?/mapred/local/taskTracker/persona/jobcache/. This directory is used by the TaskTracker (slave) daemons to localize job files when the tasks are run on the slaves. Hence, I don't think this is related to the parameter