Hi,

On Thu, Jan 10, 2013 at 5:17 PM, Ivan Tretyakov <itretya...@griddynamics.com> wrote:

> Thanks for the replies!
>
> Hemanth,
> I could see the following exception in the TaskTracker log:
> https://issues.apache.org/jira/browse/MAPREDUCE-5
> But I'm not sure if it is related to this issue.
>
> > Now, when a job completes, the directories under the jobcache must get
> > automatically cleaned up. However, it doesn't look like this is happening in
> > your case.
>
> So, if I have no running jobs, the jobcache directory should be empty. Is that
> correct?

That is correct. I just verified it on Hadoop 1.0.2.

Thanks
Hemanth

>
> On Thu, Jan 10, 2013 at 8:18 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>
>> Hi,
>>
>> The directory name you have provided is
>> /data?/mapred/local/taskTracker/persona/jobcache/.
>> This directory is used by the TaskTracker (slave) daemons to localize job
>> files when the tasks are run on the slaves.
>>
>> Hence, I don't think this is related to the parameter
>> "mapreduce.jobtracker.retiredjobs.cache.size", which is a parameter
>> related to the JobTracker process.
>>
>> Now, when a job completes, the directories under the jobcache must get
>> automatically cleaned up. However, it doesn't look like this is happening in
>> your case.
>>
>> Could you please look at the logs of the TaskTracker machine where this
>> has filled up, to see if there are any errors that could give clues?
>> Also, since this is a CDH release, it could be a problem specific to that,
>> and reaching out on the CDH mailing lists may also help.
>>
>> Thanks
>> hemanth
>>
>> On Wed, Jan 9, 2013 at 8:11 PM, Ivan Tretyakov <itretya...@griddynamics.com> wrote:
>>
>>> Hello!
>>>
>>> I've found that the jobcache directory has become very large on our cluster,
>>> e.g.:
>>>
>>> # du -sh /data?/mapred/local/taskTracker/user/jobcache
>>> 465G    /data1/mapred/local/taskTracker/user/jobcache
>>> 464G    /data2/mapred/local/taskTracker/user/jobcache
>>> 454G    /data3/mapred/local/taskTracker/user/jobcache
>>>
>>> And it stores information for about 100 jobs:
>>>
>>> # ls -1 /data?/mapred/local/taskTracker/persona/jobcache/ | sort | uniq | wc -l
>>>
>>> I've found the following parameter:
>>>
>>> <property>
>>>   <name>mapreduce.jobtracker.retiredjobs.cache.size</name>
>>>   <value>1000</value>
>>>   <description>The number of retired job status to keep in the cache.
>>>   </description>
>>> </property>
>>>
>>> So, if I understand correctly, it is intended to control the job cache size
>>> by limiting the number of jobs whose cache is kept.
>>>
>>> Also, I've seen that some Hadoop users clean up the jobcache with a cron job:
>>> http://grokbase.com/t/hadoop/common-user/102ax9bze1/cleaning-jobcache-manually
>>> (
>>> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201002.mbox/%3c99484d561002100143s4404df98qead8f2cf687a7...@mail.gmail.com%3E
>>> )
>>>
>>> Are there other approaches to controlling the jobcache size?
>>> What is the correct way to do it?
>>>
>>> Thanks in advance!
>>>
>>> P.S. We are using CDH 4.1.1.
>>>
>>> --
>>> Best Regards
>>> Ivan Tretyakov
>>>
>>> Deployment Engineer
>>> Grid Dynamics
>>> +7 812 640 38 76
>>> Skype: ivan.tretyakov
>>> www.griddynamics.com
>>> itretya...@griddynamics.com
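As Hemanth notes, finished jobs' directories should be removed automatically, so a cron cleanup is only a workaround while the TaskTracker logs are investigated. A minimal sketch of the cron-style cleanup Ivan mentions is below. It rests on assumptions not stated in the thread: the per-user layout `<root>/<user>/jobcache/<job_id>` seen in the `du` output, and that a job directory whose modification time is older than a chosen threshold belongs to a finished job. The function name `stale_jobcache_dirs` and its `--delete` switch are hypothetical. A directory's mtime only changes when entries directly inside it are added or removed, so a long-running job can look stale; pick a threshold well above your longest job.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop/CDH): list, and optionally delete,
# per-job directories under a TaskTracker jobcache that have not been
# modified for more than N days.
# Assumed layout: <root>/<user>/jobcache/<job_id>
# Assumption: an old mtime means the job has finished. This may NOT hold
# for long-running jobs, so choose a generous threshold.
stale_jobcache_dirs() {
    root=$1        # e.g. /data1/mapred/local/taskTracker
    days=$2        # age threshold in days (find -mtime +N semantics)
    mode=${3:-}    # empty = dry run (list only), "--delete" = remove
    for cache in "$root"/*/jobcache; do
        [ -d "$cache" ] || continue   # glob matched nothing
        # Only the per-job directories themselves, not their contents.
        find "$cache" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" |
        while IFS= read -r dir; do
            echo "$dir"
            if [ "$mode" = "--delete" ]; then
                rm -rf -- "$dir"
            fi
        done
    done
}
```

For example, `stale_jobcache_dirs /data1/mapred/local/taskTracker 7` only prints the candidate directories on the first disk. Reviewing that list before scheduling the `--delete` form in cron is the safer order.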