Thanks a lot, Alexander! What is mapreduce.jobtracker.retiredjobs.cache.size for? Is the cron approach safe for Hadoop? Is that the only way at the moment?
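
For reference, the cron approach I have in mind is roughly the sketch below. It is only an illustration, not something taken from the Hadoop documentation: the function names are hypothetical, and it assumes that per-job directories are named job_* and that a directory whose modification time is older than the given number of days belongs to a finished job. That second assumption is the part I am unsure about, since writes deeper inside a job directory do not update the mtime of the top-level directory. It only prints what it would remove unless --delete is passed.

```shell
#!/bin/sh
# Sketch only: find stale per-job directories under a jobcache directory.
# Assumptions (not confirmed by Hadoop docs): job directories are named
# "job_*", and an old mtime on the directory means the job has finished.

# list_stale_jobcache ROOT DAYS
# Prints job_* directories directly under ROOT whose mtime is more than
# DAYS days old.
list_stale_jobcache() {
    root=$1
    days=$2
    find "$root" -mindepth 1 -maxdepth 1 -type d -name 'job_*' -mtime +"$days" -print
}

# clean_stale_jobcache ROOT DAYS [--delete]
# Dry run by default: prints what would be removed. With --delete the
# directories are actually removed.
clean_stale_jobcache() {
    root=$1
    days=$2
    mode=${3:-}
    list_stale_jobcache "$root" "$days" | while IFS= read -r dir; do
        if [ "$mode" = "--delete" ]; then
            rm -rf -- "$dir"
        else
            echo "would remove: $dir"
        fi
    done
}

# Example (dry run, path as on our cluster):
#   clean_stale_jobcache /data1/mapred/local/taskTracker/user/jobcache 7
```

From cron this would run once per data disk, first as a dry run to check what it would remove.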

On Wed, Jan 9, 2013 at 6:50 PM, Alexander Alten-Lorenz <wget.n...@gmail.com> wrote:

> Hi,
>
> Per default (and not configurable) the logs will be persist for 30 days.
> This will be configurable in future
> (https://issues.apache.org/jira/browse/MAPREDUCE-4643).
>
> - Alex
>
> On Jan 9, 2013, at 3:41 PM, Ivan Tretyakov <itretya...@griddynamics.com> wrote:
>
> > Hello!
> >
> > I've found that jobcache directory became very large on our cluster, e.g.:
> >
> > # du -sh /data?/mapred/local/taskTracker/user/jobcache
> > 465G /data1/mapred/local/taskTracker/user/jobcache
> > 464G /data2/mapred/local/taskTracker/user/jobcache
> > 454G /data3/mapred/local/taskTracker/user/jobcache
> >
> > And it stores information for about 100 jobs:
> >
> > # ls -1 /data?/mapred/local/taskTracker/persona/jobcache/ | sort | uniq | wc -l
> >
> > I've found that there is following parameter:
> >
> > <property>
> >   <name>mapreduce.jobtracker.retiredjobs.cache.size</name>
> >   <value>1000</value>
> >   <description>The number of retired job status to keep in the cache.
> >   </description>
> > </property>
> >
> > So, if I got it right it intended to control job cache size by limiting
> > number of jobs to store cache for.
> >
> > Also, I've seen that some hadoop users uses cron approach to cleanup
> > jobcache:
> > http://grokbase.com/t/hadoop/common-user/102ax9bze1/cleaning-jobcache-manually
> > (http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201002.mbox/%3c99484d561002100143s4404df98qead8f2cf687a7...@mail.gmail.com%3E)
> >
> > Are there other approaches to control jobcache size?
> > What is more correct way to do it?
> >
> > Thanks in advance!
> >
> > P.S. We are using CDH 4.1.1.
> >
> > --
> > Best Regards
> > Ivan Tretyakov
> >
> > Deployment Engineer
> > Grid Dynamics
> > +7 812 640 38 76
> > Skype: ivan.tretyakov
> > www.griddynamics.com
> > itretya...@griddynamics.com
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF

--
Best Regards
Ivan Tretyakov

Deployment Engineer
Grid Dynamics
+7 812 640 38 76
Skype: ivan.tretyakov
www.griddynamics.com
itretya...@griddynamics.com