Well, since the DistributedCache is used by the tasktracker, you need to update the log4j configuration file used by the tasktracker daemon. And you need to get the tasktracker log file from the machine where you see the distributed cache problem.
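For example, something along these lines in the tasktracker's conf/log4j.properties should do it (the class names are my best guess for hadoop 1.0.3; please verify them against the hadoop-core jar):

    # Debug logging for the tasktracker and its distributed cache manager;
    # restart the tasktracker daemon after editing.
    log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
    log4j.logger.org.apache.hadoop.filecache.TrackerDistributedCacheManager=DEBUG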
On Fri, Apr 19, 2013 at 6:27 AM, <xia_y...@dell.com> wrote:

Hi Hemanth,

I tried http://machine:50030. It did not work for me.

In the hbase_home/conf folder, I updated the log4j configuration properties and got the attached log. Can you tell what is happening with the MapReduce job?

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Wednesday, April 17, 2013 9:11 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

The check for cache file cleanup is controlled by the property mapreduce.tasktracker.distributedcache.checkperiod. It defaults to 1 minute (which should be sufficient for your requirement).

I am not sure why the JobTracker UI is inaccessible. If you know where the JT is running, try hitting http://machine:50030. If that doesn't work, check whether the port has been changed in mapred-site.xml via a property similar to mapred.job.tracker.http.address.

There is logging in the code of the tasktracker component that can help debug the distributed cache behaviour. In order to get those logs you need to enable debug logging in the log4j configuration properties and restart the daemons. Hopefully that will give you some hints on what is happening.

Thanks

hemanth

On Wed, Apr 17, 2013 at 11:49 PM, <xia_y...@dell.com> wrote:

Hi Hemanth and Bejoy KS,

I have tried both mapred-site.xml and core-site.xml. They do not work. I set the value to 50K just for testing purposes, yet the folder size has already grown to 900M. As in your email, "After they are done, the property will help cleanup the files due to the limit set." How frequently is the cleanup task triggered?

Regarding job.xml, I cannot use the JT web UI to find it. It seems that when hadoop is packaged within HBase, this is disabled. I am only running HBase jobs. The HBase people suggested I get help from the Hadoop mailing list; I will contact them again.

Thanks,

Jane

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Tuesday, April 16, 2013 9:35 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

You can limit the size by setting local.cache.size in mapred-site.xml (or core-site.xml if that works for you). I mistakenly mentioned mapred-default.xml in my last mail - apologies for that. However, please note that this does not prevent whatever is writing into the distributed cache from creating those files when they are required. After they are done, the property will help clean up the files according to the limit set.

That's why I am more keen on finding out what is using the files in the distributed cache. It may be useful to ask on the HBase list as well whether the APIs you are using create the files you mention (assuming you are only running HBase jobs on the cluster and nothing else).

Thanks

Hemanth
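To make the suggestion concrete, the entry under discussion would look something like this in mapred-site.xml (the value is purely illustrative; local.cache.size is read in bytes, and the hadoop 1.0.3 default is 10737418240, i.e. 10 GB):

    <!-- Illustrative only: cap the tasktracker's DistributedCache at ~1 GB.
         Restart the tasktracker after changing this. -->
    <property>
      <name>local.cache.size</name>
      <value>1073741824</value>
    </property>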
On Tue, Apr 16, 2013 at 11:15 PM, <xia_y...@dell.com> wrote:

Hi Hemanth,

I am not explicitly using DistributedCache in my code, and I am not using any command line arguments like -libjars either.

Where can I find job.xml? I am using the HBase MapReduce API and not setting any job.xml.

The key point is that I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks.

Xia

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command line arguments like -libjars when you launch the MapReduce job? Alternatively, you can check the job.xml of the launched MR job to see if it has set properties with prefixes like mapred.cache. If nothing is set there, it would seem like some other process or user is adding jars to DistributedCache when using the cluster.

Thanks

hemanth

On Thu, Apr 11, 2013 at 11:40 PM, <xia_y...@dell.com> wrote:

Hi Hemanth,

Attached are some sample folders from within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses a MapReduce job to purge old HBase data. I am using the basic HBase MapReduce API to delete rows from an HBase table. I do not specify the use of the distributed cache. Maybe HBase uses it?

Some code here:

    Scan scan = new Scan();
    scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
    scan.setCacheBlocks(false);  // don't set to true for MR jobs
    scan.setTimeRange(Long.MIN_VALUE, timestamp);
    // set other scan attrs

    // the purge start time
    Date date = new Date();

    TableMapReduceUtil.initTableMapperJob(
            tableName,          // input table
            scan,               // Scan instance to control CF and attribute selection
            MapperDelete.class, // mapper class
            null,               // mapper output key
            null,               // mapper output value
            job);

    job.setOutputFormatClass(TableOutputFormat.class);
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
    job.setNumReduceTasks(0);

    boolean b = job.waitForCompletion(true);
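One way to test Hemanth's theory directly, given that the JT web UI is unavailable: initTableMapperJob ships the HBase dependency jars through the DistributedCache by default, and (if memory of the 0.94 TableMapReduceUtil API serves; please verify the overload and the property names against your jars) it can be told not to. A sketch, reusing the variables from the snippet above:

    // Same setup as above, but with addDependencyJars set to false so that
    // TableMapReduceUtil does not ship the HBase/Hadoop jars through the
    // DistributedCache. The jars must then already be on the cluster classpath.
    TableMapReduceUtil.initTableMapperJob(
            tableName,          // input table
            scan,               // Scan instance
            MapperDelete.class, // mapper class
            null,               // mapper output key
            null,               // mapper output value
            job,
            false);             // addDependencyJars

    // Instead of reading job.xml from the JT UI, dump the cache-related
    // properties before submitting the job:
    System.out.println("tmpjars = " + job.getConfiguration().get("tmpjars"));
    System.out.println("mapred.cache.files = " + job.getConfiguration().get("mapred.cache.files"));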
From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory? Not sure whether that will help, but just giving it a shot.

What application are you using? Is it custom MapReduce jobs in which you use the distributed cache (I guess not)?

Thanks

Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <xia_y...@dell.com> wrote:

Hi Arun,

I stopped my application, then restarted my HBase (which includes hadoop). After that I started my application. After one evening, my /tmp/hadoop-root/mapred/local/archive had grown to more than 1G. It does not work.

Is this the right place to change the value?

"local.cache.size" in the file core-default.xml, which is in hadoop-core-1.0.3.jar

Thanks,

Jane

From: Arun C Murthy [mailto:a...@hortonworks.com]
Sent: Wednesday, April 10, 2013 2:45 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (the cache limit applies only to non-active cache files), and check after a little while (it takes some time for the cleaner thread to kick in).

Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

On Apr 11, 2013, at 2:29 AM, <xia_y...@dell.com> wrote:

Hi Hemanth,

For hadoop 1.0.3, I can only find "local.cache.size" in the file core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml.

I updated the value in that file and changed it to 500000, just for testing purposes. However, the folder /tmp/hadoop-root/mapred/local/archive has already grown to more than 1G, so it looks like it is not doing the work. Could you advise whether what I did is correct?

    <name>local.cache.size</name>
    <value>500000</value>

Thanks,

Xia

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, the files under this directory cannot be deleted while they are in use. Also, some frameworks on Hadoop could be using DistributedCache transparently to you.

So you could check what is being stored here and, based on that, lower the limit of the cache size if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks

Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <xia_y...@dell.com> wrote:

Hi,

I am using the hadoop that is packaged within hbase-0.94.1; it is hadoop 1.0.3. There are some mapreduce jobs running on my server, and after some time I found that my folder /tmp/hadoop-root/mapred/local/archive had reached 14G in size.

How do I configure this and limit the size? I do not want to waste my space on the archive.

Thanks,

Xia
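A final note on verification, following Arun's point above: with no jobs running and the limit in place, the cleaner's effect can be watched directly from a shell on the tasktracker machine, e.g.:

    du -sh /tmp/hadoop-root/mapred/local/archive

If the size stays above the configured limit with no active jobs well past the cleanup check period, the property is most likely not being picked up by the tasktracker.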