Well, since the DistributedCache is used by the tasktracker, you need to
update the log4j configuration file used by the tasktracker daemon. You
also need to get the tasktracker log file from the machine where you see
the distributed cache problem.
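
As a rough sketch (assuming the stock Hadoop 1.x conf/log4j.properties, and
that the distributed cache manager logs under the
org.apache.hadoop.filecache package), adding a line like

    log4j.logger.org.apache.hadoop.filecache=DEBUG

and then restarting the tasktracker should make the cache setup and cleanup
messages show up in the tasktracker log under the Hadoop logs directory.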


On Fri, Apr 19, 2013 at 6:27 AM, <xia_y...@dell.com> wrote:

> Hi Hemanth,
>
> I tried http://machine:50030. It did not work for me.
>
> In the hbase_home/conf folder, I updated the log4j configuration
> properties and got the attached log. Can you tell what is happening with
> the MapReduce job?
>
> Thanks,
>
> Jane
>
> From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
> Sent: Wednesday, April 17, 2013 9:11 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> The check for cache file cleanup is controlled by the
> property mapreduce.tasktracker.distributedcache.checkperiod. It defaults
> to 1 minute (which should be sufficient for your requirement).
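>
> As a sketch, if you ever needed to change it, the mapred-site.xml entry
> would look like this (I believe the value is in milliseconds in stock
> Hadoop 1.x, so 60000 is one minute - please verify against your
> distribution's defaults):
>
>   <property>
>     <name>mapreduce.tasktracker.distributedcache.checkperiod</name>
>     <value>60000</value>
>   </property>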
>
> I am not sure why the JobTracker UI is inaccessible. If you know where the
> JT is running, try hitting http://machine:50030. If that doesn't work,
> maybe check whether the port has been changed in mapred-site.xml via a
> property similar to mapred.job.tracker.http.address.
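>
> For reference, the stock Hadoop 1.x default for that property looks like
> this (a sketch; your cluster may bind a different host or port):
>
>   <property>
>     <name>mapred.job.tracker.http.address</name>
>     <value>0.0.0.0:50030</value>
>   </property>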
>
> There is logging in the code of the tasktracker component that can help
> debug the distributed cache behaviour. In order to get those logs you need
> to enable debug logging in the log4j configuration properties and restart
> the daemons. Hopefully that will help you get some hints on what is
> happening.
>
> Thanks
>
> hemanth
>
> On Wed, Apr 17, 2013 at 11:49 PM, <xia_y...@dell.com> wrote:
>
> Hi Hemanth and Bejoy KS,
>
> I have tried both mapred-site.xml and core-site.xml. They do not work. I
> set the value to 50K just for testing purposes, but the folder size has
> already grown to 900M now. As in your email, “After they are done, the
> property will help cleanup the files due to the limit set.” How frequently
> will the cleanup task be triggered?
>
> Regarding the job.xml, I cannot use the JT web UI to find it. It seems
> that when hadoop is packaged within Hbase, this is disabled. I am only
> running Hbase jobs. The Hbase people suggested I get help from the Hadoop
> mailing list. I will contact them again.
>
> Thanks,
>
> Jane
>
> From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
> Sent: Tuesday, April 16, 2013 9:35 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> You can limit the size by setting local.cache.size in mapred-site.xml
> (or core-site.xml if that works for you). I mistakenly mentioned
> mapred-default.xml in my last mail - apologies for that. However, please
> note that this does not prevent whatever is writing into the distributed
> cache from creating those files when they are required. After they are
> done, the property will help clean up the files due to the limit set.
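>
> For concreteness, a sketch of how this would look in mapred-site.xml (note
> that local.cache.size is specified in bytes in Hadoop 1.x, so a value of
> 500000 is only about 500 KB - adjust to the limit you actually want):
>
>   <property>
>     <name>local.cache.size</name>
>     <value>500000</value>
>   </property>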
>
> That's why I am more keen on finding what is using the files in the
> Distributed cache. It may be useful if you can ask on the HBase list as
> well whether the APIs you are using are creating the files you mention
> (assuming you are only running HBase jobs on the cluster and nothing
> else).
>
> Thanks
>
> Hemanth
>
> On Tue, Apr 16, 2013 at 11:15 PM, <xia_y...@dell.com> wrote:
>
> Hi Hemanth,
>
> I am not explicitly using DistributedCache in my code, and I did not use
> any command-line arguments like -libjars either.
>
> Where can I find job.xml? I am using the Hbase MapReduce API and not
> setting any job.xml.
>
> The key point is that I want to limit the size of
> /tmp/hadoop-root/mapred/local/archive. Could you help?
>
> Thanks.
>
> Xia
>
> From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
> Sent: Thursday, April 11, 2013 9:09 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> TableMapReduceUtil has APIs like addDependencyJars which will use
> DistributedCache. I don't think you are explicitly using that. Are you
> using any command-line arguments like -libjars etc. when you are launching
> the MapReduce job? Alternatively, you can check the job.xml of the
> launched MR job to see if it has set properties having prefixes like
> mapred.cache. If nothing's set there, it would seem like some other
> process or user is adding jars to DistributedCache when using the cluster.
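>
> Since your JT web UI is not accessible, a rough sketch of the same check
> done in code (hypothetical - add it to your own driver just before job
> submission; Configuration is Iterable in Hadoop 1.x, and this needs
> java.util.Map imported):
>
>        // Print any DistributedCache-related properties the job picked up.
>        for (Map.Entry<String, String> e : job.getConfiguration()) {
>            if (e.getKey().startsWith("mapred.cache")) {
>                System.out.println(e.getKey() + " = " + e.getValue());
>            }
>        }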
>
> Thanks
>
> hemanth
>
> On Thu, Apr 11, 2013 at 11:40 PM, <xia_y...@dell.com> wrote:
>
> Hi Hemanth,
>
> Attached are some sample folders within my
> /tmp/hadoop-root/mapred/local/archive. There are some jar and class files
> inside.
>
> My application uses a MapReduce job to purge old Hbase data. I am using
> the basic HBase MapReduce API to delete rows from an Hbase table. I do not
> specify use of the Distributed cache. Maybe HBase uses it?
>
> Some code here:
>
>        Scan scan = new Scan();
>        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>        scan.setCacheBlocks(false);  // don't set to true for MR jobs
>        scan.setTimeRange(Long.MIN_VALUE, timestamp);
>        // set other scan attrs
>
>        // the purge start time
>        Date date = new Date();
>
>        TableMapReduceUtil.initTableMapperJob(
>              tableName,          // input table
>              scan,               // Scan instance to control CF and attribute selection
>              MapperDelete.class, // mapper class
>              null,               // mapper output key
>              null,               // mapper output value
>              job);
>
>        job.setOutputFormatClass(TableOutputFormat.class);
>        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
>        job.setNumReduceTasks(0);
>
>        boolean b = job.waitForCompletion(true);
>
> From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
> Sent: Thursday, April 11, 2013 12:29 AM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Could you paste the contents of the directory? Not sure whether that will
> help, but just giving it a shot.
>
> What application are you using? Is it custom MapReduce jobs in which you
> use Distributed cache (I guess not)?
>
> Thanks
>
> Hemanth
>
> On Thu, Apr 11, 2013 at 3:34 AM, <xia_y...@dell.com> wrote:
>
> Hi Arun,
>
> I stopped my application, then restarted my hbase (which includes hadoop).
> After that I started my application. After one evening, my
> /tmp/hadoop-root/mapred/local/archive has grown to more than 1G. It does
> not work.
>
> Is this the right place to change the value?
>
> "local.cache.size" in file core-default.xml, which is in
> hadoop-core-1.0.3.jar
>
> Thanks,
>
> Jane
>
> From: Arun C Murthy [mailto:a...@hortonworks.com]
> Sent: Wednesday, April 10, 2013 2:45 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Ensure no jobs are running (the cache limit applies only to non-active
> cache files), and check after a little while (it takes some time for the
> cleaner thread to kick in).
>
> Arun
>
> On Apr 11, 2013, at 2:29 AM, <xia_y...@dell.com> wrote:
>
> Hi Hemanth,
>
> For hadoop 1.0.3, I can only find "local.cache.size" in the file
> core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in
> mapred-default.xml.
>
> I updated the value in the file core-default.xml and changed the value to
> 500000. This is just for testing purposes. However, the folder
> /tmp/hadoop-root/mapred/local/archive has already grown to more than 1G
> now. It looks like it does not do the work. Could you advise whether what
> I did is correct?
>
>   <property>
>     <name>local.cache.size</name>
>     <value>500000</value>
>   </property>
>
> Thanks,
>
> Xia
>
> From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
> Sent: Monday, April 08, 2013 9:09 PM
> To: user@hadoop.apache.org
> Subject: Re: How to configure mapreduce archive size?
>
> Hi,
>
> This directory is used as part of the 'DistributedCache' feature (
> http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
> There is a configuration key "local.cache.size" which controls the amount
> of data stored under DistributedCache. The default limit is 10GB. However,
> the files under this cannot be deleted if they are being used. Also, some
> frameworks on Hadoop could be using DistributedCache transparently to you.
>
> So you could check what is being stored here and, based on that, lower the
> limit of the cache size if you feel that will help. The property needs to
> be set in mapred-default.xml.
>
> Thanks
>
> Hemanth
>
> On Mon, Apr 8, 2013 at 11:09 PM, <xia_y...@dell.com> wrote:
>
> Hi,
>
> I am using the hadoop that is packaged within hbase-0.94.1. It is hadoop
> 1.0.3. There are some mapreduce jobs running on my server. After some
> time, I found that my folder /tmp/hadoop-root/mapred/local/archive has
> grown to 14G in size.
>
> How do I configure this and limit the size? I do not want to waste my
> space on archives.
>
> Thanks,
>
> Xia
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
