Hi Hemanth,

I am not explicitly using DistributedCache in my code, nor am I using any command-line arguments like -libjars.

Where can I find job.xml? I am using the HBase MapReduce API and am not setting any job.xml. The key point is that I want to limit the size of /tmp/hadoop-root/mapred/local/archive. Could you help?

Thanks,
Xia

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

TableMapReduceUtil has APIs like addDependencyJars which will use DistributedCache. I don't think you are explicitly using that. Are you using any command-line arguments like -libjars when you launch the MapReduce job?

Alternatively, you can check the job.xml of the launched MR job to see whether it has set properties with prefixes like mapred.cache. If nothing is set there, it would seem that some other process or user is adding jars to DistributedCache when using the cluster.

Thanks,
Hemanth
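As a quick way of doing the job.xml check Hemanth suggests, a small sketch like the following can iterate a Configuration and print any DistributedCache-related keys. It assumes the Hadoop 1.x property names under the mapred.cache prefix, and the job.xml path in the comment is a placeholder:

    // Sketch: print DistributedCache-related properties from a job
    // configuration -- roughly equivalent to grepping job.xml.
    // Assumes Hadoop 1.x key names under the "mapred.cache" prefix.
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class DumpCacheProps {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Load a launched job's job.xml (placeholder path -- adjust to your cluster).
        conf.addResource(new Path("/path/to/job.xml"));
        for (Map.Entry<String, String> entry : conf) {
          if (entry.getKey().startsWith("mapred.cache")) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
          }
        }
      }
    }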
On Thu, Apr 11, 2013 at 11:40 PM, <xia_y...@dell.com> wrote:

Hi Hemanth,

Attached are some sample folders from within my /tmp/hadoop-root/mapred/local/archive. There are some jar and class files inside.

My application uses a MapReduce job to purge old HBase data. I am using the basic HBase MapReduce API to delete rows from an HBase table. I do not specify the use of the distributed cache. Maybe HBase uses it?

Some code here:

    Scan scan = new Scan();
    scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
    scan.setCacheBlocks(false);  // don't set to true for MR jobs
    scan.setTimeRange(Long.MIN_VALUE, timestamp); // set other scan attrs

    Date date = new Date();      // the purge start time

    TableMapReduceUtil.initTableMapperJob(
        tableName,               // input table
        scan,                    // Scan instance to control CF and attribute selection
        MapperDelete.class,      // mapper class
        null,                    // mapper output key
        null,                    // mapper output value
        job);
    job.setOutputFormatClass(TableOutputFormat.class);
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
    job.setNumReduceTasks(0);

    boolean b = job.waitForCompletion(true);
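The initTableMapperJob call above is where DistributedCache most likely enters: as noted earlier in the thread, TableMapReduceUtil.addDependencyJars ships HBase's dependency jars through the cache, and initTableMapperJob does this by default. As a hedged sketch -- assuming your HBase build provides the overload with an addDependencyJars flag -- the shipping can be turned off explicitly (you must then make the HBase/ZooKeeper jars available on the task classpath yourself):

    // Sketch: same job setup, but with dependency-jar shipping disabled.
    // The boolean-flag overload is an assumption -- check that your HBase
    // version's TableMapReduceUtil exposes it before relying on this.
    TableMapReduceUtil.initTableMapperJob(
        tableName,               // input table
        scan,                    // same Scan instance as above
        MapperDelete.class,      // mapper class
        null,                    // mapper output key
        null,                    // mapper output value
        job,
        false);                  // addDependencyJars = false: nothing goes to DistributedCache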
From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, April 11, 2013 12:29 AM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Could you paste the contents of the directory? Not sure whether that will help, but just giving it a shot.

What application are you using? Is it custom MapReduce jobs in which you use the distributed cache (I guess not)?

Thanks,
Hemanth

On Thu, Apr 11, 2013 at 3:34 AM, <xia_y...@dell.com> wrote:

Hi Arun,

I stopped my application, then restarted my HBase (which includes Hadoop). After that I started my application. After one evening, my /tmp/hadoop-root/mapred/local/archive grew to more than 1G. It did not work.

Is this the right place to change the value? "local.cache.size" in file core-default.xml, which is in hadoop-core-1.0.3.jar.

Thanks,
Jane

From: Arun C Murthy [mailto:a...@hortonworks.com]
Sent: Wednesday, April 10, 2013 2:45 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Ensure no jobs are running (the cache limit applies only to non-active cache files), and check after a little while (it takes some time for the cleaner thread to kick in).

Arun

On Apr 11, 2013, at 2:29 AM, <xia_y...@dell.com> wrote:

Hi Hemanth,

For hadoop 1.0.3, I can only find "local.cache.size" in core-default.xml, which is in hadoop-core-1.0.3.jar. It is not in mapred-default.xml. I updated the value in that file and changed it to 500000, just for testing purposes. However, the folder /tmp/hadoop-root/mapred/local/archive has already grown to more than 1G. It looks like it is not working. Could you advise whether what I did is correct?

    <name>local.cache.size</name>
    <value>500000</value>

Thanks,
Xia

From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?

Hi,

This directory is used as part of the 'DistributedCache' feature (http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache). There is a configuration key "local.cache.size" which controls the amount of data stored under DistributedCache. The default limit is 10GB. However, files under this directory cannot be deleted while they are in use. Also, some frameworks on Hadoop may use DistributedCache transparently to you.

So you could check what is being stored here and, based on that, lower the cache size limit if you feel that will help. The property needs to be set in mapred-default.xml.

Thanks,
Hemanth

On Mon, Apr 8, 2013 at 11:09 PM, <xia_y...@dell.com> wrote:

Hi,

I am using the Hadoop that is packaged with hbase-0.94.1; it is hadoop 1.0.3. There are some MapReduce jobs running on my server. After some time, I found that my folder /tmp/hadoop-root/mapred/local/archive had reached 14G in size. How do I configure this and limit the size? I do not want to waste my space on the archive.

Thanks,
Xia

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
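A closing note on the configuration itself: local.cache.size is interpreted in bytes (the 10GB default is 10737418240), so a test value of 500000 amounts to roughly 500KB, not 500MB. A minimal sketch of an override, assuming the usual Hadoop 1.x convention of overriding *-default.xml values in a site file (e.g. mapred-site.xml on the TaskTracker nodes, followed by a TaskTracker restart) rather than editing the jar's core-default.xml:

    <!-- Sketch: cap the DistributedCache at ~1GB. The value is in BYTES. -->
    <property>
      <name>local.cache.size</name>
      <value>1073741824</value>
    </property>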