Hi,
I am using LZO to compress my intermediate map outputs.
These are the settings:
mapred.map.output.compression.codec = com.hadoop.compression.lzo.LzoCodec
pig.tmpfilecompression.codec = lzo
But I am consistently getting the following exception (I don't get this
exception when I use "gz" a
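(A minimal sketch, in case it helps reproduce the setup: this assumes the hadoop-lzo jar and native libraries are installed on every node. The property names match the settings above; the Pig-side settings are shown as comments.)

import org.apache.hadoop.mapred.JobConf;

// Sketch: enable LZO compression of intermediate map output (0.20-era property names).
public class LzoMapOutputConfig {
    public static JobConf configure() {
        JobConf conf = new JobConf();
        // Naming the codec alone is not enough; map-output compression must also be switched on.
        conf.setBoolean("mapred.compress.map.output", true);
        conf.set("mapred.map.output.compression.codec",
                 "com.hadoop.compression.lzo.LzoCodec");
        // Pig temp-file compression goes in the Pig script or pig.properties, e.g.:
        //   set pig.tmpfilecompression true;
        //   set pig.tmpfilecompression.codec lzo;
        return conf;
    }
}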
Hi,
Does "maps_failed" counter includes Tasks that were killed due to speculative
execution ?
Same with "reduces_faile" and Killed reduce tasks.
Thanks,
-Rakesh
Hi,
I am using Hadoop 0.20.1. Recently we had a JobTracker outage because of the
following:
The JobTracker tries to write a file to HDFS, but its connection to the primary
datanode gets disrupted. It then enters a retry loop that goes on for hours.
I see the following message i
Any ideas on how "attempt*" directories get created directly under
"mapred.local.dir"? Pointers to the relevant parts of the source code would help too.
Thanks,
-Rakesh
From: rkothari_...@hotmail.com
To: mapreduce-user@hadoop.apache.org
Subject: mapred.local.dir cleanup
Date: Tue, 18 Jan 2011 17:20:04
Hi,
I am seeing lots of leftover directories going back as far as 12 days in the
task tracker's "mapred.local.dir". These directories are for M/R task attempts.
How do these directories end up in "mapred.local.dir"? From my
understanding these directories should be in
"mapred.local.dir/t
Hi,
There is a gzipped file that needs to be processed by a map-only Hadoop job. If
the size of this file is larger than the space reserved for non-DFS use on the
tasktracker host processing it, and it's a non-data-local map task,
would this job eventually fail? Is the Hadoop jobtracker sm
Hi,
Is "move" not supported in Hdfs ? I can't find any API for that. Looking at the
source code for hadoop CLI it seems like it's implementing move by copying data
from src to dest and deleting the src. This could be a time consuming operation.
Thanks,
-Rakesh
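(A minimal sketch of moving a file programmatically with FileSystem.rename(), using hypothetical paths; on HDFS, rename is a namenode metadata update rather than a copy of the data.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: move a file within HDFS via the FileSystem API.
public class HdfsMoveSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path src = new Path("/user/rakesh/input/data.txt");   // hypothetical source
        Path dst = new Path("/user/rakesh/archive/data.txt"); // hypothetical destination
        // rename() moves the file within the same filesystem; on HDFS it is a
        // metadata-only operation on the namenode, not a copy of the blocks.
        boolean moved = fs.rename(src, dst);
        System.out.println("moved = " + moved);
    }
}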
I am using Hadoop 0.20.1.
-Rakesh
From: rkothari_...@hotmail.com
To: mapreduce-user@hadoop.apache.org
Subject: Accessing files from distributed cache
Date: Tue, 19 Oct 2010 13:03:04 -0700
Hi,
What's the way to access files copied to the distributed cache from the map tasks?
e.g.
if I run
Hi,
What's the way to access files copied to the distributed cache from the map tasks?
e.g.
if I run my M/R job as $ hadoop jar my.jar -files hdfs://path/to/my/file.txt,
how can I access file.txt in my map (or reduce) task?
Thanks,
-Rakesh
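(A minimal sketch of one common way to do this, assuming the job is launched through ToolRunner so that -files is handled by GenericOptionsParser: the file is localized and symlinked into the task's working directory under its base name, so a plain relative path works. The mapper types below are hypothetical.)

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: read a file shipped with "-files hdfs://path/to/my/file.txt" from a map task.
public class CacheFileMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final StringBuilder cachedContents = new StringBuilder();

    @Override
    protected void setup(Context context) throws IOException {
        // "-files" symlinks the localized copy into the task's working
        // directory under its base name, so a relative path works here.
        BufferedReader reader = new BufferedReader(new FileReader("file.txt"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                cachedContents.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... use cachedContents alongside the input records ...
    }
}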
ob is running)
On Tue, Oct 12, 2010 at 1:15 PM, rakesh kothari wrote:
Thanks Shrijeet. Yeah, sorry both of these logs are from datanodes.
Also, I don't get this error when I run my job on just 1 file (450 MB).
I wonder why this happens in the reduce stage since I just have 10 reduc
t for
512). This param belongs to core-site.xml.
-Shrijeet
On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari wrote:
Hi,
My MR job is processing gzipped files, each around 450 MB, and there are 24 of
them. The file block size is 512 MB.
This job is failing consistently in the reduce phas
Is there a reason why the block size should be set to 2^N for some integer N?
Does it help with block defragmentation, etc.?
Thanks,
-Rakesh
Hi,
This link: http://www.cloudera.com/hadoop-mrunit no longer points to MRUnit.
Can someone please point out the location where I can get it?
Does MRUnit support Hadoop 0.20.1?
Thanks,
-Rakesh
d Rosenstrauch wrote:
From: David Rosenstrauch
Subject: Re: Partitioning Reducer Output
To: mapreduce-user@hadoop.apache.org
Date: Monday, April 5, 2010, 7:35 AM
On 04/02/2010 08:32 PM, rakesh kothari wrote:
>
> Hi,
>
> What's the best way to partition data generated from Reducer i
Hi,
What's the best way to partition data generated from a Reducer into multiple
directories in Hadoop 0.20.1? I was thinking of using MultipleTextOutputFormat,
but that's not backward compatible with the other APIs in this version of
Hadoop.
Thanks,
-Rakesh
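(A minimal sketch of the MultipleTextOutputFormat approach mentioned above, using the old org.apache.hadoop.mapred API; the key-to-directory mapping here is a hypothetical example. The job would register it with JobConf.setOutputFormat(), which is part of why it doesn't mix cleanly with the new-API classes in 0.20.)

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Sketch: route each reducer output record into a subdirectory derived from its key.
public class KeyBasedOutputFormat extends MultipleTextOutputFormat<Text, Text> {

    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // "name" is the default part file name (e.g. "part-00000"); prefixing it
        // with the key writes the record under one output subdirectory per key.
        return key.toString() + "/" + name;
    }
}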