EOFException when using LZO to compress map/reduce output

2011-08-14 Thread rakesh kothari
Hi, I am using LZO to compress my intermediate map outputs. These are the settings: mapred.map.output.compression.codec = com.hadoop.compression.lzo.LzoCodec, pig.tmpfilecompression.codec = lzo. But I am consistently getting the following exception (I don't get this exception when I use "gz" …
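A minimal sketch, not from the thread, of the MapReduce side of these settings through the old JobConf API; it assumes the hadoop-lzo jar and its native libraries are present on every node (a missing native library is a common cause of codec errors). The Pig property pig.tmpfilecompression.codec would normally go in pig.properties or on the command line.

    import org.apache.hadoop.mapred.JobConf;

    public class LzoMapOutput {
        public static void configure(JobConf conf) {
            // Turn on compression for intermediate map output and select
            // the LZO codec named in the post.
            conf.setBoolean("mapred.compress.map.output", true);
            conf.set("mapred.map.output.compression.codec",
                     "com.hadoop.compression.lzo.LzoCodec");
        }
    }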

Failed vs Killed Tasks in Hadoop

2011-06-20 Thread rakesh kothari
Hi, Does "maps_failed" counter includes Tasks that were killed due to speculative execution ? Same with "reduces_faile" and Killed reduce tasks. Thanks, -Rakesh

JobTracker goes into seemingly infinite loop

2011-05-05 Thread rakesh kothari
Hi, I am using Hadoop 0.20.1. Recently we had a JobTracker outage because of the following: the JobTracker tries to write a file to HDFS, but its connection to the primary datanode gets disrupted. It then enters a retry loop (that goes on for hours). I see the following message i…

RE: mapred.local.dir cleanup

2011-01-20 Thread rakesh kothari
Any ideas on how "attempt*" directories are getting created directly under "mapred.local.dir"? Pointers to the relevant parts of the source code would help too. Thanks, -Rakesh From: rkothari_...@hotmail.com To: mapreduce-user@hadoop.apache.org Subject: mapred.local.dir cleanup Date: Tue, 18 Jan 2011 17:20:04 …

mapred.local.dir cleanup

2011-01-18 Thread rakesh kothari
Hi, I am seeing lots of leftover directories, going back as far as 12 days, in the tasktrackers' "mapred.local.dir". These directories are for M/R task attempts. How do these directories end up directly in "mapred.local.dir"? From my understanding they should be in "mapred.local.dir/t…

Mapper processing gzipped file

2011-01-18 Thread rakesh kothari
Hi, There is a gzipped file that needs to be processed by a map-only Hadoop job. If the size of this file is more than the space reserved for non-DFS use on the tasktracker host processing it, and if it's a non-data-local map task, would this job eventually fail? Is the Hadoop jobtracker sm…

Moving files in hdfs using API

2010-10-21 Thread rakesh kothari
Hi, Is "move" not supported in Hdfs ? I can't find any API for that. Looking at the source code for hadoop CLI it seems like it's implementing move by copying data from src to dest and deleting the src. This could be a time consuming operation. Thanks, -Rakesh

RE: Accessing files from distributed cache

2010-10-19 Thread rakesh kothari
I am using Hadoop 0.20.1. -Rakesh From: rkothari_...@hotmail.com To: mapreduce-user@hadoop.apache.org Subject: Accessing files from distributed cache Date: Tue, 19 Oct 2010 13:03:04 -0700 Hi, What's the way to access files copied to the distributed cache from the map tasks? E.g., if I run …

Accessing files from distributed cache

2010-10-19 Thread rakesh kothari
Hi, What's the way to access files copied to the distributed cache from the map tasks? E.g., if I run my M/R job as $ hadoop jar my.jar -files hdfs://path/to/my/file.txt, how can I access file.txt in my map (or reduce) task? Thanks, -Rakesh
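A minimal sketch of the two usual approaches under the old 0.20 "mapred" API, shown inside a mapper's configure(); the class name is hypothetical. With -files (handled by GenericOptionsParser), each file is also symlinked into the task's working directory under its own name, so the relative path "file.txt" resolves directly.

    import java.io.IOException;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CacheAwareMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        public void configure(JobConf conf) {
            try {
                // Option 1: open "file.txt" as a plain relative path (the
                // symlink created by -files). Option 2: ask the
                // DistributedCache for the local on-disk paths:
                Path[] cached = DistributedCache.getLocalCacheFiles(conf);
                if (cached != null) {
                    for (Path p : cached) {
                        System.err.println("cached file at: " + p);
                    }
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> out, Reporter reporter)
                throws IOException {
            // real map work would go here
        }
    }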

RE: Failures in the reducers

2010-10-12 Thread rakesh kothari
…ob is running) On Tue, Oct 12, 2010 at 1:15 PM, rakesh kothari wrote: Thanks Shrijeet. Yeah, sorry, both of these logs are from datanodes. Also, I don't get this error when I run my job on just 1 file (450 MB). I wonder why this happens in the reduce stage, since I just have 10 reduc…

RE: Failures in the reducers

2010-10-12 Thread rakesh kothari
…t for 512). This param belongs to core-site.xml. -Shrijeet On Tue, Oct 12, 2010 at 12:53 PM, rakesh kothari wrote: Hi, My MR job is processing gzipped files, each around 450 MB, and there are 24 of them. The file block size is 512 MB. This job is failing consistently in the reduce phas…

Hdfs Block Size

2010-10-07 Thread rakesh kothari
Is there a reason why the block size should be set to 2^N, for some integer N? Does it help with block defragmentation etc.? Thanks, -Rakesh
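For context: as far as the HDFS client is concerned, the block size only has to be a multiple of the checksum chunk (io.bytes.per.checksum, 512 bytes by default); powers of two are convention rather than a requirement. It is also a per-file parameter, as in this sketch with a hypothetical path and a 512 MB block size:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CustomBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Block size is supplied per file at create time.
            long blockSize = 512L * 1024 * 1024;
            FSDataOutputStream out = fs.create(
                    new Path("/tmp/example.dat"),
                    true,                                   // overwrite
                    conf.getInt("io.file.buffer.size", 4096),
                    (short) 3,                              // replication
                    blockSize);
            out.writeBytes("hello\n");
            out.close();
        }
    }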

MRUnit Download

2010-08-20 Thread rakesh kothari
Hi, This link: http://www.cloudera.com/hadoop-mrunit no longer points to MRUnit. Can someone please point me to the location where I can get it? Does MRUnit support Hadoop 0.20.1? Thanks, -Rakesh

RE: Partitioning Reducer Output

2010-04-05 Thread rakesh kothari
…d Rosenstrauch wrote: From: David Rosenstrauch Subject: Re: Partitioning Reducer Output To: mapreduce-user@hadoop.apache.org Date: Monday, April 5, 2010, 7:35 AM On 04/02/2010 08:32 PM, rakesh kothari wrote: > > Hi, > > What's the best way to partition data generated from Reducer i…

Partitioning Reducer Output

2010-04-02 Thread rakesh kothari
Hi, What's the best way to partition data generated from the Reducer into multiple directories in Hadoop 0.20.1? I was thinking of using MultipleTextOutputFormat, but that's not backward compatible with other APIs in this version of Hadoop. Thanks, -Rakesh
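For reference, a minimal sketch of the MultipleTextOutputFormat approach under the old "mapred" API in 0.20.1; the key-to-directory mapping shown is hypothetical:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

    // Routes each reducer record into a subdirectory derived from its key,
    // e.g. key "2010-04-02" -> "2010-04-02/part-00000".
    public class KeyPartitionedOutput
            extends MultipleTextOutputFormat<Text, Text> {
        @Override
        protected String generateFileNameForKeyValue(Text key, Text value,
                                                     String name) {
            return key.toString() + Path.SEPARATOR + name;
        }
    }

Wired in with conf.setOutputFormat(KeyPartitionedOutput.class) on the JobConf.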