Re: why does my mapper class reads my input file twice?

2012-03-05 Thread Harsh J
It's your use of the mapred.input.dir property, which is a reserved name in the framework (it's what FileInputFormat uses). You have a config you extract the path from: Path input = new Path(conf.get("mapred.input.dir")); Then you do: FileInputFormat.addInputPath(job, input); Which internally, simply
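
The snippet above is cut off, but the effect it describes can be sketched without Hadoop on the classpath. The following is a minimal model, not Hadoop's code: a plain Map stands in for the job Configuration, and the hypothetical helper addInputPath only mimics the append step of FileInputFormat.addInputPath (the real method also qualifies and escapes the path). The class name InputDirDemo and the path /data/in are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of why reusing "mapred.input.dir" doubles the input.
class InputDirDemo {
    static final String INPUT_DIR = "mapred.input.dir";

    // Assumption: mimics only the append step of FileInputFormat.addInputPath.
    // If the property is already set, the new path is appended after a comma.
    static void addInputPath(Map<String, String> conf, String path) {
        String dirs = conf.get(INPUT_DIR);
        conf.put(INPUT_DIR, dirs == null ? path : dirs + "," + path);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // The user supplies the input path under the reserved property name.
        conf.put(INPUT_DIR, "/data/in");

        // The driver reads that same property and adds it as an input path.
        String input = conf.get(INPUT_DIR);
        addInputPath(conf, input);

        // The property now lists the same path twice.
        System.out.println(conf.get(INPUT_DIR)); // prints /data/in,/data/in
    }
}
```

Under this model, passing the path under an application-specific property name (for example a hypothetical myapp.input.dir) leaves mapred.input.dir unset until addInputPath is called, so the path is listed only once.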

hadoop 1.0 / HOD or CloneZilla?

2012-03-05 Thread Masoud
Hi all, I have experience with Hadoop 0.20.204 on a 3-machine pilot cluster, and now I'm trying to set up a real cluster on 32 Linux machines. I have some questions: 1. Is Hadoop 1.0 stable? On the Hadoop site this version is indicated as a beta release. 2. As you know, installing and setting up hadoop

Re: Java Heap space error

2012-03-05 Thread Mohit Anchlia
Sorry for multiple emails. I did find: 2012-03-05 17:26:35,636 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call- Usage threshold init = 715849728(699072K) used = 575921696(562423K) committed = 715849728(699072K) max = 715849728(699072K) 2012-03-05 17:26:35,719 INFO

Re: Java Heap space error

2012-03-05 Thread Mohit Anchlia
All I see in the logs is: 2012-03-05 17:26:36,889 FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201203051722_0001_m_30_1 - Killed : Java heap space Looks like task tracker is killing the tasks. Not sure why. I increased heap from 512 to 1G and still it fails. On Mon, Mar 5, 201

Re: OutOfMemoryError: unable to create new native thread

2012-03-05 Thread Clay Chiang
Hi Rohini, I ran into a similar problem just yesterday. In my case, the max process count (ulimit -u) was set to 1024, which is too small. When I increased it to 100, the problem went away. But you said "Ulimit on the machine is set to unlimited", I'm not sure this will h

Re: Custom Seq File Loader: ClassNotFoundException

2012-03-05 Thread Mark question
Unfortunately, "public" didn't change my error ... Any other ideas? Has anyone run Hadoop on Eclipse with custom sequence inputs? Thank you, Mark On Mon, Mar 5, 2012 at 9:58 AM, Mark question wrote: > Hi Madhu, it has the following line: > > TermDocFreqArrayWritable () {} > > but I'll try it w

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-05 Thread Russell Jurney
Streaming is good for simulation. Long running map-only processes, where pig doesn't really help and it is simple to fire off a streaming process. You do have to set some options so they can take a long time to return/return counters. Russell Jurney http://datasyndrome.com On Mar 5, 2012, at 1

Re: Comparison of Apache Pig Vs. Hadoop Streaming M/R

2012-03-05 Thread Eli Finkelshteyn
I'm really interested in this as well. I have trouble seeing a really good use case for streaming map-reduce. Is there something I can do in streaming that I can't do in Pig? If I want to re-use previously made Python functions from my code base, I can do that in Pig as much as Streaming, and f

Re: Custom Seq File Loader: ClassNotFoundException

2012-03-05 Thread Mark question
Hi Madhu, it has the following line: TermDocFreqArrayWritable () {} but I'll try it with "public" access in case it's been called outside of my package. Thank you, Mark On Sun, Mar 4, 2012 at 9:55 PM, madhu phatak wrote: > Hi, > Please make sure that your CustomWritable has a default constru

Re: AWS MapReduce

2012-03-05 Thread Mohit Anchlia
On Mon, Mar 5, 2012 at 7:40 AM, John Conwell wrote: > AWS MapReduce (EMR) does not use S3 for its HDFS persistence. If it did > your S3 billing would be massive :) EMR reads all input jar files and > input data from S3, but it copies these files down to its local disk. It > then starts th

Re: AWS MapReduce

2012-03-05 Thread John Conwell
AWS MapReduce (EMR) does not use S3 for its HDFS persistence. If it did your S3 billing would be massive :) EMR reads all input jar files and input data from S3, but it copies these files down to its local disk. It then starts the MR process, doing all HDFS reads and writes to the local dis

Re: Setting up Hadoop single node setup on Mac OS X

2012-03-05 Thread John Armstrong
On 02/27/2012 11:53 AM, W.P. McNeill wrote: > You don't need any virtualization. Mac OS X is Linux and runs Hadoop as is. Nitpick: OS X is NEXTSTEP based on Mach, which is a different POSIX-compliant system from Linux.

fairscheduler : group.name | Please edit patch to work for 0.20.205

2012-03-05 Thread Austin Chungath
Can someone have a look at the patch MAPREDUCE-2457 and see if it can be modified to work for 0.20.205? I am very new to Java and have no idea what's going on in that patch. If you have any pointers for me, I will see if I can do it on my own. Thanks, Austin On Fri, Mar 2, 2012 at 7:15 PM, Austin