Re: mapred example task failing with error 127

2011-09-29 Thread Harsh J
Vinod, there should be some stderr information in the task attempts' userlogs that should help point out why your task launch is failing. It is probably caused by something related to the JVM launch parameters (as defined by mapred.child.java.opts). If not there, look into the TaskTracker logs
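
A note on that property: exit code 127 is the shell's "command not found" status, so a bad java binary path on the task node or a garbled value in mapred.child.java.opts is a common culprit. A minimal sketch of setting the property from client code, assuming the 0.20-era mapred API (the heap size is illustrative only):

    import org.apache.hadoop.mapred.JobConf;

    public class ChildOpts {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // These options are passed verbatim to every task's child JVM;
            // a typo here can kill the launch before any user code runs.
            conf.set("mapred.child.java.opts", "-Xmx512m");
            System.out.println(conf.get("mapred.child.java.opts"));
        }
    }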

Re: How do I diagnose IO bounded errors using the framework counters?

2011-09-29 Thread W.P. McNeill
This is definitely a map-increase job. I could try a combiner, but I don't think that would help. My keys are small compared to my values, and values must be kept separate when they are accumulated in the reducer--they can't be combined into some smaller form, i.e. they are more like bitmaps than

Re: How do I diagnose IO bounded errors using the framework counters?

2011-09-29 Thread Lance Norskog
When in doubt, go straight to the owner of a fact. The operating system is what really knows disk I/O. "my mapper job--which may write multiple pairs for each one it receives--is writing too many" - ah, a map-increase job :) This is what Combiners are for: to keep explosions of data from hitting t
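
For readers unfamiliar with the mechanism: a combiner is just a Reducer applied to map output before it is spilled and shuffled, and it only helps when the merge operation is associative and commutative. A minimal word-count-style sketch (class and type choices are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sums partial counts on the map side so fewer records hit disk and the
    // network. W.P.'s bitmap-like values, which must stay separate, would
    // not qualify for this treatment.
    public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            ctx.write(key, total);
        }
    }

It is wired into a job with job.setCombinerClass(SumCombiner.class).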

How do I diagnose IO bounded errors using the framework counters?

2011-09-29 Thread W.P. McNeill
I have a problem where certain Hadoop jobs take prohibitively long to run. My hypothesis is that I am generating more I/O than my cluster can handle, and I need to substantiate this. I am looking closely at the MapReduce framework counters because I think they contain the information I need, but I
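
The framework counters are readable from client code once a job finishes; a sketch along these lines (the group and counter names are the ones 0.20-era job output prints, worth double-checking against your release):

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    public class IoReport {
        // Prints the I/O-related framework counters for a finished job.
        static void report(Job job) throws Exception {
            Counters c = job.getCounters();
            long mapOut = c.findCounter("org.apache.hadoop.mapred.Task$Counter",
                                        "MAP_OUTPUT_BYTES").getValue();
            long localRead = c.findCounter("FileSystemCounters",
                                           "FILE_BYTES_READ").getValue();
            long localWritten = c.findCounter("FileSystemCounters",
                                              "FILE_BYTES_WRITTEN").getValue();
            System.out.printf("map output: %d, local FS read: %d, local FS written: %d%n",
                              mapOut, localRead, localWritten);
        }
    }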

mapred example task failing with error 127

2011-09-29 Thread Vinod Gupta Tankala
I just set up a pseudo-distributed Hadoop cluster, but when I run the example task, I get a failed child error. I see that this was posted earlier as well, but I didn't see the resolution. http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201108.mbox/%3cc30bf131a023ea4d976727cd4fc563fe0afbe...

Re: dump configuration

2011-09-29 Thread patrick sang
Thanks to all, especially SD. That's exactly what I am looking for. P On Wed, Sep 28, 2011 at 11:20 PM, Simon Dong wrote: > Or http://jobtracker:50030/conf > > -SD > > On Wed, Sep 28, 2011 at 2:39 PM, Raj V wrote: > > The xml configuration file is also available under hadoop logs on the
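
For completeness, the same dump is available programmatically: Configuration is iterable over its resolved key/value pairs, so a client can print what the /conf servlet shows. A minimal sketch:

    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;

    public class DumpConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Every resolved property, defaults plus site overrides --
            // the same data the JobTracker's /conf page renders as XML.
            for (Map.Entry<String, String> e : conf) {
                System.out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }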

Re: Block Size

2011-09-29 Thread Uma Maheswara Rao G 72686
Hi, here is some useful info: A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you’re storing small files, then you probably have lots of them (otherwise you wouldn’t turn to Hadoop), and the problem is that HDFS can’t handle lots of files. Every
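
To put a number on "lots of files": each file, directory, and block is held as an object in the namenode's heap, commonly estimated at roughly 150 bytes apiece. A back-of-the-envelope sketch (the 150-byte figure is a rule of thumb, not a guarantee):

    public class NamenodeHeapEstimate {
        public static void main(String[] args) {
            final long OBJECT_BYTES = 150;   // rough per-object namenode cost
            long files = 10000000L;          // ten million small files
            // Each small file costs at least one file object and one block
            // object (directories ignored for simplicity).
            long heapBytes = files * 2 * OBJECT_BYTES;
            System.out.printf("~%.2f GB of namenode heap for %d small files%n",
                              heapBytes / (1024.0 * 1024 * 1024), files);
        }
    }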

Re: Running multiple MR Job's in sequence

2011-09-29 Thread John Conwell
If you are running on EC2, you can use Elastic MapReduce. It has a startup option where you specify the driver class in your jar, and it will run the driver, I believe, on the namenode, which won't really add any overhead because when the namenode is under stress, the driver will be sitting quietly

Re: Block Size

2011-09-29 Thread Chris Smith
On 29 September 2011 18:39, lessonz wrote: > I'm new to Hadoop, and I'm trying to understand the implications of a 64M > block size in the HDFS. Is there a good reference that enumerates the > implications of this decision and its effects on files stored in the system > as well as map-reduce jobs?

RE: Running multiple MR Job's in sequence

2011-09-29 Thread Aaron Baff
Yea, we don't want it to sit there waiting for the Job to complete, even if it's just a few minutes. --Aaron -Original Message- From: turboc...@gmail.com [mailto:turboc...@gmail.com] On Behalf Of John Conwell Sent: Thursday, September 29, 2011 10:50 AM To: common-user@hadoop.apache.org Su

Re: Running multiple MR Job's in sequence

2011-09-29 Thread John Conwell
After you kick off a job, say JobA, your client doesn't need to sit and ping Hadoop to see if it finished before it starts JobB. You can have the client block until the job is complete with "Job.waitForCompletion(boolean verbose)". Using this you can create a "job driver" that chains jobs together
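
A minimal sketch of such a driver, assuming the org.apache.hadoop.mapreduce API John names (mapper/reducer setup is elided and the paths are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class JobDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            Job jobA = new Job(conf, "JobA");
            // setJarByClass/setMapperClass/etc. omitted for brevity
            FileInputFormat.addInputPath(jobA, new Path(args[0]));
            FileOutputFormat.setOutputPath(jobA, new Path(args[1]));
            // Blocks here until JobA succeeds or fails.
            if (!jobA.waitForCompletion(true)) {
                System.exit(1);
            }

            Job jobB = new Job(conf, "JobB");
            // JobB consumes JobA's output directory.
            FileInputFormat.addInputPath(jobB, new Path(args[1]));
            FileOutputFormat.setOutputPath(jobB, new Path(args[2]));
            System.exit(jobB.waitForCompletion(true) ? 0 : 2);
        }
    }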

Block Size

2011-09-29 Thread lessonz
I'm new to Hadoop, and I'm trying to understand the implications of a 64M block size in the HDFS. Is there a good reference that enumerates the implications of this decision and its effects on files stored in the system as well as map-reduce jobs? Thanks.

Re: FileSystem closed

2011-09-29 Thread Uma Maheswara Rao G 72686
FileSystem objects are cached in the JVM. When you get an FS object using FileSystem.get(..) (SequenceFile internally uses it), it will return the same FS object if the scheme and authority are the same for the URI. The FS cache key's equals implementation is below: static boolean isEqual(Obj
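
A small illustration of that caching rule (the namenode address is a placeholder): two URIs with the same scheme and authority yield the identical object regardless of path, so closing one handle closes them all:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class FsCacheDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem a = FileSystem.get(URI.create("hdfs://nn:9000/"), conf);
            FileSystem b = FileSystem.get(URI.create("hdfs://nn:9000/some/path"), conf);
            // Same scheme ("hdfs") and authority ("nn:9000"): prints true.
            System.out.println(a == b);
            a.close();
            // b is the same object, so it is now closed as well.
        }
    }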

Re: Running multiple MR Job's in sequence

2011-09-29 Thread Joey Echeverria
I would definitely check out Oozie for this use case. -Joey On Thu, Sep 29, 2011 at 12:51 PM, Aaron Baff wrote: > I saw this, but wasn't sure if it was something that ran on the client and > just submitted the Jobs in sequence, or if that gave it all to the > JobTracker, and the JobTracker too

Re: FileSystem closed

2011-09-29 Thread Joey Echeverria
Do you close your FileSystem instances at all? IIRC, the FileSystem instance you use is a singleton and if you close it once, it's closed for everybody. My guess is you close it in your cleanup method and you have JVM reuse turned on. -Joey On Thu, Sep 29, 2011 at 12:49 PM, Mark question wrote:
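
In code, the failure mode Joey describes looks roughly like this (0.20 mapreduce API; the commented-out close is the bug). With mapred.job.reuse.jvm.num.tasks > 1, the next task reusing the JVM inherits the dead FileSystem; the conservative fix is simply not to close the shared instance:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LeakyMapper extends Mapper<LongWritable, Text, Text, Text> {
        private FileSystem fs;

        @Override
        protected void setup(Context ctx) throws IOException {
            // Returns the shared cached instance, not a private one.
            fs = FileSystem.get(ctx.getConfiguration());
        }

        @Override
        protected void cleanup(Context ctx) {
            // BUG if enabled: closes the cached singleton for every user of
            // this JVM. Leave the instance open; it dies with the process.
            // try { fs.close(); } catch (IOException ignored) {}
        }
    }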

RE: Running multiple MR Job's in sequence

2011-09-29 Thread Aaron Baff
I saw this, but wasn't sure if it was something that ran on the client and just submitted the Jobs in sequence, or if that gave it all to the JobTracker, and the JobTracker took care of submitting the Jobs in sequence appropriately. Basically, I'm looking for a completely stateless client that

Re: Is SAN storage is a good option for Hadoop ?

2011-09-29 Thread Steve Loughran
On 29/09/11 13:28, Brian Bockelman wrote: On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote: Hi, I want to know whether we can use SAN storage for a Hadoop cluster setup? If yes, what are the best practices? Is it a good way to go, considering the fact that "the underlying power of Hadoop is co-lo

Re: Is SAN storage is a good option for Hadoop ?

2011-09-29 Thread Brian Bockelman
On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote: > Hi, > > I want to know whether we can use SAN storage for a Hadoop cluster setup? > If yes, what are the best practices? > > Is it a good way to go, considering the fact that "the underlying power of Hadoop > is co-locating the processing power (CP

Re: Hadoop performance benchmarking with TestDFSIO

2011-09-29 Thread Steve Loughran
On 28/09/11 22:45, Sameer Farooqui wrote: Hi everyone, I'm looking for some recommendations for how to get our Hadoop cluster to do faster I/O. Currently, our lab cluster is 8 worker nodes and 1 master node (with NameNode and JobTracker). Each worker node has: - 48 GB RAM - 16 processors (Inte

Re: Is SAN storage is a good option for Hadoop ?

2011-09-29 Thread Paul Ingles
Our Hadoop journey included a brief stint running on our own virtualised infrastructure. Our pre-Hadoop application was already running on the VM infrastructure, so we set up a small cluster as virtual machines on the SAN. It worked OK for a while, but as our usage grew we ditched it for a couple

Re: Problems with Rumen in Hadoop-0.21.0

2011-09-29 Thread Ravi Gummadi
Are you using hadoop-0.21.0 (maybe an unstable release)? Using 0.20.204 or 0.22 would be better. >> "WARN rumen.TraceBuilder: File skipped: Invalid file name: job_201109221644_0001_" This means Rumen assumes the jobhistory file name format to be something else --- maybe without "_us

High AVailability Hadoop

2011-09-29 Thread shanmuganathan.r
Hi All, I am using Hadoop in distributed mode. I want to know whether there is any option to move the fsimage and edit log files from the namenode to another machine, other than NFS? If we reduce the checkpoint time, what drawbacks could occur in the future? Thanks & Regards, R. Shanmuganathan
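
On the first question: in 0.20.x the namenode writes its image and edit log to every directory listed in dfs.name.dir, so listing a second directory (on a separate disk or a remote mount) buys redundancy without extra tooling. The values below are illustrative, and in practice they belong in the namenode's hdfs-site.xml rather than client code:

    import org.apache.hadoop.conf.Configuration;

    public class NamenodeMetaDirs {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Two copies of fsimage/edits: one local, one on another mount.
            conf.set("dfs.name.dir", "/data/1/dfs/name,/backup/dfs/name");
            // Secondary namenode checkpoint interval in seconds (default 3600).
            // A shorter period means less edit log to replay after a crash,
            // at the cost of more frequent checkpoint I/O.
            conf.set("fs.checkpoint.period", "1800");
        }
    }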