Re: Distributing our jars to all machines in a cluster

2011-11-15 Thread Something Something
Until now we were manually copying our jars to all machines in our Hadoop cluster. This worked while our cluster was small, but now the cluster is getting bigger. What's the best way to start a Hadoop job that automatically distributes the jar to all machines in the cluster? I read the doc a

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread Harsh J
Ah, so the threshold is job-level, not per-task. OK. One other way that I think would be performant, AND would still use Hadoop itself, would be to keep one reducer for this job and have that reducer check whether the counter of total failed records exceeds the threshold. A reducer is guaranteed
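The single-reducer idea above can be sketched in plain Java (the `List` stands in for the values arriving at the lone reducer; the Hadoop Reducer API and all names here are illustrative, not the poster's actual code):

```java
import java.util.List;

// Sketch of the single-reducer check: each mapper emits its local error
// count, and the one reducer sums them all to make a job-level decision.
public class ThresholdReducer {
    // values = per-mapper error counts, all routed to the single reducer
    public static long totalErrors(List<Long> values) {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }

    // Job-level failure decision, taken in one place.
    public static boolean jobFailed(List<Long> values, long threshold) {
        return totalErrors(values) > threshold;
    }
}
```

Because every mapper's count reaches the same reducer, the threshold is enforced across the whole job rather than per task.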

Re:Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread rabbit_cheng
I think David's solution is viable, but don't use a local variable as a counter in step 4; use a COUNTER object to count the error records, since a COUNTER object can work globally. At 2011-11-16 03:08:45,"Mapred Learn" wrote: Thanks David for a step-by-step response but this makes error threshol

RE: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread Mingxi Wu
JJ, Two passes are necessary. In the first pass, just count how many lines are wrong; you won't do any work on the data, you just read it. After this pass, record the file status ("good"/"bad") in a status file. In the second pass, before you start, check the status file, and if the input fil
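The two-pass flow above can be sketched as follows; the `Predicate` stands in for whatever line parsing actually detects a bad record, and all names are hypothetical:

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch of the two-pass approach: pass 1 only reads the data and counts
// bad lines, recording a "good"/"bad" status; pass 2 consults that status
// before doing any real work on the input.
public class TwoPassValidator {
    // Pass 1: count bad lines; returns the status string that would be
    // written to the status file.
    public static String firstPass(List<String> lines,
                                   Predicate<String> isBad,
                                   long threshold) {
        long bad = lines.stream().filter(isBad).count();
        return bad > threshold ? "bad" : "good";
    }

    // Pass 2: only proceed when pass 1 recorded "good".
    public static boolean shouldProcess(String status) {
        return "good".equals(status);
    }
}
```

The cost is an extra full read of the input, which is why later replies in the thread push back on this approach.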

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread David Rosenstrauch
I can't think of an easy way to do this. There are a few not-so-easy approaches: * Implement numErrors as a Hadoop counter, then have the application that submitted the job check the value of that counter once the job is complete, and have the app throw an error if the counter exceeds the
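The driver-side check in the first approach can be sketched like this; a plain `Map` stands in for Hadoop's Counters object, and the counter name and method names are illustrative, not from the thread:

```java
import java.util.Map;

// Sketch of the counter-based approach: after the job completes, the
// submitting application inspects the aggregated job counter and throws
// if it exceeds the configured threshold.
public class ErrorThresholdCheck {
    static final String ERROR_COUNTER = "NUM_ERROR_RECORDS";

    // True if the completed job should be treated as failed.
    public static boolean exceedsThreshold(Map<String, Long> counters,
                                           long threshold) {
        long errors = counters.getOrDefault(ERROR_COUNTER, 0L);
        return errors > threshold;
    }

    public static void checkJob(Map<String, Long> counters, long threshold) {
        if (exceedsThreshold(counters, threshold)) {
            throw new RuntimeException("error records exceeded threshold: "
                    + counters.get(ERROR_COUNTER) + " > " + threshold);
        }
    }
}
```

Since Hadoop aggregates counter values across all tasks before the driver reads them, this check is naturally job-wide, which addresses the per-mapper concern raised elsewhere in the thread.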

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread Mapred Learn
Hi Harsh, My situation is that I need to kill a job when this threshold is reached. Say the threshold is 10, and 2 mappers combined reach this value; how should I achieve this? From what you are saying, I think the job will fail once a single mapper reaches that threshold. Thanks, On Tue, Nov 15, 2011 at 11:

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread Harsh J
Mapred, If you fail a task permanently upon encountering a bad situation, you basically end up failing the job as well, automatically. By lowering the number of retries (say down to 1 or 2 from the default of 4 total attempts), you can also have it fail the job faster. Is killing the job immediate
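The retry limit mentioned above is a job configuration property. A sketch of the setting for the Hadoop 1.x property names of this era (check the property name against your Hadoop version; newer releases renamed it):

```xml
<!-- Lower the per-map-task attempt limit so a permanently failing task
     fails the job sooner (the default is 4 attempts). -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>2</value>
</property>
```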

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread Mapred Learn
Hi Mingxi, By dynamic counter do you mean a custom counter, or is it a different kind of counter? Plus, I cannot do 2 passes, as I get to know about errors in a record only when I parse the line. Thanks, -JJ On Mon, Nov 14, 2011 at 3:38 PM, Mingxi Wu wrote: > You can do two passes of the data. > >

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread Mapred Learn
Thanks David for the step-by-step response, but this makes the error threshold a per-mapper threshold. Is there a way to make it per-job, so that all mappers share this value and increment it as a shared counter? On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch wrote: > On 11/14/2011 06:06 PM, Map

Re: how to implement error thresholds in a map-reduce job ?

2011-11-15 Thread David Rosenstrauch
On 11/14/2011 06:06 PM, Mapred Learn wrote: Hi, I have a use case where I want to pass a threshold value to a map-reduce job. For example: error_records=10. I want the map-reduce job to fail if the total count of error_records in the job, i.e. across all mappers, is reached. How can I implement this considering t

Re: Mapreduce heap size error

2011-11-15 Thread Hoot Thompson
Is there a good way to get logging enabled so I can get a better idea of what's going on? I'm starting to think that the "heap" error is not the systemic problem. I have changed heap-related parameters and can't seem to fix or even change the error conditions. On 11/15/11 4:53 AM, "Mohamed Riad

How is data of each job assigned to nodes in Mumak ?

2011-11-15 Thread arun k
Hi guys! Q> How can I assign data of each job to Mumak nodes, and what else do I need to do? In general, how can I use the pluggable block placement for HDFS in Mumak? Meaning, in my context I am using the 19-jobs-trace JSON file and a modified topology JSON file consisting of, say, 4 nodes. Since the number

Re: Mapreduce heap size error

2011-11-15 Thread Hoot Thompson
hadoop@lobster-nfs:/root$ java -d64 -version java version "1.6.0_26" Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode) On 11/15/11 4:53 AM, "Mohamed Riadh Trad" wrote: > java -version > > to check if the java you are using is a

Re: Mapreduce heap size error

2011-11-15 Thread Mohamed Riadh Trad
java -version to check whether the Java you are using is a 32-bit or 64-bit version. If you are using a 32-bit version, you cannot allow more than 3.5 GB for the heap. Trad Mohamed Riadh, M.Sc, Ing. PhD. student INRIA-TELECOM PARISTECH - ENPC School of International Management Office: 11-15 P

Re: Performance test practices for hadoop jobs - capturing metrics

2011-11-15 Thread Bejoy Ks
Including the hadoop common user group in the loop as well. On Tue, Nov 15, 2011 at 1:01 PM, Bejoy Ks wrote: > Hi Experts > > I'm currently working out how to incorporate a performance test plan > for a series of Hadoop jobs. My entire application consists of map reduce, > hive and flume jobs chained