Until now we were manually copying our jars to all machines in the Hadoop
cluster. This worked while our cluster was small, but now the cluster is
getting bigger. What's the best way to start a Hadoop job so that the jar is
automatically distributed to all machines in the cluster?
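A minimal driver sketch of the usual approach, assuming the job classes are
packaged in a single jar (class and path names below are placeholders): call
setJarByClass() in the driver and submit with bin/hadoop jar; the client then
copies the jar into the job's staging area on HDFS and the framework ships it
to every task node.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "my-job");
    // Tells Hadoop which jar holds the job classes; on submit the client
    // uploads that jar to HDFS and it is localized on each task node.
    job.setJarByClass(MyJobDriver.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Launched with something like: bin/hadoop jar myjob.jar MyJobDriver /in /out.
If the driver goes through ToolRunner/GenericOptionsParser, extra dependency
jars can be shipped the same way with -libjars.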
I read the doc a
Ah, so the threshold is job-level, not per-task. OK.
Another way that I think would be performant, AND would still use Hadoop
itself, would be to keep a single reducer for this job and have that reducer
check whether the counter of total failed records exceeds the threshold or
not. A reducer is guaranteed
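A task's Context only exposes its own counters, so a sketch of one way to wire
this up (not the poster's exact scheme; the sentinel key, config key and class
names are placeholders) is to have every mapper emit its bad records under a
sentinel key. With a single reducer, that reducer sees the global total and
can throw to fail the job:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ThresholdReducer extends Reducer<Text, Text, Text, Text> {
  private long threshold;

  @Override
  protected void setup(Context context) {
    // Hypothetical config key, set in the driver with conf.setLong(...).
    threshold = context.getConfiguration().getLong("error.records.threshold", 10L);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    if ("__BAD_RECORD__".equals(key.toString())) {
      long bad = 0;
      for (Text ignored : values) {
        bad++;
      }
      if (bad > threshold) {
        // Failing this single reduce task (past its retries) fails the job.
        throw new IOException("Bad records (" + bad + ") exceeded threshold " + threshold);
      }
    } else {
      for (Text value : values) {
        context.write(key, value);
      }
    }
  }
}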
I think David's solution is viable, but don't use a local variable as the
counter in step 4; use a COUNTER object to count the error records. The
COUNTER object works globally.
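A sketch of what that looks like in a mapper (the group/counter names and the
parsing logic are placeholders); the framework aggregates the same counter
across every task into one job-level total:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParsingMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split(",");
    if (fields.length < 2) {
      // Malformed record: bump the job-wide counter instead of a local variable.
      context.getCounter("RecordErrors", "BAD_RECORDS").increment(1);
      return;
    }
    context.write(new Text(fields[0]), new Text(fields[1]));
  }
}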
At 2011-11-16 03:08:45,"Mapred Learn" wrote:
Thanks David for a step-by-step response but this makes error threshol
JJ,
Two passes are necessary. In the first pass, just count how many lines are
wrong. You won't do any work on the data; just read it. After this pass,
record the file status "good"/"bad" in a status file.
In the second pass, before you start, check the status file, and if the input
fil
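A small sketch of the status-file bookkeeping between the two passes (the
paths and the "good"/"bad" format are placeholders, assuming the status file
sits next to the input):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InputStatus {
  // Record the result of the first (counting) pass.
  public static void record(Configuration conf, Path input, String status)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out =
        fs.create(new Path(input.getParent(), input.getName() + ".status"), true);
    out.writeUTF(status);  // "good" or "bad"
    out.close();
  }

  // Check the status before kicking off the second (real) pass.
  public static boolean isGood(Configuration conf, Path input) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path statusFile = new Path(input.getParent(), input.getName() + ".status");
    if (!fs.exists(statusFile)) {
      return true;  // no status recorded yet
    }
    FSDataInputStream in = fs.open(statusFile);
    String status = in.readUTF();
    in.close();
    return "good".equals(status);
  }
}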
I can't think of an easy way to do this. There are a few not-so-easy
approaches:
* Implement numErrors as a Hadoop counter, and then have the application
which submitted the job check the value of that counter once the job is
complete and have the app throw an error if the counter exceeds the threshold.
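A sketch of that driver-side check (the counter group/name and the failure
behaviour are placeholders):

import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;

public class ThresholdCheck {
  public static void runWithThreshold(Job job, long threshold) throws Exception {
    boolean ok = job.waitForCompletion(true);
    // Aggregated across all map and reduce tasks once the job is complete.
    Counter errors = job.getCounters().findCounter("RecordErrors", "BAD_RECORDS");
    if (!ok || errors.getValue() > threshold) {
      throw new RuntimeException("Job failed or bad records (" + errors.getValue()
          + ") exceeded threshold " + threshold);
    }
  }
}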
Hi Harsh,
My situation is that I want to kill a job when this threshold is reached. If,
say, the threshold is 10 and 2 mappers combined reach this value, how should I
achieve this?
With what you are saying, I think the job will fail once a single mapper
reaches that threshold.
Thanks,
On Tue, Nov 15, 2011 at 11:
Mapred,
If you fail a task permanently upon encountering a bad situation, you basically
end up failing the job as well, automatically. By controlling the number of
retries (say, down to 1 or 2 from the default of 4 total attempts), you can
also have it fail the job faster.
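For example (property names as in Hadoop 0.20/1.x; the job name is a
placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RetryConfig {
  public static Job makeJob() throws Exception {
    Configuration conf = new Configuration();
    // Lower the per-task attempt budget (default 4) so a task that keeps
    // throwing on the bad-record condition fails the job sooner.
    conf.setInt("mapred.map.max.attempts", 2);
    conf.setInt("mapred.reduce.max.attempts", 2);
    return new Job(conf, "parse-with-threshold");
  }
}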
Is killing the job immediate
Hi Mingxi,
By dynamic counter, do you mean a custom counter, or is it a different kind of
counter?
Plus, I cannot do 2 passes, as I get to know about errors in a record only when
I parse the line.
Thanks,
-JJ
On Mon, Nov 14, 2011 at 3:38 PM, Mingxi Wu wrote:
> You can do two passes of the data.
>
>
Thanks David for a step-by-step response, but this makes the error threshold a
per-mapper threshold. Is there a way to make it per-job so that all mappers
share this value and increment it as a shared counter?
On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch wrote:
> On 11/14/2011 06:06 PM, Map
On 11/14/2011 06:06 PM, Mapred Learn wrote:
Hi,
I have a use case where I want to pass a threshold value to a map-reduce
job. For example: error records = 10.
I want the map-reduce job to fail if the total count of error_records across
the job, i.e. across all mappers, reaches this value.
How can I implement this considering t
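For the "pass a threshold value to the job" part, a minimal sketch (the config
key and job name are placeholders) is to set it in the job configuration in
the driver and read it back in each task's setup():

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ThresholdJobSetup {
  public static Job create() throws Exception {
    Configuration conf = new Configuration();
    // Shipped to every task as part of the job configuration.
    conf.setLong("error.records.threshold", 10L);
    return new Job(conf, "parse-with-error-threshold");
  }
}
// In a Mapper or Reducer setup():
//   long threshold =
//       context.getConfiguration().getLong("error.records.threshold", 10L);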
Is there a good way to get logging enabled so I can get a better idea of
what's going on? I'm starting to think that the "heap" error is not the
systemic problem. I have changed heap related parameters and can't seem to
fix or even change the error conditions.
On 11/15/11 4:53 AM, "Mohamed Riad
Hi guys!
Q> How can I assign the data of each job to Mumak nodes, and what else do I
need to do?
In general, how can I use the pluggable block placement for HDFS in Mumak?
Meaning, in my context I am using the 19-jobs trace JSON file and a modified
topology JSON file consisting of, say, 4 nodes. Since the number
hadoop@lobster-nfs:/root$ java -d64 -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
On 11/15/11 4:53 AM, "Mohamed Riadh Trad" wrote:
> java -version
>
> to check if the java you are using is a
java -version
to check if the Java you are using is a 32-bit or a 64-bit version.
If you are using a 32-bit version, you cannot allow more than 3.5 GB for the
heap.
Trad Mohamed Riadh, M.Sc, Ing.
PhD. student
INRIA-TELECOM PARISTECH - ENPC School of International Management
Office: 11-15
P
Including the hadoop common user group in the loop as well.
On Tue, Nov 15, 2011 at 1:01 PM, Bejoy Ks wrote:
> Hi Experts
>
> I'm currently working on incorporating a performance test plan
> for a series of Hadoop jobs. My entire application consists of map-reduce,
> Hive and Flume jobs chained