Hi Experts,
I'm currently working on putting together a performance test plan
for a series of Hadoop jobs. My entire application consists of MapReduce,
Hive, and Flume jobs chained one after another, and I need to do some
rigorous performance testing to ensure that it would never break under
Still seeing issues; around 2300 unique files
hadoop@lobster-nfs:~/querry$ hadoop jar HadoopTest.jar -D mapred.child.java.opts=-Xmx4096M hdfs://lobster-nfs:9000/hadoop_fs/dfs/merra/seq_out /hadoop_fs/dfs/output/test_14_r2.out
11/11/15 01:56:20 INFO hpc.Driver: Jar Name:
/home/hadoop/querry/Hadoo
Try the -D mapred.child.java.opts=-Xmx4096M option on the command line:
bin/hadoop jar yourjar.jar yourclass -D mapred.child.java.opts=-Xmx8192M
..
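Note that generic options like -D are only parsed when the driver goes through ToolRunner/GenericOptionsParser, so make sure your main class implements Tool. A minimal sketch (the class name and job setup here are placeholders, not taken from your jar):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D overrides parsed by ToolRunner
        Job job = new Job(getConf(), "my job");
        job.setJarByClass(MyDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
    }
}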
How many files do you have in your input folder?
Best,
Trad Mohamed Riadh, M.Sc, Ing.
Ph.D. student
INRIA-TELECOM PAR
Any suggestions as to how to track down the root cause of these errors?
1178709 [main] INFO org.apache.hadoop.mapred.JobClient - map 6% reduce 0%
11/11/15 00:45:29 INFO mapred.JobClient: Task Id :
attempt_20150008_00
You can do two passes over the data.
The first map-reduce pass sanity-checks the data.
The second map-reduce pass does the real work, assuming the first pass
accepts the files.
You can use a dynamic counter and define an enum type for the error-record
categories.
In the mapper, you parse e
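Roughly, the mapper side could look like this (a sketch only; the tab-separated validation rule and the ErrorRecords categories are illustrative assumptions, not from your actual job):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ValidatingMapper extends Mapper<LongWritable, Text, Text, Text> {

    // Enum counters are aggregated across all map tasks by the framework
    public static enum ErrorRecords { MALFORMED, OUT_OF_RANGE }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Hypothetical validation: treat lines with too few fields as malformed
        String[] fields = value.toString().split("\t");
        if (fields.length < 3) {
            context.getCounter(ErrorRecords.MALFORMED).increment(1);
            return; // skip the bad record but keep counting
        }
        context.write(new Text(fields[0]), value);
    }
}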
Hi,
I have a use case where I want to pass a threshold value to a map-reduce
job, e.g. error records = 10.
I want the map-reduce job to fail if the total count of error_records across
the whole job, i.e. over all mappers, reaches that threshold.
How can I implement this, considering that each mapper would be processing
some part
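Since running mappers cannot see the job-wide counter total in real time, one common approach (in line with the two-pass/counter reply above) is to let the job run to completion and have the driver fail the run if the aggregated counter reaches the threshold. A sketch, assuming the ValidatingMapper.ErrorRecords enum from the earlier sketch and a hypothetical error.records.threshold config key (settable with -D when the driver implements Tool):

import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;

public class ThresholdCheck {
    // Fails the overall run if the job-wide error counter reaches the threshold
    public static void runWithThreshold(Job job) throws Exception {
        long threshold = job.getConfiguration().getLong("error.records.threshold", 10);
        boolean completed = job.waitForCompletion(true);
        Counter errors = job.getCounters()
                .findCounter(ValidatingMapper.ErrorRecords.MALFORMED);
        if (!completed || errors.getValue() >= threshold) {
            throw new RuntimeException("Too many error records: "
                    + errors.getValue() + " (threshold " + threshold + ")");
        }
    }
}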