Performance test practices for hadoop jobs - capturing metrics

2011-11-14 Thread Bejoy Ks
Hi Experts, I'm currently working to put together a performance test plan for a series of Hadoop jobs. My entire application consists of MapReduce, Hive, and Flume jobs chained one after another, and I need to do some rigorous performance testing to ensure that it would never break under
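One way to capture per-job metrics from the driver is to record wall-clock time and dump the job's aggregated counters after completion. A minimal sketch, assuming the org.apache.hadoop.mapreduce API (the PerfRunner class name and job setup are illustrative; on 0.20-era releases use new Job(conf, name) instead of Job.getInstance):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.CounterGroup;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    public class PerfRunner {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "perf-test");
        // ... set mapper/reducer and input/output paths here ...

        long start = System.currentTimeMillis();
        boolean ok = job.waitForCompletion(true);
        long wallMillis = System.currentTimeMillis() - start;

        // Counters are aggregated across all tasks; dumping them per run
        // gives comparable numbers (records read/written, spills, HDFS bytes).
        Counters counters = job.getCounters();
        for (CounterGroup group : counters) {
          for (Counter c : group) {
            System.out.println(group.getDisplayName() + "\t"
                + c.getDisplayName() + "\t" + c.getValue());
          }
        }
        System.out.println("wall-clock ms\t" + wallMillis + "\tsucceeded\t" + ok);
      }
    }

Hive stages ultimately run as MapReduce jobs, so their counters are also visible in the JobTracker web UI; the Flume stages would need their own timing hooks.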

Re: Mapreduce heap size error

2011-11-14 Thread Hoot Thompson
Still issues, around 2300 unique files.

hadoop@lobster-nfs:~/querry$ hadoop jar HadoopTest.jar -D mapred.child.java.opts=-Xmx4096M hdfs://lobster-nfs:9000/hadoop_fs/dfs/merra/seq_out /hadoop_fs/dfs/output/test_14_r2.out
11/11/15 01:56:20 INFO hpc.Driver: Jar Name: /home/hadoop/querry/Hadoo

Re: Mapreduce heap size error

2011-11-14 Thread Mohamed Riadh Trad
Try the -D mapred.child.java.opts=-Xmx4096M on the command line:

bin/hadoop jar yourjar.jar yourclass -D mapred.child.java.opts=-Xmx8219M ..

How many files do you have in your input folder? Bests, Trad Mohamed Riadh, M.Sc, Ing. PhD. student INRIA-TELECOM PAR
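Worth noting: the generic -D options are only honored when the driver runs its arguments through GenericOptionsParser, which usually means implementing Tool and launching via ToolRunner; otherwise -D mapred.child.java.opts=... arrives as an ordinary program argument and is silently ignored. A minimal sketch (the Driver class name is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class Driver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // getConf() already contains anything passed as -D key=value.
        Job job = Job.getInstance(getConf(), "heap-test");
        // ... set mapper/reducer and input/output paths from args ...
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -files, ...) before
        // handing the remaining arguments to run().
        System.exit(ToolRunner.run(new Configuration(), new Driver(), args));
      }
    }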

Re: Mapreduce heap size error

2011-11-14 Thread Hoot Thompson
Any suggestions as to how to track down the root cause of these errors?

1178709 [main] INFO org.apache.hadoop.mapred.JobClient - map 6% reduce 0%
1178709 [main] INFO org.apache.hadoop.mapred.JobClient - map 6% reduce 0%
11/11/15 00:45:29 INFO mapred.JobClient: Task Id : attempt_20150008_00

RE: how to implement error thresholds in a map-reduce job ?

2011-11-14 Thread Mingxi Wu
You can do two passes over the data. The first map-reduce pass sanity-checks the data; the second map-reduce pass does the real work, assuming the first pass accepts the file. You can utilize a dynamic counter and define an enum type for error-record categories. In the mapper, you parse e
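A minimal sketch of the counter idea for the first (checking) pass; the tab-separated format and three-field rule are hypothetical stand-ins for whatever validity check the data needs:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class SanityCheckMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
      // One counter per error category; string-based
      // context.getCounter(group, name) also works for truly dynamic categories.
      public enum ErrorRecord { MALFORMED, MISSING_FIELD }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String line = value.toString();
        if (line.split("\t").length < 3) {   // hypothetical validity rule
          context.getCounter(ErrorRecord.MISSING_FIELD).increment(1);
          return;                            // drop bad records in this pass
        }
        context.write(value, NullWritable.get());
      }
    }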

how to implement error thresholds in a map-reduce job ?

2011-11-14 Thread Mapred Learn
Hi, I have a use case where I want to pass a threshold value to a map-reduce job, e.g. error_records=10. I want the map-reduce job to fail if the total count of error_records in the job, i.e. across all mappers, reaches that threshold. How can I implement this, considering that each mapper would be processing some part
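One common pattern (a sketch under assumptions, not a quote from the thread): have every mapper increment a shared counter, let the framework aggregate it, and have the driver compare the job-wide total against a threshold passed through the Configuration. Pairing with the SanityCheckMapper sketch above, and using a hypothetical myjob.error.threshold key:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ThresholdDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setLong("myjob.error.threshold", 10);  // hypothetical key
        Job job = Job.getInstance(conf, "threshold-check");
        // ... set SanityCheckMapper, input/output paths ...
        boolean ok = job.waitForCompletion(true);

        // Counters are summed across all mappers, so the job-wide total
        // is only reliable here, after completion.
        long errors = job.getCounters()
            .findCounter(SanityCheckMapper.ErrorRecord.MISSING_FIELD).getValue();
        if (!ok || errors >= conf.getLong("myjob.error.threshold", Long.MAX_VALUE)) {
          System.err.println("Failing job: " + errors + " error records");
          System.exit(1);
        }
      }
    }

Aborting mid-flight would instead mean polling the running job's counters from the client and killing it, since a single mapper only ever sees its own share of the count.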