Hi, I have a weird issue with my Giraph setup: with a small dataset everything runs smoothly, but as soon as I feed it a large dataset the job fails, and I don't see any error in the logs except this:

2014-12-08 15:31:51,325 INFO [main-SendThread(srv-110-07.720.rdio:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server srv-110-07.720.rdio/10.5.99.47:2181. Will not attempt to authenticate using SASL (unknown error)
The full log is pasted here: http://pastebin.com/9PxnWe53

Here is the console output from the run:

14/12/08 15:31:05 INFO mapreduce.Job: Job job_1407963352299_283611 running in uber mode : false
14/12/08 15:31:05 INFO mapreduce.Job: map 6% reduce 0%
14/12/08 15:31:16 INFO mapreduce.Job: map 13% reduce 0%
14/12/08 15:31:20 INFO mapreduce.Job: map 19% reduce 0%
14/12/08 15:31:25 INFO mapreduce.Job: map 25% reduce 0%
14/12/08 15:31:26 INFO mapreduce.Job: map 31% reduce 0%
14/12/08 15:31:33 INFO mapreduce.Job: map 38% reduce 0%
14/12/08 15:31:34 INFO mapreduce.Job: map 44% reduce 0%
14/12/08 15:31:36 INFO mapreduce.Job: map 50% reduce 0%
14/12/08 15:31:39 INFO mapreduce.Job: map 56% reduce 0%
14/12/08 15:31:41 INFO mapreduce.Job: map 63% reduce 0%
14/12/08 15:31:43 INFO mapreduce.Job: map 69% reduce 0%
14/12/08 15:31:48 INFO mapreduce.Job: map 75% reduce 0%
14/12/08 15:31:53 INFO mapreduce.Job: map 81% reduce 0%
14/12/08 15:31:58 INFO mapreduce.Job: map 88% reduce 0%
14/12/08 15:32:01 INFO mapreduce.Job: map 94% reduce 0%
14/12/08 15:32:03 INFO mapreduce.Job: map 100% reduce 0%
14/12/08 15:33:44 INFO mapreduce.Job: Job job_1407963352299_283611 failed with state FAILED due to: Task failed task_1407963352299_283611_m_000013
Job failed as tasks failed. failedMaps:1 failedReduces:0
14/12/08 15:33:44 INFO mapreduce.Job: Counters: 9
        Job Counters
                Failed map tasks=1
                Killed map tasks=15
                Launched map tasks=16
                Other local map tasks=16
                Total time spent by all maps in occupied slots (ms)=2199355
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=2199355
                Total vcore-seconds taken by all map tasks=2199355
                Total megabyte-seconds taken by all map tasks=8577484500

And the command-line invocation:

hadoop jar $JAR org.bar.foo.Driver AlgorithmName \
    -D mapred.child.java.opts=-Xmx4G \
    -Dgiraph.zkList=srv-110-07:2181,srv-110-08:2181,srv-210-08:2181 \
    -libjars $JAR -i $INPUT -iter $SUPER_STEPS -w $NUM_WORKERS -o $OUTPUT

How do I debug this?
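For what it's worth, here is the kind of thing I have been trying so far: since the client console only reports which task failed, I have been pulling the aggregated container logs and grepping them for the real error. This is just a sketch; it assumes YARN log aggregation is enabled on the cluster, and the application ID is the one corresponding to the job ID in the output above. The grep pattern is only my guess at the usual suspects.

```shell
# Application ID corresponding to job_1407963352299_283611 from the output above.
APP_ID=application_1407963352299_283611

# Pull every container's log for the application (run this on a cluster node;
# requires yarn.log-aggregation-enable=true):
# yarn logs -applicationId "$APP_ID" > app.log

# The client console rarely shows why a worker died, so scan the aggregated
# container logs for the usual suspects instead:
PATTERN='ERROR|FATAL|Exception|OutOfMemoryError|Killing container|exceeded'
# grep -nE "$PATTERN" app.log | head -40

# Quick check that the pattern matches a typical fatal line:
printf 'FATAL java.lang.OutOfMemoryError: Java heap space\n' | grep -cE "$PATTERN"
```

But even this has not surfaced anything beyond the ZooKeeper SASL line, which from what I have read is usually harmless noise rather than the actual failure.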
Thanks,
Mohit

"When you want success as badly as you want the air, then you will get it. There is no other secret of success." -Socrates