Hi,
I know a task can fail 2 times and only the 3rd failure breaks the entire job. I am fine with that number of attempts. What I would like is that, after a task has been tried 3 times, the job continues with the other tasks. The job can end up marked as "failed", but I want all tasks to run.

Please see my use case: I read a Hadoop input set, and some of the gzip files are incomplete. I would like to just skip them, and the only way I can see to do that is to tell Spark to tolerate some permanently failing tasks, if that is possible. With traditional Hadoop map-reduce this was possible using mapred.max.map.failures.percent.

Do map-reduce params like mapred.max.map.failures.percent apply to Spark/YARN map-reduce jobs? I edited $HADOOP_CONF_DIR/mapred-site.xml and added mapred.max.map.failures.percent=30, but it does not seem to apply; the job still failed after 3 failed task attempts. Should Spark forward this parameter, or do the mapred.* parameters simply not apply?

Are other Hadoop parameters taken into account and forwarded, e.g. the ones involved in input reading, as opposed to "processing"/"application" settings like max.map.failures? I saw that Spark should scan HADOOP_CONF_DIR and forward those settings, but I guess that does not apply to every parameter, since Spark has its own distribution and DAG/stage processing logic, which just happens to have a YARN implementation.

Do you know a way to do this in Spark: to tolerate a predefined number of failed tasks while allowing the job to continue? That way I could see all the faulty input files in one job run, delete them all, and continue with the rest.

Just to mention, doing a manual gzip -t on top of hadoop fs -cat is infeasible; map-reduce is way faster at scanning the 15K files worth 70 GB (it is doing 25 MB/s per node), while the old-style hadoop fs -cat approach does much less.

Thanks,
Nicu
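
P.S. In case it helps to make the question concrete, the job is roughly the sketch below (Scala; the input path is a placeholder). The sc.hadoopConfiguration line is only there to illustrate where I imagine the old MapReduce setting would have to end up; I do not know whether Spark's task scheduler honors it at all, which is exactly what I am asking.

import org.apache.spark.{SparkConf, SparkContext}

object ScanGzips {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("scan-gzips"))

    // The old MapReduce knob for tolerating a percentage of failed map tasks,
    // set programmatically here, equivalent to the mapred-site.xml entry I added.
    // Unclear whether Spark's own scheduler pays any attention to it.
    sc.hadoopConfiguration.set("mapred.max.map.failures.percent", "30")

    // Plain scan over the ~15K gzipped files (~70 GB). A truncated .gz makes the
    // task reading it throw, and after the allowed attempts the whole job fails.
    val lines = sc.textFile("hdfs:///path/to/input/*.gz") // placeholder path
    println(s"line count: ${lines.count()}")

    sc.stop()
  }
}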