Hi Praveen,

What is your configuration for spark.local.dir? Does that directory exist on
every worker, and does every worker have the right permissions on it? I think
this is the cause of your error.
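As a quick check, you can point spark.local.dir at a known-good directory
explicitly when creating the context. A minimal sketch using the Spark 0.9-era
SparkConf API (the /mnt/spark-local path and the app name are placeholders,
not values from your setup; the directory must exist and be writable on every
worker):

    import org.apache.spark.{SparkConf, SparkContext}

    // Point spark.local.dir at a directory that exists and is writable
    // on every worker node. /mnt/spark-local is a hypothetical path.
    val conf = new SparkConf()
      .setAppName("SharkTest")
      .set("spark.local.dir", "/mnt/spark-local")
    val sc = new SparkContext(conf)

If the error disappears with an explicit, verified directory, the original
spark.local.dir setting (or its permissions) was the problem.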
Wisely Chen

On Mon, Apr 14, 2014 at 9:29 PM, Praveen R <prav...@sigmoidanalytics.com> wrote:

> Had the below error while running Shark queries on a 30-node cluster and
> was not able to start the Shark server or run any jobs.
>
> 14/04/11 19:06:52 ERROR scheduler.TaskSchedulerImpl: Lost an executor 4
> (already removed): Failed to create local directory (bad spark.local.dir?)
>
> Full log: https://gist.github.com/praveenr019/10647049
>
> After spending quite some time, I found it was due to disk read errors on
> one node, and the cluster worked again after removing that node.
>
> Wanted to know if there is any configuration (like akkaTimeout) which can
> handle this, or does Mesos help?
>
> Shouldn't the worker be marked dead in such a scenario, instead of making
> the cluster unusable, so that debugging can be done at leisure?
>
> Thanks,
> Praveen R