Hi Praveen

What is your configuration for spark.local.dir?
Does this directory exist on all of your workers, and do all workers have
the right permissions on it?

I think this is the cause of your error.
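A quick way to sanity-check this on each worker is a small write test (a minimal sketch; the /tmp/spark-local path is only an assumed default here — substitute whatever your spark-env.sh or spark-defaults.conf sets for spark.local.dir):

```shell
# Check that the spark.local.dir candidate exists and is writable on this host.
# DIR falls back to an assumed example path if SPARK_LOCAL_DIR is unset.
DIR="${SPARK_LOCAL_DIR:-/tmp/spark-local}"
mkdir -p "$DIR" && touch "$DIR/.write_test" && rm -f "$DIR/.write_test" \
  && echo "writable: $DIR" \
  || echo "NOT writable: $DIR"
```

Running this (e.g. via ssh) on every worker will quickly show which node cannot create its local directory.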

Wisely Chen


On Mon, Apr 14, 2014 at 9:29 PM, Praveen R <prav...@sigmoidanalytics.com> wrote:

> Had the below error while running Shark queries on a 30-node cluster and was
> not able to start the Shark server or run any jobs.
>
> 14/04/11 19:06:52 ERROR scheduler.TaskSchedulerImpl: Lost an executor 4
> (already removed): Failed to create local directory (bad spark.local.dir?)
> Full log: https://gist.github.com/praveenr019/10647049
>
> After spending quite some time, found it was due to disk read errors on
> one node and had the cluster working after removing the node.
>
> Wanted to know if there is any configuration (like akkaTimeout) which can
> handle this, or does Mesos help?
>
> Shouldn't the worker be marked dead in such a scenario, instead of making
> the cluster unusable, so that debugging can be done at leisure?
>
> Thanks,
> Praveen R
>
>
>
