On Wed, Jan 22, 2014 at 10:27 PM, coocood <[email protected]> wrote:

> I want to run a redis server cluster on Mesos, but have some problems.
>
> The first problem is the storage path. Since this is a storage service, I
> need to set the storage path outside the sandbox, so that the next run of
> the service will find the data and the data will not get garbage
> collected. The scheduler will keep track of the storage services and their
> state, and pass the storage path to the executor to create or restart a
> storage service.
>
> Does this solution have any problems?
This works at the current time, given that the sandbox is not chrooted. In
the future there may be a different story for persistent storage, but for now
you can write outside your sandbox so long as you have the necessary
privileges. But you will need to make sure you can tolerate the disk failing,
through a replication / backup strategy, which I imagine you've done with
your redis setup.

In the future, to better deal with persistence requirements, we may expose
raw disk as a resource that can have reservations applied to it, as with
other resources. This would allow you to reserve disk resources for the
particular role that needs them. Much of this is still up in the air.

> And another problem: when the slave is deactivated, because of a network
> partition or because the slave process has been down for more than 75
> seconds, then when the slave connects again it will be asked to shut down
> itself and all its tasks. So all the running storage services will be shut
> down, and you have to start the slave again and restart all the storage
> services. If a service takes a long time to restart, it will be
> unavailable for a while.

A few questions here:

How are you running your slaves? Typically slaves are run under a tool that
monitors the pid and restarts the slave when it exits. This ensures your
slave is restarted automatically.

We cannot distinguish between a partition and other classes of failure (such
as machine failure), so the question is: why treat partitions any differently
than the machine failing? How would your framework react when one of the
machines running redis fails? Could the same strategy be applied to network
partitions?

> Is there any way to solve this problem? Like, instead of simply shutting
> down the deactivated slave and all its tasks, let the framework decide how
> to handle the re-registering of a deactivated slave.
> I read the code; it says: "We disallow deactivated slaves from
> re-registering, as we've already informed frameworks that the tasks were
> lost."

The 75 second timeout could be made configurable, depending on the length of
network partition you want to consider acceptable.

As for re-registering deactivated slaves, we've opted not to allow this,
given the resulting complexity that would be exposed to frameworks. With your
suggestion, we might inform the framework that a task is LOST, and
subsequently inform it that the same task is RUNNING. Tasks can go LOST for
many reasons besides network partitions, so a reaction is typically required
on the LOST signal. At that point, when you later receive RUNNING, you've
already reacted to the LOST signal, so what do you do? I think the semantics
here would be fairly tricky.
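On the "run the slave under a tool that monitors the pid" point above: in
practice this is monit, supervisord, or an init system. A minimal sketch of
the idea in Python (the `supervise` function, its parameters, and the restart
cap are made up for illustration; real supervisors restart indefinitely):

```python
import subprocess
import time

def supervise(cmd, max_restarts=3, backoff=0.1):
    """Re-run cmd each time it exits, up to max_restarts restarts.

    A toy stand-in for tools like monit or supervisord, not anything
    Mesos ships with. The restart cap just keeps this sketch finite.
    Returns the total number of times cmd was run.
    """
    runs = 0
    while runs <= max_restarts:
        proc = subprocess.Popen(cmd)
        proc.wait()          # block until the supervised process exits
        runs += 1
        time.sleep(backoff)  # brief pause before restarting
    return runs
```

With the slave wrapped this way, a crash or clean exit is followed by an
automatic restart, which is what lets it re-register within the timeout.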
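To make the LOST-then-RUNNING problem concrete, here is a toy status-update
handler (the `make_handler` function, the task table, and the replacement
naming are hypothetical, not the Mesos scheduler API): a framework that
reacts to LOST by launching a replacement ends up with two copies of the same
logical task if the original is later allowed to reappear as RUNNING.

```python
def make_handler():
    tasks = {}  # task_id -> last known state

    def on_status_update(task_id, state):
        if state == "TASK_LOST":
            # Typical reaction to LOST: forget the task and launch a
            # replacement in its place.
            tasks.pop(task_id, None)
            tasks[task_id + "-retry"] = "TASK_STAGING"
        elif state == "TASK_RUNNING":
            # If a deactivated slave were allowed to re-register, the
            # "lost" task could show up here *after* the framework has
            # already reacted, leaving two copies running.
            tasks[task_id] = "TASK_RUNNING"
        return dict(tasks)

    return on_status_update

handler = make_handler()
handler("redis-1", "TASK_RUNNING")   # task starts normally
handler("redis-1", "TASK_LOST")      # partition: replacement launched
state = handler("redis-1", "TASK_RUNNING")  # original reappears: duplicate
```

After the final update, `state` holds both `redis-1` and `redis-1-retry`,
which is exactly the ambiguity the reply above is pointing at.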

