On Wed, Jan 22, 2014 at 10:27 PM, coocood <[email protected]> wrote:

> I want to run a redis server cluster on Mesos, but have some problems.
>
> The first problem is the storage path. Since this is a storage service, I
> need to set the storage path outside the sandbox, so that the next run of
> the service will find the data and the data will not get garbage
> collected. The scheduler will keep track of the storage services and their
> state, and pass the storage path to the executor to create or restart a
> storage service.
>
> Does this solution have any problems?
This works at the current time, given that the sandbox is not chrooted. In
the future there may be a different story for persistent storage, but for now
you can write outside your sandbox so long as you have the necessary
privileges. But you will need to make sure you can tolerate the disk failing,
through a replication / backup strategy, which I imagine you've done with
your redis setup.

In the future, to better deal with persistence requirements, we may expose
raw disk as a resource that can have reservations applied to it, as with
other resources. This would allow you to reserve disk resources for the
particular role that needs them. Much of this is still up in the air.

> And another problem: when the slave is deactivated, because of a network
> partition or because the slave process has been down for more than 75
> seconds, then when the slave connects again it will be asked to shut down
> itself and all its tasks. So all the running storage services will be shut
> down, and you have to start the slave again and restart all the storage
> services. If a service takes a long time to restart, it will be
> unavailable for a while.

A few questions here:

How are you running your slaves? Typically slaves are run under a tool that
monitors the pid and restarts the slave when it exits. This ensures your
slave is restarted automatically.

We cannot distinguish between a partition and other classes of failure (such
as machine failure), so the question is: why treat partitions any differently
than the machine failing? How would your framework react when one of the
machines running redis fails? Could the same strategy be applied to network
partitions?

> Is there any way to solve this problem? Like, instead of simply shutting
> down the deactivated slave and all its tasks, let the framework decide how
> to handle the re-registering of a deactivated slave.
> I read the code; it says: "We disallow deactivated slaves from
> re-registering, as we've already informed frameworks that the tasks were
> lost."

The 75 second timeout could be made configurable, depending on the length of
network partition you want to consider acceptable.

As for re-registering deactivated slaves, we've opted not to allow this,
given the resulting complexity that would be exposed to frameworks. With your
suggestion, we might inform the framework that a task is LOST, and
subsequently inform it that the same task is RUNNING. Tasks can go LOST for
many reasons besides network partitions, so a reaction is typically required
on the LOST signal. At that point, when you later receive RUNNING, you've
already reacted to the LOST signal, so what do you do? I think the semantics
here would be fairly tricky.
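On the "run the slave under a tool that monitors the pid" point above: in
practice this is monit, supervisord, or an init system. A minimal sketch of
the idea in Python (the `supervise` function, its parameters, and the restart
cap are made up for illustration; real supervisors restart indefinitely):

```python
import subprocess
import time

def supervise(cmd, max_restarts=3, backoff=0.1):
    """Re-run cmd each time it exits, up to max_restarts restarts.

    A toy stand-in for tools like monit or supervisord, not anything
    Mesos ships with. The restart cap just keeps this sketch finite.
    Returns the total number of times cmd was run.
    """
    runs = 0
    while runs <= max_restarts:
        proc = subprocess.Popen(cmd)
        proc.wait()          # block until the supervised process exits
        runs += 1
        time.sleep(backoff)  # brief pause before restarting
    return runs
```

With the slave wrapped this way, a crash or clean exit is followed by an
automatic restart, which is what lets it re-register within the timeout.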
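To make the LOST-then-RUNNING problem concrete, here is a toy status-update
handler (the `make_handler` function, the task table, and the replacement
naming are hypothetical, not the Mesos scheduler API): a framework that
reacts to LOST by launching a replacement ends up with two copies of the same
logical task if the original is later allowed to reappear as RUNNING.

```python
def make_handler():
    tasks = {}  # task_id -> last known state

    def on_status_update(task_id, state):
        if state == "TASK_LOST":
            # Typical reaction to LOST: forget the task and launch a
            # replacement in its place.
            tasks.pop(task_id, None)
            tasks[task_id + "-retry"] = "TASK_STAGING"
        elif state == "TASK_RUNNING":
            # If a deactivated slave were allowed to re-register, the
            # "lost" task could show up here *after* the framework has
            # already reacted, leaving two copies running.
            tasks[task_id] = "TASK_RUNNING"
        return dict(tasks)

    return on_status_update

handler = make_handler()
handler("redis-1", "TASK_RUNNING")   # task starts normally
handler("redis-1", "TASK_LOST")      # partition: replacement launched
state = handler("redis-1", "TASK_RUNNING")  # original reappears: duplicate
```

After the final update, `state` holds both `redis-1` and `redis-1-retry`,
which is exactly the ambiguity the reply above is pointing at.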

