Re: Resource Partition Failure

Vinayak Borkar Wed, 17 Apr 2013 23:34:50 -0700

Kishore,

Thanks for the explanation. I saw that HelixAdmin had calls to reset partitions from the ERROR state to the initial state. So I was wondering if moving the partition to the ERROR state by the instance itself would be a good idea. But Ming's answer and your explanation obviate the need for that.



Thanks,
Vinayak


On 4/17/13 11:29 PM, kishore g wrote:

Ming is correct, you can use enablePartition(false) to disable only the
corrupted partition on the node. This will trigger the rebalancer, which
recomputes the ideal state.
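
To make the effect of disabling a single partition concrete, here is a minimal, self-contained sketch. It does not call Helix and is not Helix's rebalancing algorithm: the class PartitionDisableSketch, its round-robin placement, and the node and partition names are all hypothetical, chosen only to show that disabling one (instance, partition) pair moves that partition elsewhere while the instance keeps its other partitions.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: a stand-in for "disable one partition on one node,
// then recompute placement". Not the Helix API or its rebalancer.
class PartitionDisableSketch {
    private final List<String> instances;
    private final List<String> partitions;
    // Disabled (instance, partition) pairs, stored as "instance/partition".
    private final Set<String> disabled = new HashSet<>();

    PartitionDisableSketch(List<String> instances, List<String> partitions) {
        this.instances = instances;
        this.partitions = partitions;
    }

    // Enable or disable a single partition on a single instance.
    void enablePartition(boolean enabled, String instance, String partition) {
        String key = instance + "/" + partition;
        if (enabled) {
            disabled.remove(key);
        } else {
            disabled.add(key);
        }
    }

    // Recompute placement: each partition prefers a round-robin slot and
    // falls through to the next instance if the pair is disabled.
    // A partition disabled everywhere is left unassigned.
    Map<String, String> computeAssignment() {
        Map<String, String> assignment = new LinkedHashMap<>();
        for (int p = 0; p < partitions.size(); p++) {
            String partition = partitions.get(p);
            for (int offset = 0; offset < instances.size(); offset++) {
                String candidate = instances.get((p + offset) % instances.size());
                if (!disabled.contains(candidate + "/" + partition)) {
                    assignment.put(partition, candidate);
                    break;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        PartitionDisableSketch cluster = new PartitionDisableSketch(
                List.of("node1", "node2"), List.of("db_0", "db_1", "db_2"));
        System.out.println("before: " + cluster.computeAssignment());
        // Only db_0 on node1 is marked bad; node1 itself stays in service.
        cluster.enablePartition(false, "node1", "db_0");
        System.out.println("after:  " + cluster.computeAssignment());
    }
}
```

In this toy run, db_0 moves off node1 while db_2 stays on it, which is the behavior being asked for: the corrupted partition is taken out without touching the node's other resources.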

We thought about allowing an instance to move itself into the ERROR state,
but we were worried that giving an instance control to change its own state
automatically is dangerous and makes it harder to debug issues.

We do have a mechanism for the participant to send a request to the
controller to initiate a transition. For example, you can send a message to
the controller to disable a partition/instance. (This is different from
disabling using Helix admin, though the end result is the same.)

I didn't get the second part: "which was then reset by possibly the
controller".




On Wed, Apr 17, 2013 at 11:00 PM, Vinayak Borkar wrote:

That sounds more promising. Does disabling a partition trigger ideal state
computation to rebalance the cluster?

Ideally, it would be great if the corrupted instance could move itself to
the ERROR state, which would then be reset, possibly by the controller. Is
that possible?





On 4/17/13 10:55 PM, Ming Fang wrote:

How about HelixAdmin.enablePartition()?

On Apr 18, 2013, at 1:53 AM, Vinayak Borkar wrote:

  Hi Ming Fang,



Enabling/disabling an instance will take out all the resources hosted on
that instance. I would like to disable only the corrupted partition on the
system without impacting other resources.

Thanks,
Vinayak


On 4/17/13 10:43 PM, Ming Fang wrote:

Try HelixAdmin.enableInstance()

On Apr 18, 2013, at 12:28 AM, Vinayak Borkar wrote:

  Hi,



What is the expected way for a system to indicate to Helix that a
partition of a resource has failed?

Say the bits on disk of a particular partition are found to be
corrupted. Is there a way to tell Helix that this partition of the
resource needs to "fail" without killing the whole node and hence
destroying all other resources on that machine?

Thanks,
Vinayak
