Re: many exceptions during recovery from a total shutdown

2017-04-05 Thread Neutron Sharc
We are using helix-0.7.1. All these logs are from participants, not from the controller. -Shawn On Wed, Apr 5, 2017 at 7:09 PM, kishore g wrote: > Hi Shawn, > > Are the logs on the participant or controller? What is the helix version? > > > > On Wed, Apr 5, 2017 at 6:36 PM, N

many exceptions during recovery from a total shutdown

2017-04-05 Thread Neutron Sharc
Hi all, We are testing a failure recovery scenario where I have many resources spanning many participants. I shut down all participants and Helix admins, wait a while, then add each participant back into the cluster. (ZooKeeper is on a separate cluster, not affected by the shutdown.) During the recovery,

dynamic zookeeper quorum

2017-03-18 Thread Neutron Sharc
Hi all, The recent ZooKeeper 3.5 release allows the quorum to be grown dynamically. Do the Helix ZooKeeper clients perform a runtime reconfigure to use the new ZooKeeper servers? -Shawn
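For context, ZooKeeper 3.5's dynamic reconfiguration is driven from the server side. A minimal sketch of adding an ensemble member via zkCli (the server id, host, and ports below are placeholders, and reconfig must be enabled on the ensemble):

```shell
# Add a new ensemble member at runtime (ZooKeeper >= 3.5).
# Format: server.<id>=<host>:<quorumPort>:<electionPort>;<clientPort>
reconfig -add server.5=10.0.0.5:2888:3888;2181
```

Whether a Helix client then picks up the new members depends on the ZooKeeper client library it embeds: the 3.5-era client can update its server list at runtime, while a client built against 3.4 will not.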

Re: [jira] [Commented] (HELIX-527) Mitigate zookeeper watch leak

2016-08-31 Thread Neutron sharc
We are using zookeeper 3.5.1. It seems from your message that we are not vulnerable to this issue? On Wed, Aug 31, 2016 at 10:44 AM, Lei Xia (JIRA) wrote: > > [ > https://issues.apache.org/jira/browse/HELIX-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedComm

Re: reuse replicas after a failed participant reconnects

2016-06-06 Thread Neutron sharc
state of p0 on > node-1 will be offline. Once node-1 comes back, Helix will bring p0 on > node-1 back from offline to online. > > Not sure if this answers your question. > > > Thanks > Lei > > > On Fri, Jun 3, 2016 at 2:55 PM, Neutron sharc > wrote: >

Re: error when reading znode: invalid stream header: 7B0A2020

2016-06-06 Thread Neutron sharc
Thanks Kishore. "zkClient.setZkSerializer(new ZNRecordSerializer())" solved the problem. On Mon, Jun 6, 2016 at 6:32 PM, kishore g wrote: > zkClient.setStreamingSerializer(new ZNRecordSerialiazer()) something like > that. > > On Mon, Jun 6, 2016 at 6:00 PM, Neutron sharc

error when reading znode: invalid stream header: 7B0A2020

2016-06-06 Thread Neutron sharc
Hi the team, I want to read this znode to get partitions assigned to a dead participant: "/INSTANCES//CURRENTSTATES//" I use this code snippet to read:

accessor = new ZkBaseDataAccessor(zkClient);
String path = x;
ZNRecord record = accessor.get(path, null, AccessOption.PERSISTENT);

Immediat
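Combining the snippet above with the ZNRecordSerializer fix reported in the reply on this thread, a minimal end-to-end sketch (the ZooKeeper address and znode path are placeholders, and running it needs a live Helix cluster, so treat it as a sketch rather than a drop-in):

```java
import org.apache.helix.AccessOption;
import org.apache.helix.ZNRecord;
import org.apache.helix.manager.zk.ZNRecordSerializer;
import org.apache.helix.manager.zk.ZkBaseDataAccessor;
import org.apache.helix.manager.zk.ZkClient;

public class ReadCurrentStates {
  public static void main(String[] args) {
    ZkClient zkClient = new ZkClient("localhost:2181");  // placeholder address
    // Without this, ZkClient uses its default serializer and fails with
    // "invalid stream header: 7B0A2020" -- those bytes are simply
    // '{', '\n', ' ', ' ': the start of the JSON payload that
    // ZNRecordSerializer knows how to parse.
    zkClient.setZkSerializer(new ZNRecordSerializer());
    ZkBaseDataAccessor<ZNRecord> accessor = new ZkBaseDataAccessor<ZNRecord>(zkClient);
    // Placeholder path; fill in the cluster, instance, and session ids.
    String path = "/MYCLUSTER/INSTANCES/host1_12000/CURRENTSTATES/...";
    ZNRecord record = accessor.get(path, null, AccessOption.PERSISTENT);
    System.out.println(record);
  }
}
```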

Re: reuse replicas after a failed participant reconnects

2016-06-06 Thread Neutron sharc
node-1 back from offline to online. > > Not sure if this answers your question. > > > Thanks > Lei > > > On Fri, Jun 3, 2016 at 2:55 PM, Neutron sharc > wrote: > >> Hi the team, >> >> semi-auto mode supports a feature that, after a failed partic

reuse replicas after a failed participant reconnects

2016-06-03 Thread Neutron sharc
Hi the team, semi-auto mode supports a feature that, after a failed participant comes back online, its owned replicas will be reused again (transition from offline to slave, etc.). How can Helix recognize the replicas that are owned by a participant after it reconnects following a failure? We are tryin

Re: calling ZKHelixLock from state machine transition

2016-05-25 Thread Neutron sharc
25, 2016 at 7:41 PM, Neutron sharc > wrote: > >> Hi Kishore, Kanak, any updates? >> >> On Thu, May 19, 2016 at 4:13 PM, kishore g wrote: >> > Thanks Shawn. Will review it tonight. Kanak, It will be great if you can >> > take a look at it as well. >&

Re: calling ZKHelixLock from state machine transition

2016-05-25 Thread Neutron sharc
Hi Kishore, Kanak, any updates? On Thu, May 19, 2016 at 4:13 PM, kishore g wrote: > Thanks Shawn. Will review it tonight. Kanak, It will be great if you can > take a look at it as well. > > On Thu, May 19, 2016 at 3:45 PM, Neutron sharc > wrote: > >> Hi Helix team,

Re: calling ZKHelixLock from state machine transition

2016-05-19 Thread Neutron sharc
Hi Helix team, I uploaded a PR to fix this bug: https://github.com/apache/helix/pull/44 Thanks. On Wed, May 18, 2016 at 11:01 PM, Neutron sharc wrote: > Hi Kanak, > > The same problem with zk helix lock re-appears. I found some clues > about the potential bug. This potential bu

Re: calling ZKHelixLock from state machine transition

2016-05-18 Thread Neutron sharc
ng less than me node: /shawn1/LOCKS/RESOURCE_Pool0/x-72233245264911662-79 => T15 found T19 to be smallest so it waits for T19. Nobody will wake up T19, so T15 is also blocked. Any comments appreciated. Thanks. -Neutronsharc On Sat, May 14, 2016 at 5:20 PM, Neutron sharc wro
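The symptom described above matches a mis-selection in the standard ZooKeeper sequential-znode lock recipe: each contender should watch the node with the largest sequence number strictly below its own, not an arbitrary "smallest" node. A self-contained sketch of that selection rule, with lock-node names modeled on the ones in the message (the session-id portions are made up):

```java
import java.util.Arrays;
import java.util.List;

public class LockQueue {
  // The sequence number is the suffix after the last '-' in the znode name.
  static int seq(String node) {
    return Integer.parseInt(node.substring(node.lastIndexOf('-') + 1));
  }

  // In the standard ZooKeeper lock recipe, a contender watches the node with
  // the LARGEST sequence number strictly smaller than its own. Watching any
  // other node can leave a waiter that nobody ever wakes up.
  static String predecessor(List<String> nodes, String mine) {
    String best = null;
    for (String n : nodes) {
      if (seq(n) < seq(mine) && (best == null || seq(n) > seq(best))) {
        best = n;
      }
    }
    return best;  // null means no smaller node exists: we hold the lock
  }

  public static void main(String[] args) {
    List<String> nodes = Arrays.asList(
        "x-72233245264911662-79",   // session parts are illustrative
        "x-72233245264911000-15",
        "x-72233245264911333-19");
    // Contender 79 must wait on 19 (its immediate predecessor), not on 15.
    System.out.println(predecessor(nodes, "x-72233245264911662-79"));
  }
}
```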

Re: calling ZKHelixLock from state machine transition

2016-05-14 Thread Neutron sharc
We increased the max connections allowed per client at zk server side. The problem is gone now. On Tue, May 10, 2016 at 2:50 PM, Neutron sharc wrote: > Hi Kanak, thanks for reply. > > The problem is gone if we set a constraint of 1 on "STATE_TRANSITION" > for the resource.
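For reference, the per-client connection cap mentioned above is ZooKeeper's maxClientCnxns setting, a server-side zoo.cfg property that limits concurrent connections from a single source IP (the value below is only illustrative):

```
# zoo.cfg
# Max concurrent connections a single client IP may hold (default 60; 0 = unlimited).
maxClientCnxns=200
```

Running many participants in one JVM multiplies ZooKeeper sessions from one IP, which is how the default limit gets hit.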

Re: calling ZKHelixLock from state machine transition

2016-05-10 Thread Neutron sharc
Hi Kanak, thanks for the reply. The problem is gone if we set a constraint of 1 on "STATE_TRANSITION" for the resource. If we allow multiple state transitions to execute concurrently in the resource, then this zklock problem occurs. Btw, we run multiple participants in the same JVM in our test. In other wo
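The constraint described above can be set through HelixAdmin's message-constraint API. A sketch, assuming a 0.6/0.7-era API (the ZK address, cluster name, resource name, and constraint id are placeholders, and the signatures should be verified against your Helix version):

```java
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ClusterConstraints.ConstraintType;
import org.apache.helix.model.ConstraintItem;
import org.apache.helix.model.builder.ConstraintItemBuilder;

public class ThrottleTransitions {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");  // placeholder
    // Allow at most one in-flight STATE_TRANSITION message for resource Pool0.
    ConstraintItem item = new ConstraintItemBuilder()
        .addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
        .addConstraintAttribute("RESOURCE", "Pool0")
        .addConstraintAttribute("CONSTRAINT_VALUE", "1")
        .build();
    admin.setConstraint("MYCLUSTER", ConstraintType.MESSAGE_CONSTRAINT,
        "throttleTransitionsOnPool0", item);
  }
}
```

Note this serializes transitions (hiding the lock contention) rather than fixing the underlying zklock issue.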

calling ZKHelixLock from state machine transition

2016-05-09 Thread Neutron sharc
Hi Helix team, We observed an issue at the state machine transition handler:

// statemodel.java:
public void offlineToSlave(Message message, NotificationContext context) {
  // do work to start a local shard
  // we want to save the new shard info to resource config
  ZKHelixLock zklock = new ZKH

Re: want to get dead replicas in USER_DEFINED rebalance callback

2016-05-02 Thread Neutron sharc
"REBALANCE_MODE" : "USER_DEFINED",
"REPLICAS" : "3",
"STATE_MODEL_DEF_REF" : "M1StateModel",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
}
On Mon, May 2, 2016 at 12:49 PM, kishore g wrote: > Can you paste the initial IS t

Re: want to get dead replicas in USER_DEFINED rebalance callback

2016-05-02 Thread Neutron sharc
ARTITION > } > } > > This allows your logic to be idempotent and not depend on incremental > changes. > > thanks, > Kishore G > > On Thu, Apr 28, 2016 at 4:27 PM, Neutron sharc > wrote: > >> Hi team, >> >> in USER_DEFINED rebalance mode, the ca

want to get dead replicas in USER_DEFINED rebalance callback

2016-04-28 Thread Neutron sharc
Hi team, in USER_DEFINED rebalance mode, the callback computeResourceMapping() accepts a “currentState” argument. Does this variable include replicas on a dead participant? For example, my resource has a master replica of partition P1 on participant node1 and a slave replica on participant node2. When node1
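Whatever the exact callback signature, the computation the question is after (which replicas sat on participants that are no longer live) can be sketched self-contained: a custom rebalancer can diff a saved assignment against the live-instance set rather than rely on the current state containing dead nodes. Names below are illustrative, following the node1/node2 example above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DeadReplicaFinder {
  // Given a replica->instance assignment and the set of live instances,
  // return the replicas whose hosting instance is dead.
  static List<String> deadReplicas(Map<String, String> assignment, Set<String> live) {
    List<String> dead = new ArrayList<String>();
    for (Map.Entry<String, String> e : assignment.entrySet()) {
      if (!live.contains(e.getValue())) {
        dead.add(e.getKey());
      }
    }
    Collections.sort(dead);
    return dead;
  }

  public static void main(String[] args) {
    Map<String, String> assignment = new HashMap<String, String>();
    assignment.put("P1_master", "node1");
    assignment.put("P1_slave", "node2");
    // node1 has died; only node2 remains live.
    Set<String> live = new HashSet<String>(Collections.singleton("node2"));
    System.out.println(deadReplicas(assignment, live));  // [P1_master]
  }
}
```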

Re: error from HelixStateTransitionHandler

2016-04-25 Thread Neutron sharc
Here is my test state machine definition. Thanks for reviewing! https://github.com/neutronsharc/tools/blob/master/HcdDiskStateModelFactory.java On Mon, Apr 25, 2016 at 9:18 PM, kishore g wrote: > Can you paste the state machine definition as well. > On Apr 25, 2016 8:41 PM, "Ne

Re: error from HelixStateTransitionHandler

2016-04-25 Thread Neutron sharc
[ERROR 2016-04-25 20:13:48,594 org.apache.helix.messaging.handling.HelixStateTransitionHandler:385] Skip internal error. errCode: ERROR, errMsg: Current state of stateModel does not match the fromState in Message, Current State:MASTER, message expected:SLAVE, partition: Pool0_1, from: host1_admin, to

error from HelixStateTransitionHandler

2016-04-23 Thread Neutron sharc
Hi Helix team, I keep seeing this error from HelixStateTransitionHandler when the state machine is running. It seems a partition's actual state doesn't match the state recorded in the controller message. What are the usual causes? I'm using helix 0.7.1. Here is my maven pom.xml: org.apach