We are using helix-0.7.1.
All these logs are from the participants, not from the controller.
-Shawn
On Wed, Apr 5, 2017 at 7:09 PM, kishore g wrote:
> Hi Shawn,
>
> Are the logs on the participant or the controller? What is the Helix version?
>
>
>
> On Wed, Apr 5, 2017 at 6:36 PM, N
Hi all,
We are testing a failure recovery scenario where we have many resources
spanning many participants. I shut down all participants and Helix
admins, wait a while, then add each participant back into the cluster.
(ZooKeeper is on a separate cluster, not affected by the shutdown.) During
the recovery,
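For context, a minimal sketch of how each participant might be added back, per the standard Helix participant setup (the cluster, instance, and ZK names are placeholders, and M1StateModelFactory is a hypothetical factory for the state model used in this thread):

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;

public class RejoinParticipant {
  public static void main(String[] args) throws Exception {
    HelixManager manager = HelixManagerFactory.getZKHelixManager(
        "MyCluster", "participant_1", InstanceType.PARTICIPANT, "zk-host:2181");
    // Hypothetical factory for the state model used in this thread.
    manager.getStateMachineEngine().registerStateModelFactory(
        "M1StateModel", new M1StateModelFactory());
    // Connecting re-registers the live instance; the controller then
    // drives the recovery transitions discussed in this thread.
    manager.connect();
  }
}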
Hi all,
ZooKeeper 3.5 allows the quorum to be grown dynamically.
Do the Helix ZooKeeper clients perform a runtime reconfigure to use the
new ZooKeeper servers?
-Shawn
We are using ZooKeeper 3.5.1. From your message, it seems that we are
not vulnerable to this issue?
On Wed, Aug 31, 2016 at 10:44 AM, Lei Xia (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/HELIX-527 ]
state of p0 on
> node-1 will be offline. Once node-1 comes back, Helix will bring p0 on
> node-1 back from offline to online.
>
> Not sure if this answers your question.
>
>
> Thanks
> Lei
>
>
> On Fri, Jun 3, 2016 at 2:55 PM, Neutron sharc
> wrote:
>
Thanks Kishore. "zkClient.setZkSerializer(new ZNRecordSerializer())"
solved the problem.
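For anyone hitting the same issue, a minimal sketch of the fix in context (the ZK address is a placeholder):

import org.apache.helix.manager.zk.ZNRecordSerializer;
import org.apache.helix.manager.zk.ZkClient;

public class ZkClientSetup {
  public static ZkClient create() {
    ZkClient zkClient = new ZkClient("zk-host:2181");
    // Without a ZNRecordSerializer the client cannot deserialize Helix
    // znodes into ZNRecord objects, which was the problem above.
    zkClient.setZkSerializer(new ZNRecordSerializer());
    return zkClient;
  }
}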
On Mon, Jun 6, 2016 at 6:32 PM, kishore g wrote:
> zkClient.setStreamingSerializer(new ZNRecordSerializer()) something like
> that.
>
> On Mon, Jun 6, 2016 at 6:00 PM, Neutron sharc
Hi team,
I want to read this znode to get partitions assigned to a dead participant:
"/INSTANCES//CURRENTSTATES//"
I use this code snippet to read:
ZkBaseDataAccessor<ZNRecord> accessor = new ZkBaseDataAccessor<ZNRecord>(zkClient);
String path = x;
ZNRecord record = accessor.get(path, null, AccessOption.PERSISTENT);
Immediat
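For reference, a sketch of listing a dead participant's current-state znodes with the same accessor (all path segments are placeholders; this follows the BaseDataAccessor interface from memory, so method names may differ slightly across versions):

import java.util.List;
import org.apache.helix.AccessOption;
import org.apache.helix.ZNRecord;
import org.apache.helix.manager.zk.ZkBaseDataAccessor;

public class ReadCurrentStates {
  // Returns the resources that had current states under the
  // participant's last session.
  static List<String> resources(ZkBaseDataAccessor<ZNRecord> accessor) {
    String path = "/MyCluster/INSTANCES/participant_1/CURRENTSTATES/session_1";
    return accessor.getChildNames(path, AccessOption.PERSISTENT);
  }
}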
node-1 back from offline to online.
>
> Not sure if this answers your question.
>
>
> Thanks
> Lei
>
>
> On Fri, Jun 3, 2016 at 2:55 PM, Neutron sharc
> wrote:
>
>> Hi the team,
>>
>> semi-auto mode supports a feature that, after a failed partic
Hi team,
Semi-auto mode supports a feature where, after a failed participant
comes back online, the replicas it owns are reused (transitioned
from offline to slave, etc.). How does Helix recognize the replicas
owned by a participant when it reconnects after a failure? We
are tryin
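For background, a sketch of how semi-auto mode pins replicas to participants: the ideal state's preference list names the owners of each partition, so the mapping survives a participant failure and the controller can replay OFFLINE->SLAVE when the node reconnects. (All cluster, resource, and node names below are placeholders.)

import java.util.Arrays;
import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class PinReplicas {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
    IdealState is = admin.getResourceIdealState("MyCluster", "Pool0");
    // Partition Pool0_1: node1 is the preferred master, node2 the slave.
    is.setPreferenceList("Pool0_1", Arrays.asList("node1", "node2"));
    admin.setResourceIdealState("MyCluster", "Pool0", is);
  }
}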
25, 2016 at 7:41 PM, Neutron sharc
> wrote:
>
>> Hi Kishore, Kanak, any updates?
>>
>> On Thu, May 19, 2016 at 4:13 PM, kishore g wrote:
>> > Thanks Shawn. Will review it tonight. Kanak, It will be great if you can
>> > take a look at it as well.
>>
Hi Kishore, Kanak, any updates?
On Thu, May 19, 2016 at 4:13 PM, kishore g wrote:
> Thanks Shawn. Will review it tonight. Kanak, It will be great if you can
> take a look at it as well.
>
> On Thu, May 19, 2016 at 3:45 PM, Neutron sharc
> wrote:
>
>> Hi Helix team,
Hi Helix team,
I uploaded a PR to fix this bug: https://github.com/apache/helix/pull/44
Thanks.
On Wed, May 18, 2016 at 11:01 PM, Neutron sharc wrote:
> Hi Kanak,
>
> The same problem with the zk helix lock reappears. I found some clues
> about the potential bug. This potential bu
ng less than me node:
/shawn1/LOCKS/RESOURCE_Pool0/x-72233245264911662-79
=> T15 found T19 to be the smallest, so it waits on T19. Nobody will
wake T19, so T15 is blocked as well.
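For comparison, a sketch of the standard ZooKeeper lock recipe, in which each waiter watches only the znode immediately preceding its own sequence number, so every unlock wakes exactly one successor. This is only an illustration of the expected behavior, not the actual fix in the PR; lockRoot and myNode are placeholder names.

import java.util.Comparator;
import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class LockWatchTarget {
  // The sequence number is the suffix after the last '-',
  // e.g. x-72233245264911662-79 -> 79.
  static long seq(String node) {
    return Long.parseLong(node.substring(node.lastIndexOf('-') + 1));
  }

  // Returns null if myNode holds the lock, else the znode to watch.
  static String nodeToWatch(ZooKeeper zk, String lockRoot, String myNode)
      throws Exception {
    List<String> children = zk.getChildren(lockRoot, false);
    children.sort(Comparator.comparingLong(LockWatchTarget::seq));
    int me = children.indexOf(myNode);
    if (me <= 0) {
      return null; // lowest sequence number: lock acquired
    }
    // Watch the immediate predecessor, not the overall smallest node.
    return lockRoot + "/" + children.get(me - 1);
  }
}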
Any comments appreciated. Thanks.
-Neutronsharc
On Sat, May 14, 2016 at 5:20 PM, Neutron sharc wro
We increased the maximum connections allowed per client on the ZooKeeper
server side. The problem is gone now.
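For reference, the relevant server-side setting is maxClientCnxns in zoo.cfg, which caps concurrent connections per client IP (the default is 60; the value below is just an example):

# zoo.cfg on each ZooKeeper server
maxClientCnxns=200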
On Tue, May 10, 2016 at 2:50 PM, Neutron sharc wrote:
> Hi Kanak, thanks for the reply.
>
> The problem is gone if we set a constraint of 1 on "STATE_TRANSITION"
> for the resource.
Hi Kanak, thanks for the reply.
The problem is gone if we set a constraint of 1 on "STATE_TRANSITION"
for the resource. If we allow multiple state transitions to execute
concurrently on the resource, then this zklock problem occurs.
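For reference, a sketch of how such a constraint can be set, following the Helix throttling tutorial (cluster, resource, and constraint names are placeholders):

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.ClusterConstraints.ConstraintType;
import org.apache.helix.model.builder.ConstraintItemBuilder;

public class ThrottleTransitions {
  public static void main(String[] args) {
    HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
    ConstraintItemBuilder builder = new ConstraintItemBuilder();
    builder.addConstraintAttribute("MESSAGE_TYPE", "STATE_TRANSITION")
           .addConstraintAttribute("RESOURCE", "Pool0")      // scope to one resource
           .addConstraintAttribute("CONSTRAINT_VALUE", "1"); // one transition at a time
    admin.setConstraint("MyCluster", ConstraintType.MESSAGE_CONSTRAINT,
        "throttleStateTransitions", builder.build());
  }
}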
btw, we run multiple participants in the same JVM in our test. In
other wo
Hi Helix team,
We observed an issue in a state machine transition handler:
// statemodel.java:
public void offlineToSlave(Message message, NotificationContext context) {
// do work to start a local shard
// we want to save the new shard info to resource config
ZKHelixLock zklock = new ZKH
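For readers following along, a self-contained sketch of what the surrounding state model looks like; the truncated ZKHelixLock portion is left as a comment rather than guessed at, and the class name follows the M1StateModel definition mentioned later in this thread:

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

@StateModelInfo(initialState = "OFFLINE", states = { "MASTER", "SLAVE", "OFFLINE" })
public class M1StateModel extends StateModel {
  @Transition(to = "SLAVE", from = "OFFLINE")
  public void offlineToSlave(Message message, NotificationContext context) {
    String resource = message.getResourceName();
    String partition = message.getPartitionName();
    // do work to start a local shard for this partition
    // (the original post then takes a ZKHelixLock before saving the new
    //  shard info to the resource config; that code is elided here)
  }
}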
"REBALANCE_MODE" : "USER_DEFINED",
"REPLICAS" : "3",
"STATE_MODEL_DEF_REF" : "M1StateModel",
"STATE_MODEL_FACTORY_NAME" : "DEFAULT"
}
On Mon, May 2, 2016 at 12:49 PM, kishore g wrote:
> Can you paste the initial IS t
PARTITION
> }
> }
>
> This allows your logic to be idempotent and not depend on incremental
> changes.
>
> thanks,
> Kishore G
>
> On Thu, Apr 28, 2016 at 4:27 PM, Neutron sharc
> wrote:
>
>> Hi team,
>>
>> in USER_DEFINED rebalance mode, the ca
Hi team,
In USER_DEFINED rebalance mode, the callback computeResourceMapping()
accepts a “currentState”. Does this variable include replicas on a
dead participant?
For example, my resource has a partition P1 master replica on
participant node1, a slave replica on participant node2. When node1
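To make the question concrete, a hypothetical helper (not Helix API) showing the filtering being asked about, i.e. dropping replicas whose participant is no longer live:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class LiveReplicaFilter {
  // Map of instance name -> replica state for one partition,
  // e.g. {node1=MASTER, node2=SLAVE}.
  static Map<String, String> liveOnly(Map<String, String> replicaStates,
                                      Set<String> liveInstances) {
    Map<String, String> result = new HashMap<String, String>();
    for (Map.Entry<String, String> e : replicaStates.entrySet()) {
      if (liveInstances.contains(e.getKey())) {
        result.put(e.getKey(), e.getValue()); // keep replicas on live nodes
      }
    }
    return result;
  }
}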
Here is my test state machine definition. Thanks for reviewing!
https://github.com/neutronsharc/tools/blob/master/HcdDiskStateModelFactory.java
On Mon, Apr 25, 2016 at 9:18 PM, kishore g wrote:
> Can you paste the state machine definition as well.
> On Apr 25, 2016 8:41 PM, "Ne
[ERROR 2016-04-25 20:13:48,594
org.apache.helix.messaging.handling.HelixStateTransitionHandler:385]
Skip internal error. errCode: ERROR, errMsg: Current state of
stateModel does not match the fromState in Message, Current
State:MASTER, message expected:SLAVE, partition: Pool0_1, from:
host1_admin, to
Hi Helix team,
I keep seeing this error from HelixStateTransitionHandler while the
state machine is running. It seems a partition's actual state doesn't
match the state recorded in the controller's message. What are the usual
causes? I'm using helix-0.7.1. Here is my maven pom.xml:
org.apach