>
> P.S. If the issue is just a matter of timing when you're starting both
> nodes, you can start corosync on both nodes first, then start pacemaker
> on both nodes. That way pacemaker on each node will immediately see the
> other node's presence.
> --
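For reference, a minimal sketch of that startup order, assuming systemd-managed
corosync and pacemaker services:

    # on node1 and node2, start the membership layer first
    systemctl start corosync
    # then, once corosync is running on both nodes
    systemctl start pacemaker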
Well, rebooting a server takes 2 minutes a…
Hi
>
> Ah, this rings a bell. Despite having fenced the node, the cluster
> still considers the node unseen. That was a regression in 1.1.14 that
> was fixed in 1.1.15. :-(
>
Oh :( I'm using Pacemaker 1.1.14.
Do you know if these reboot retries are run just 3 times? All the tests I've
done…
Hi
>
> The first fencing is legitimate -- the node hasn't been seen at start-
> up, and so needs to be fenced. The second fencing will be the one of
> interest. Also, look for the result of the first fencing.
The first fencing finished OK, and so did the other two fencing operations.
Hi
>
>
> Do you mean you have a custom fencing agent configured? If so, check
> the return value of each attempt. Pacemaker should request fencing only
> once as long as it succeeds (returns 0), but if the agent fails
> (returns nonzero or times out), it will retry, even if the reboot
> worked i
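A minimal sketch of the exit-code contract described above; the command and
variable names are placeholders, not a real agent:

    # placeholder for whatever action the custom agent actually performs
    if /usr/local/sbin/do_fence_action "$TARGET_NODE"; then
        exit 0    # success: pacemaker requests fencing only once
    else
        exit 1    # failure or timeout: pacemaker will retry the fencing
    fi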
Hi
I have a two-node corosync+pacemaker cluster in which, when only one node is
started, it fences the other node. That is the expected default behaviour,
since "startup-fencing" defaults to true.
But the other node is rebooted 3 times, and then the remaining node starts
resources and doesn't fence the node anymore.
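As an aside, the property can be queried with pacemaker's own tools; a sketch:

    # query the cluster-wide startup-fencing property; if it was never set
    # explicitly, the query reports it as missing and the default (true) applies
    crm_attribute --type crm_config --name startup-fencing --query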
> On 17 Jul 2017, at 8:02, Ulrich Windl wrote:
>
> Hi!
>
> Could this mean the stonith-timeout is significantly larger than the time
> for a complete reboot? Then the fenced node would be up again by the time the
> cluster thinks the fencing has just completed.
>
> Regards,
> Ulrich
> P.S
>
>
> So if this is really the reason it would probably be worth
> finding out what is really happening.
>
Thanks. Yes, I think this is really the reason. I fixed it one week ago and it
hasn't happened again.
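For anyone hitting the same thing, a sketch of how to check and adjust the
value Ulrich refers to; the 120s figure is only an illustration, not a
recommendation:

    # current stonith-timeout (unset means the built-in default is in effect)
    crm_attribute --type crm_config --name stonith-timeout --query
    # example only: keep it roughly in line with a real reboot time
    crm_attribute --type crm_config --name stonith-timeout --update 120s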
> On 6 Jul 2017, at 17:34, Ken Gaillot wrote:
>
> On 07/06/2017 10:27 AM, Cesar Hernandez wrote:
>>
>>>
>>> It looks like a bug when the fenced node rejoins quickly enough that it
>>> is a member again before its fencing confirmation has been
>>>
>>
>> Could it be caused by node 2 being rebooted and back up before the stonith
>> script has finished?
>
> That *shouldn't* cause any problems, but I'm not sure what's happening
> in this case.
Maybe that is the cause of it... My other server installations had a slow
stonith device and als…
>
> If node2 is getting the notification of its own fencing, it wasn't
> successfully fenced. Successful fencing would render it incapacitated
> (powered down, or at least cut off from the network and any shared
> resources).
Maybe I don't understand you, or maybe you don't understand me... ;)
>
> I don't have answers, but questions:
> Assuming node1 was DC when stopped: will its CIB still record it as DC after
> being stopped?
> Obviously node1 cannot know about any changes node2 made. And node1, when
> started, will find that node2 is unexpectedly down, so it will fence it to be
> sure.
>
> AFAIK that's not proper fencing. SunOS once had a "fasthalt" command. In
> Linux "halt -nf" might do a similar thing, or maybe trigger a reboot via
> sysrq (echo b > /proc/sysrq-trigger).
>
> Fencing is anything but a clean shutdown. The specific problem is that
> shutdown may be perfor…
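For reference, the sysrq-based reboot mentioned above looks like this; it must
be run as root and assumes the magic-sysrq interface is available:

    echo 1 > /proc/sys/kernel/sysrq    # enable sysrq functions if needed
    echo b > /proc/sysrq-trigger       # immediate reboot, no clean shutdown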
>>
>>
>> Thanks. But I think it is not a good idea to disable startup fencing: I have
>> shared disks (DRBD), and stonith is very important in this scenario.
>
> AFAIK, DRBD is not considered to be a shared disk; it's a replicated disk at
> best.
>
Of course I know that. Only one of the nodes can use…
>
>
>>>
>>> But you definitely shouldn't have a fencing-agent that claims to have fenced
>>> a node if it is not sure - rather the other way round if in doubt.
>>
>>
>
> True! Which is why I mentioned that it is dangerous.
> But your fencing-agent is even more dangerous ;-)
>
>
Well.. my sta
> Probably not a good idea - and the reason for what you are experiencing ;-)
> If you have problems starting the nodes within a certain time window,
> disabling startup-fencing might be an option to consider, although dangerous.
> But you definitely shouldn't have a fencing-agent that claims to have fenced
> a node if it is not sure - rather the other way round if in doubt.
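For completeness, a sketch of how that (dangerous) option could be set,
assuming a pcs-managed cluster:

    # skips fencing of nodes that have not been seen since the cluster started
    pcs property set startup-fencing=false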
> Are you logging which ones went OK and which failed?
> Does the script return failure if both go wrong?
The script always returns OK.
> Might be kind of a strange race as well ... but without knowing what the
> script actually does ...
>
The script first tries to reboot the node using ssh, something like
"ssh $NODE reboot -f", and then runs a remote reboot using the AWS API.
Thanks
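For what it's worth, a hedged sketch of such a script that only reports success
when one of the two methods actually worked; $NODE, $INSTANCE_ID and the
argument handling are assumptions, not the poster's real script:

    #!/bin/sh
    NODE="$1"            # assumption: target hostname as first argument
    INSTANCE_ID="$2"     # assumption: matching EC2 instance id as second argument

    # First attempt: in-band forced reboot over ssh. Note that ssh may report
    # failure even when the reboot actually started, because the connection
    # drops; that case simply falls through to the AWS path below.
    if ssh -o ConnectTimeout=5 -o BatchMode=yes "$NODE" 'reboot -f' 2>/dev/null; then
        exit 0
    fi

    # Fallback: out-of-band reboot through the AWS API
    if aws ec2 reboot-instances --instance-ids "$INSTANCE_ID"; then
        exit 0
    fi

    exit 1    # both methods failed: report failure so the cluster can retry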
> The first line is the consequence of the 2nd.
> And the 1st says that node2 has just seen some fencing-resource
> positively reporting to have fenced node2 itself - which
> is why crmd is exiting in a way that it is not respawned
> by pacemakerd.
Thanks. But my script has a logfile, and I've checked it.
>
> Agreed, I don't think it's multicast vs unicast.
>
> I can't see from this what's going wrong. Possibly node1 is trying to
> re-fence node2 when it comes back. Check that the fencing resources are
> configured correctly, and check whether node1 sees the first fencing
> succeed.
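A sketch of how one might check both points, assuming the standard pacemaker
command-line tools are run on the surviving node ("node2" stands for whichever
node was fenced):

    stonith_admin --history node2   # did the cluster record a successful fencing of node2?
    crm_mon -1                      # one-shot cluster status, including failed actions
    pcs stonith show --full         # fencing resource configuration (pcs-based setups)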
Thanks. Che
Hi
I have installed a pacemaker cluster with two nodes. The same type of
installation has been done many times before, and the following error never
appeared. The situation is the following:
- both nodes running cluster services
- stop pacemaker & corosync on node 1
- stop pacemaker & corosync on node 2