Of the two cases where I have seen these error messages, I have solved one.
On one cluster these dedicated interfaces were connected to a switch
instead of being connected directly.
Though I still don't know what caused these errors on another system
(the logs in the previous email).
The nodes are called node-0 and node-1.
It is not happening regularly. It rather happens occasionally.
Among the roughly 50 two-node clusters we have in house, I've seen this issue
in the journal of 2 clusters.
I looked at logs and the pattern I see is this: stop Pacemaker and
Corosync on node-1, and then st
Hi folks,
We have a lot of our two-node systems running in our server room.
I noticed that some of them occasionally have these entries in the syslog:
Mar 15 12:54:45 A5-E4-151-bottom corosync[13766]: [TOTEM ] Digest does
not match
Mar 15 12:54:45 A5-E4-151-bottom corosync[13766]: [TOTEM ] Receive
Nice logo!
http://wiki.clusterlabgs.org/ doesn't load for me.
I also have a question which has bothered me for a long time. Not a significant
one, but anyway ...
I have seen the "Linux-HA" name around a lot. But it seems that the name is not
used anymore for this particular stack of HA software.
So I won
Thank you for the comprehensive answer. =)
Thank you,
Kostia
On Thu, Dec 1, 2016 at 5:56 PM, Ken Gaillot wrote:
> On 12/01/2016 06:04 AM, Kostiantyn Ponomarenko wrote:
> > OK, now I see. I still have a few questions.
> > 1. Is there a good reason to not remove the attribut
" to 0 before removing or changing
the attribute? Because now I see that previously set delay works when I
delete the attribute (--delete).
4. Does a delay set only one time work until it's unset (set to 0)?
Thank you,
Kostia
On Wed, Nov 30, 2016 at 10:39 PM, Ken Gaillot wrote:
> On 1
-up.
Thank you,
Kostia
On Wed, Nov 30, 2016 at 7:31 PM, Kostiantyn Ponomarenko <
konstantin.ponomare...@gmail.com> wrote:
> Hi Ken,
>
> I didn't look into the logs, but I experimented with it for a while.
> Here is what I found.
>
> It worked for you because this at
ue, Nov 29, 2016 at 1:08 AM, Ken Gaillot wrote:
> On 11/24/2016 05:24 AM, Kostiantyn Ponomarenko wrote:
> > Attribute dampening doesn't work for me also.
> > To test that I have a script:
> >
> > attrd_updater -N node-0 -n my-attr --update false --delay 20
The only thing that comes to my mind is that "standby" prevents all
resources from running on a node, whereas you can achieve the same with
"move", but it needs to be used for each resource. Also, with "move" you
specify the node you want a resource to be moved to.
On Nov 24, 2016 10:18 PM, "Om
something wrong?
Or maybe my understanding of attribute dampening is not correct?
My Pacemaker version is 1.1.13. (heh, not the latest one, but it is what it
is ...)
Thank you,
Kostia
On Wed, Nov 23, 2016 at 7:27 PM, Kostiantyn Ponomarenko <
konstantin.ponomare...@gmail.com> wrote:
> Ma
eboot
And this attribute is set to the live cluster configuration immediately.
What am I doing wrong?
Thank you,
Kostia
On Tue, Nov 22, 2016 at 11:33 PM, Kostiantyn Ponomarenko <
konstantin.ponomare...@gmail.com> wrote:
> Ken,
> Thank you for the explanation.
> I will try this lo
Ulrich,
I found your email from 2011 which explains how pending state can be
tracked =)
Looks like since then "crm_mon" also shows "Starting" and "Stopping"
statuses, if it is called with the "--pending" argument (and "record-pending=true"
is set).
Here is what my checking script will look like:
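The script itself is not preserved here. A minimal sketch of such a check, assuming that "crm_mon" prints one line per resource with the resource name as the first field and the word "Starting" somewhere on that line (the function name is_starting and the exact line layout are assumptions, not taken from the original message):

```shell
#!/bin/sh
# Hypothetical helper: reads "crm_mon" text output on stdin and succeeds
# (exit status 0) only if the named resource is reported as "Starting".
# Assumption: the resource name is the first whitespace-separated field
# of its status line.
is_starting() {
    awk -v rsc="$1" '
        $1 == rsc && /Starting/ { found = 1 }
        END { exit !found }
    '
}
```

It could be used as "crm_mon -1 --pending | is_starting adminServer" before issuing the command. This relies on "record-pending=true" being set, as described above.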
Ken,
Thank you for the explanation.
I will try this low-level way of shadow cib creation tomorrow.
PS: I will sleep much better with this excellent news/idea. =)
Thank you,
Kostia
On Tue, Nov 22, 2016 at 10:53 PM, Ken Gaillot wrote:
> On 11/22/2016 04:39 AM, Kostiantyn Ponomarenko wr
Hi folks,
I am looking for a good way of checking if a resource is in "starting"
state.
The thing is - I need to issue a command and I don't want to issue that
command when this particular resource is starting. This resource start can
take up to a few minutes.
As a note, I am OK with issuing that comm
I don't get how I can set this
timer.
Do I need to set this timer for each node?
Thank you,
Kostia
On Mon, Nov 21, 2016 at 9:30 AM, Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>> Ken Gaillot schrieb am 18.11.2016 um 16:17 in
> Nachricht
> :
> > O
Hi folks,
Is there a way to set a node attribute in the "status" section for several
nodes at the same time?
In my case there is a node attribute which allows some resources to start
in the cluster if it is set.
If I set this node attribute for, say, two nodes one after
another, then thes
Phanidhar,
If you don't have any location rules in your cluster, you can try setting
"resource-stickiness=1" or "resource-stickiness=100".
That will do the same job as INFINITY if there are no other location rules
in the cluster.
Also, there is a way to see the current state of scores in the cluster, w
ed constraints, but instead of duration
they are event based?
Thank you,
Kostia
On Thu, Nov 10, 2016 at 11:17 AM, Klaus Wenninger
wrote:
> On 11/10/2016 08:27 AM, Ulrich Windl wrote:
> >>>> Klaus Wenninger schrieb am 09.11.2016 um 17:42
> in
> > Nachricht <80c65
=)
Or maybe it can be a feature request =)
Thank you,
Kostia
On Wed, Nov 9, 2016 at 6:42 PM, Klaus Wenninger wrote:
> On 11/09/2016 05:30 PM, Kostiantyn Ponomarenko wrote:
> > When one problem seems to be solved, another one appears.
> > Now my script looks this way:
> >
Thank you for the answer, Kristoffer.
Thank you,
Kostia
On Sat, Nov 5, 2016 at 10:55 PM, Kristoffer Grönlund
wrote:
> Kostiantyn Ponomarenko writes:
>
> > Hi,
> >
> > I was reading about changing default resource stickiness based on time
> > rules, but I didn
you,
Kostia
On Tue, Nov 8, 2016 at 10:19 PM, Dejan Muhamedagic
wrote:
> On Tue, Nov 08, 2016 at 12:54:10PM +0100, Klaus Wenninger wrote:
> > On 11/08/2016 11:40 AM, Kostiantyn Ponomarenko wrote:
> > > Hi,
> > >
> > > I need a way to do a manual fail-back on d
Hi,
I need a way to do a manual fail-back on demand.
To be clear, I don't want it to be ON/OFF; I want it to be more like "one
shot".
So far I found that the most reasonable way to do it - is to set "resource
stickiness" to a different value, and then set it back to what it was.
To do that I creat
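A minimal sketch of that "one shot" idea, assuming the stickiness is kept as a cluster-wide resource default and that a fixed pause is enough for the cluster to move resources back (the names failback_once, set_default_stickiness, CRM_ATTRIBUTE and SETTLE_SECONDS are hypothetical, and the fixed sleep is only a crude placeholder for waiting until the cluster settles):

```shell
#!/bin/sh
# Hypothetical helper: temporarily change the default resource stickiness,
# give the cluster time to relocate resources, then restore the old value.
# CRM_ATTRIBUTE and SETTLE_SECONDS can be overridden, e.g. for a dry run.
CRM_ATTRIBUTE="${CRM_ATTRIBUTE:-crm_attribute}"
SETTLE_SECONDS="${SETTLE_SECONDS:-30}"

set_default_stickiness() {
    # Updates the "resource-stickiness" resource default.
    $CRM_ATTRIBUTE --type rsc_defaults --name resource-stickiness --update "$1"
}

failback_once() {
    normal="$1"     # value to restore afterwards
    temporary="$2"  # value that allows resources to move back
    set_default_stickiness "$temporary" || return 1
    # Assumption: a fixed pause is long enough for the transition to finish.
    sleep "$SETTLE_SECONDS"
    set_default_stickiness "$normal"
}
```

For example, "failback_once 100 0" would drop the stickiness to 0 and then put it back to 100. Running it with CRM_ATTRIBUTE=echo prints the commands instead of executing them.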
Hi,
I was reading about changing default resource stickiness based on time
rules, but I didn't find a way to set it using crmsh. I tried an example
configuration from
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#_using_rules_to_control_cluster_options
using cibadmin
I faced the same problem a few years ago - we needed to make a
two-node cluster work in a "split-brain" situation. We were looking at a
resource agent called SFEX, which is disk based -
http://www.linux-ha.org/wiki/Sfex_(resource_agent) . In the end we rejected
SFEX because, if I am not mist
Yes, DBus would be one of the ways.
Thank you,
Kostia
On Mon, Sep 26, 2016 at 3:33 PM, Klaus Wenninger
wrote:
> On 09/26/2016 02:29 PM, Kostiantyn Ponomarenko wrote:
>
> Correcting a typo.
> * the same -> I also was hoping to hear that I can do the same from c++
> code.
Correcting a typo.
* the same -> I also was hoping to hear that I can do the same from c++
code.
Thank you,
Kostia
On Mon, Sep 26, 2016 at 3:28 PM, Kostiantyn Ponomarenko <
konstantin.ponomare...@gmail.com> wrote:
> Thanks for the answer.
>
> I also was hoping to hear that
Thanks for the answer.
I also was hoping to hear that I can do the case from c++ code.
Thank you,
Kostia
On Mon, Sep 26, 2016 at 1:59 PM, Klaus Wenninger
wrote:
> On 09/26/2016 12:29 PM, Kostiantyn Ponomarenko wrote:
> > Hi,
> >
> > I am wondering if it is possible t
Hi,
I am wondering if it is possible to sign up for cluster events from
Pacemaker? Something like:
- a node joins/leaves the cluster,
- a resource fails,
- a resource moves,
- etc.
Thank you,
Kostia
___
Users mailing list: Users@clusterlabs.org
htt
On Fri, Sep 23, 2016 at 10:06 PM, Ken Gaillot wrote:
> The risk with that configuration is that both nodes can start without
> seeing each other, and both start resources.
>
For that purpose I rely on STONITH and redundant links between the nodes.
Thank you,
Kostia
Sep 2, 2016 at 11:33 AM, Kristoffer Grönlund
wrote:
> Kostiantyn Ponomarenko writes:
>
> > Hi,
> >
> >>> If "scripts: no-quorum-policy=ignore" is becoming depreciated
> > Are there any plans to get rid of this option?
> > Am I missing something
Hi,
>> If "scripts: no-quorum-policy=ignore" is becoming depreciated
Are there any plans to get rid of this option?
Am I missing something?
PS: this option is very useful (vital) to me. And "two_node" option won't
replace it.
Thank you,
Kostia
On Thu, Sep 1, 2016 at 11:31 AM, Darren Thompson
w
Thank you.
Thank you,
Kostia
On Tue, Jul 19, 2016 at 5:10 PM, Ken Gaillot wrote:
> On 07/19/2016 07:02 AM, Kostiantyn Ponomarenko wrote:
> > Hi,
> >
> > If I set "failcount" manually it doesn't expire after "failure-timeout".
> > Is this
Hi,
If I set "failcount" manually it doesn't expire after "failure-timeout".
Is this behavior expected?
Thank you,
Kostia
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.o
Hello,
I am seeing those error messages in the syslog when the machine goes down
(one-node cluster):
Jun 29 14:10:29 A4-4U24-303-LS systemd[1]: Stopped Pacemaker High
Availability Cluster Manager.
Jun 29 14:10:29 A4-4U24-303-LS crmd[4856]: error: lrm_state_verify_stopped:
3 resources were active
Hi guys,
My understanding is that sometimes an unused command and/or option to a
command can be removed. I don't know how many people use the "crmadmin
--dc_lookup" command, but I do use it =) .
That is why I ask not to remove this in the future releases of Pacemaker,
because it is a vital command for
Thank you, Ken.
This helps a lot.
Now I am sure that my current approach fits best for me =)
Thank you,
Kostia
On Wed, Mar 30, 2016 at 11:10 PM, Ken Gaillot wrote:
> On 03/29/2016 08:22 AM, Kostiantyn Ponomarenko wrote:
> > Ken, thank you for the answer.
> >
> > Every nod
is state the
best it can.
Thank you,
Kostia
On Mon, Mar 14, 2016 at 7:18 PM, Ken Gaillot wrote:
> On 02/29/2016 07:00 AM, Kostiantyn Ponomarenko wrote:
> > I am back to this question =)
> >
> > I am still trying to understand the impact of "High CPU load detected"
>
t 12:17 AM, Andrew Beekhof wrote:
>
> > On 27 May 2015, at 10:09 pm, Kostiantyn Ponomarenko <
> konstantin.ponomare...@gmail.com> wrote:
> >
> > I think I wasn't precise in my questions.
> > So I will try to ask more precise questions.
> > 1. why the
>>
>> For some reasons Corosync started to experience a lack of processor time
>> (scheduling).
>> That is why monitor operations started to time out.
>> Then after "Process pause detected for ..." message I assume the node
>> should be STONITHe
ventually "stop" function times out for one of the resources, that is why
Pacemaker eventually shuts down.
Please correct me in case I am wrong anywhere in my assumptions.
Thank you for spending your precious time reading all this =)
Hope for some help here =)
Thank you,
Kostia
On Wed
you,
Kostia
On Wed, Feb 17, 2016 at 5:02 PM, Greg Woods wrote:
>
> On Wed, Feb 17, 2016 at 3:30 AM, Kostiantyn Ponomarenko <
> konstantin.ponomare...@gmail.com> wrote:
>
>> Jan 29 07:00:43 B5-2U-205-LS corosync[2742]: [MAIN ] Corosync main
>> process was not schedule
Hi,
I am seeing messages like this in my logs:
Jan 29 07:00:41 B5-2U-205-LS lrmd[3012]: notice: operation_finished:
diskManager_monitor_3:18807:stderr [ Failed to get properties:
Connection timed out ]
Jan 29 07:00:41 B5-2U-205-LS lrmd[3012]: notice: operation_finished:
pmdh_monitor_3:188
der and here I don't understand why
Pacemaker considers it failed?
Thank you,
Kostia
On Tue, Jan 19, 2016 at 8:02 PM, Ken Gaillot wrote:
> On 01/19/2016 10:30 AM, Kostiantyn Ponomarenko wrote:
> > The resource that wasn't running, but was reported as running, is
> > "
[monitor] : got rc=$rc"
return $OCF_NOT_RUNNING
}
Thank you,
Kostia
On Tue, Jan 19, 2016 at 6:30 PM, Kostiantyn Ponomarenko <
konstantin.ponomare...@gmail.com> wrote:
> The resource that wasn't running, but was reported as running, is
> "adminServer".
>
> Here ar
The Pacemaker's version is 1.1.13.
Thank you,
Kostia
On Tue, Jan 19, 2016 at 2:49 PM, Kostiantyn Ponomarenko <
konstantin.ponomare...@gmail.com> wrote:
> One of resources in my cluster is not actually running, but "crm_mon"
> shows it with the "Started" s
One of resources in my cluster is not actually running, but "crm_mon" shows
it with the "Started" status.
Its resource agent's monitor function returns "$OCF_NOT_RUNNING", but
Pacemaker doesn't react to this in any way - crm_mon shows the resource as
Started.
I couldn't find an explanation to this behav
Hi,
What is the difference between "OCF_ERR_GENERIC" and "OCF_NOT_RUNNING"
return codes in "monitor" action from the Pacemaker's point of view?
I was looking here
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
, but I still don't see the difference
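To make the question concrete, here is a minimal sketch of a monitor function that returns each of the codes, assuming a daemon tracked by a pid file (the function demo_monitor and the pid-file convention are hypothetical). As far as I understand the documented semantics, "OCF_NOT_RUNNING" reports that the resource is cleanly stopped, while "OCF_ERR_GENERIC" reports that something went wrong with it:

```shell
#!/bin/sh
# Standard OCF exit codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical monitor for a daemon that writes its pid to a file.
demo_monitor() {
    pidfile="$1"
    # No pid file at all: treat the resource as cleanly stopped.
    [ -f "$pidfile" ] || return $OCF_NOT_RUNNING
    pid="$(cat "$pidfile")"
    # Pid file present and the process is alive: the resource is running.
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi
    # Stale pid file: the process is gone without a clean stop,
    # so report a failure rather than "not running".
    return $OCF_ERR_GENERIC
}
```

Which case maps to which code is a design choice of the agent; the stale pid file rule here is only one possible convention.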
The issue doesn't hurt me right now as we use a workaround for that.
But a workaround is not a fix of the problem.
Thank you,
Kostya
On Fri, Aug 28, 2015 at 12:25 PM, Kostiantyn Ponomarenko <
konstantin.ponomare...@gmail.com> wrote:
> In my case the final solution will be shippe
, Andrew Beekhof wrote:
>
> > On 21 Aug 2015, at 11:06 pm, Kostiantyn Ponomarenko <
> konstantin.ponomare...@gmail.com> wrote:
> >
> > As I wrote in the previous email, it could happen when NTP servers are
> unreachable before Pacemaker's start, and then, after so
I agree that the probability of this happening is really, really small =)
But the consequences can be huge =(
Thank you,
Kostya
On Fri, Aug 21, 2015 at 4:06 PM, Kostiantyn Ponomarenko <
konstantin.ponomare...@gmail.com> wrote:
> As I wrote in the previous email, it could happen when NT
nd in that case the bug will appear itself.
Thank you,
Kostya
On Mon, Aug 17, 2015 at 3:01 AM, Andrew Beekhof wrote:
>
> > On 8 Aug 2015, at 12:43 am, Kostiantyn Ponomarenko <
> konstantin.ponomare...@gmail.com> wrote:
> >
> > Hi Andrew,
> >
> > So the i
way to "self-stonithing"?
Will it be sufficient to create another stonith agent which will issue
"reboot -f"?
Thank you,
Kostya
On Mon, Aug 17, 2015 at 1:15 AM, Andrew Beekhof wrote:
>
> > On 13 Aug 2015, at 9:39 pm, Kostiantyn Ponomarenko <
> konstantin.ponoma
Thank you for the help :-)
On Aug 13, 2015 20:19, "Digimer" wrote:
> Ah, yes. If it's a RHEL/CentOS machine, put it in /usr/sbin/. If it's
> another OS, locate fence_ipmilan and put your agent in the same directory.
>
> digimer
>
> On 13/08/15 01:03 PM, Kost
here other places which I also can put my
agent in and get it visible to the cluster?
Thank you,
Kostya
On Thu, Aug 13, 2015 at 5:34 PM, Digimer wrote:
> On 13/08/15 07:54 AM, Kostiantyn Ponomarenko wrote:
> > Digimer,
> >
> > Thank you. I will try this out.
> > One more
> Then make sure it can be stonithd. Add additional stonith agent using
> independent communication channel.
Not possible. Only one node is up and running in the cluster, and I am
wondering - can it STONITH itself? Because most likely, after a reboot, the
problem will be gone.
> I have no idea what fenc
Digimer,
Thank you. I will try this out.
One more question. What about directories for those agents, what rules are
here?
Thank you,
Kostya
On Tue, Aug 11, 2015 at 6:21 PM, Digimer wrote:
> On 11/08/15 11:17 AM, Kostiantyn Ponomarenko wrote:
> > Hi guys,
> >
> > Is th
Hi,
I noticed that after moving to the new mailing list there are no more
updates here:
http://www.gossamer-threads.com/lists/linuxha/users/
Can it be fixed or am I missing something? It was a convenient way of
searching/reading/tracking issues.
Thank you,
Kostya
Hi,
Brief description of the STONITH problem:
I see two different behaviors with two different STONITH configurations. If
Pacemaker cannot find a device that can STONITH a problematic node, the
node remains up and running, which is bad, because it must be STONITHed.
In contrast, if Pacemake
Hi guys,
Is there any documentation which describes the implementation of fence and
STONITH agents, like these for Resource Agents?
http://www.linux-ha.org/wiki/OCF_Resource_Agents
http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html
I am particularly interested in the arguments which
Hi Marek,
The agent I wrote is too specific to my setup.
It is of no use outside of it.
And it is basically as simple as a Resource Agent.
Thank you,
Kostya
On Wed, Mar 18, 2015 at 5:45 PM, Marek "marx" Grac wrote:
> Hi,
>
>
> On 03/11/2015 10:39 AM, Kostiantyn Ponomarenko wr
orward).
So, then, after NTP becomes reachable, the bug appears.
Thank you,
Kostya
On Mon, Aug 10, 2015 at 9:13 AM, Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>> Kostiantyn Ponomarenko schrieb am
> 07.08.2015
> um 16:43 in Nachricht
> :
>
I've created a bug http://bugs.clusterlabs.org/show_bug.cgi?id=5246
Thank you,
Kostya
On Fri, Aug 7, 2015 at 5:43 PM, Kostiantyn Ponomarenko <
konstantin.ponomare...@gmail.com> wrote:
> Hi Andrew,
>
> So the issue is:
>
> Having one node up and running, set time on
rce actually remains "stopped".
Do you need more input from me on the issue?
Thank you,
Kostya
On Wed, Aug 5, 2015 at 3:01 AM, Andrew Beekhof wrote:
>
> > On 4 Aug 2015, at 7:31 pm, Kostiantyn Ponomarenko <
> konstantin.ponomare...@gmail.com> wrote:
> >
> >
Thank you,
Kostya
On Tue, Aug 4, 2015 at 3:53 AM, Andrew Beekhof wrote:
>
> > On 4 Aug 2015, at 1:48 am, Kostiantyn Ponomarenko <
> konstantin.ponomare...@gmail.com> wrote:
> >
> > Hi folks,
> >
> > Is it possible to have a configured initial cib.xml file
On Tue, Aug 4, 2015 at 3:57 AM, Andrew Beekhof wrote:
> Github might be another.
I am not able to open an issue/bug here
https://github.com/ClusterLabs/pacemaker
Thank you,
Kostya
Hi folks,
Is it possible to have a configured initial cib.xml file?
In my solution there is a script that applies the configuration automatically
on each boot.
But before that, another script (before starting pacemaker) does:
# rm -rf /var/lib/pacemaker/cib/*
in order to clean up the previous config
pache server has gone down. Do I need to change
> any of my scripts? I want to make sure that a single command to start an
> apache service in one node should also start the apache servers running on
> other nodes.
>
> On Wed, Jul 29, 2015 at 7:12 PM, Kostiantyn Ponomarenko <
>
care of your
> instances.
>
> In SLES11SP3 I only found those:
> # find /usr/lib/ocf/ -iname apa\*
> /usr/lib/ocf/lib/heartbeat/apache-conf.sh
> /usr/lib/ocf/resource.d/heartbeat/apache
>
> Regards,
> Ulrich
>
> >
> > On Wed, Jul 29, 2015 at 6:14 PM, Kostiant
5 at 9:09 AM, Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>> Kostiantyn Ponomarenko schrieb am
> 24.07.2015
> um 16:53 in Nachricht
> :
> > On Fri, Jul 24, 2015 at 1:21 PM, Ulrich Windl <
> > ulrich.wi...@rz.uni-regensburg.de> wrote:
>
On Fri, Jul 24, 2015 at 1:21 PM, Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:
> 25 years backwards?
I've tried to set time back for:
2 hours;
10 min.
The result was the same as with:
# date --set="1990-01-01 01:00:00"
Setting the time back by 5 min doesn't lead to that issue
8:56 AM, Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:
> >>> Kostiantyn Ponomarenko schrieb am
> 23.07.2015
> um 18:09 in Nachricht
> :
> > Hi,
> >
> > If you do:
> > # date --set="1990-01-01 01:00:00"
>
> Why
Hi,
If you do:
# date --set="1990-01-01 01:00:00"
when only one node is present in the cluster and while the cluster is
working, and then stop a resource (any resource), the cluster fails the
resource once, shows it as Started, but the resource actually is still
stopped.
Is it the expected be
72 matches