On Tue, May 07, 2019 at 09:59:03AM +0300, Klecho wrote:
> During the weekend my corosync daemon suddenly died without anything in the
> logs, except this:
>
> May 5 20:39:16 ZZZ kernel: [1605277.136049] traps: corosync[2811] trap
> invalid opcode ip:5635c376f2eb sp:7ffc3e109950 error:0 in
> coros
On Wed, Apr 03, 2019 at 10:36:52AM +0300, Andrei Borzenkov wrote:
> I assume this is path failover time? As I doubt storage latency can be
> that high?
>
> I wonder, does IBM have official guidelines for integrating SBD with
> their storage? Otherwise where this requirement comes from?
Yes, we ha
On Wed, Apr 03, 2019 at 09:13:58AM +0200, Ulrich Windl wrote:
> I'm surprised: Once sbd writes the fence command, it usually takes
> less than 3 seconds until the victim is dead. If you power off a
> server, the PDU still may have one or two seconds "power reserve", so
> the host may not be down im
On Wed, Mar 27, 2019 at 08:13:07AM +0100, Ulrich Windl wrote:
> Seems to be traditional. Maybe it's just to create some namespace.
Yes, I think GFS2 and OCFS2 store the cluster name in the FS metadata.
If the cluster name changes they might not mount anymore:
https://access.redhat.com/solutions/18430
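For GFS2 the stored name can be checked, and after a cluster rename updated,
with tunegfs2 while the filesystem is unmounted (the device and names below
are just placeholders):

  # show the superblock, including the "clustername:fsname" lock table
  tunegfs2 -l /dev/vg_cluster/lv_gfs2
  # rewrite it to match the new cluster name
  tunegfs2 -o locktable=newcluster:myfs /dev/vg_cluster/lv_gfs2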
--
On Thu, Mar 21, 2019 at 08:00:05AM +0100, Ulrich Windl wrote:
> Actually it makes no difference to a non-clustered local disk: If the buffers
> are not flushed, data can get lost if there is a power failure. If you use
> sync
> writes, the data should be on disk, and I guess with DRBD the data sho
On Wed, Mar 20, 2019 at 07:31:02PM +0100, Valentin Vidic wrote:
> Right, but I'm not sure how this would help in the above situation
> unless the DRBD can undo the local write that did not succeed on the
> peer?
Ah, it seems the activity log handles the undo by storing the
location
On Wed, Mar 20, 2019 at 02:01:07PM -0400, Digimer wrote:
> On 2019-03-20 2:00 p.m., Valentin Vidic wrote:
> > On Wed, Mar 20, 2019 at 01:47:56PM -0400, Digimer wrote:
> >> Not when DRBD is configured correctly. You sent 'fencing
> >> resource-and-stonith;'
On Wed, Mar 20, 2019 at 01:47:56PM -0400, Digimer wrote:
> Not when DRBD is configured correctly. You sent 'fencing
> resource-and-stonith;' and set the appropriate fence handler. This tells
> DRBD to not proceed with a write while a node is in an unknown state
> (which happens when the node stops
On Wed, Mar 20, 2019 at 01:44:06PM -0400, Digimer wrote:
> GFS2 notified the peers of disk changes, and DRBD handles actually
> copying the changes to the peer.
>
> Think of DRBD, in this context, as being mdadm RAID, like how writing to
> /dev/md0 is handled behind the scenes to write to both /dev
On Wed, Mar 20, 2019 at 01:34:52PM -0400, Digimer wrote:
> Depending on your fail-over tolerances, I might add NFS to the mix and
> have the NFS server run on one node or the other, exporting your ext4 FS
> that sits on DRBD in single-primary mode.
>
> The failover (if the NFS host died) would loo
On Wed, Mar 20, 2019 at 12:37:21PM -0400, Digimer wrote:
> Cluster filesystems are amazing if you need them, and to be avoided if
> at all possible. The overhead from the cluster locking hurts performance
> quite a lot, and adds a non-trivial layer of complexity.
>
> I say this as someone who
On Wed, Mar 20, 2019 at 09:36:58AM -0600, JCA wrote:
> # pcs -f fs_cfg resource create TestFS Filesystem device="/dev/drbd1"
> directory="/tmp/Testing"
> fstype="ext4"
ext4 can only be mounted on one node at a time. If you need to access
files on both nodes at the same time, then a clu
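If both nodes really do need the mount, the same pcs command can be adapted
roughly like this (GFS2 also needs dlm and dual-primary DRBD, which is left
out here):

  pcs -f fs_cfg resource create TestFS Filesystem device="/dev/drbd1" \
      directory="/tmp/Testing" fstype="gfs2" --clone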
On Sat, Feb 16, 2019 at 10:23:17PM +, Eric Robinson wrote:
> I'm looking through the docs but I don't see how to set the on-fail value for
> a resource.
It is not set on the resource itself but on each of the actions (monitor,
start, stop).
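With pcs that looks roughly like this (the resource and values are only
examples):

  pcs resource create p_mysql_001 lsb:mysql_001 \
      op monitor interval=30s on-fail=restart \
      op stop timeout=60s on-fail=block
  # or for an existing resource:
  pcs resource update p_mysql_001 op stop on-fail=block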
--
Valentin
On Sat, Feb 16, 2019 at 09:33:42PM +, Eric Robinson wrote:
> I just noticed that. I also noticed that the lsb init script has a
> hard-coded stop timeout of 30 seconds. So if the init script waits
> longer than the cluster resource timeout of 15s, that would cause the
Yes, you should use highe
On Sat, Feb 16, 2019 at 09:03:43PM +, Eric Robinson wrote:
> Here are the relevant corosync logs.
>
> It appears that the stop action for resource p_mysql_002 failed, and
> that caused a cascading series of service changes. However, I don't
> understand why, since no other resources are depend
On Sat, Feb 16, 2019 at 08:50:57PM +, Eric Robinson wrote:
> Which logs? You mean /var/log/cluster/corosync.log?
On the DC node pacemaker will be logging the actions it is trying
to run (start or stop some resources).
> But even if the stop action is resulting in an error, why would the
> clu
On Sat, Feb 16, 2019 at 08:34:21PM +, Eric Robinson wrote:
> Why is it that when one of the resources that start with p_mysql_*
> goes into a FAILED state, all the other MySQL services also stop?
Perhaps stop is not working correctly for these lsb services, so for
example stopping lsb:mysql_00
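You can check the init script by hand: an LSB-compliant script must return 0
from "stop" even when the service is already down, and a sensible code from
"status" (the script name here is just a guess):

  /etc/init.d/mysql_001 stop   ; echo $?
  /etc/init.d/mysql_001 status ; echo $?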
On Tue, Feb 12, 2019 at 08:00:38PM +0100, Kristoffer Grönlund wrote:
> One final note: hawk-apiserver uses a project called go-pacemaker
> located at https://github.com/krig/go-pacemaker. I intend to transfer
> this to ClusterLabs as well. go-pacemaker is still somewhat rough around
> the edges, an
On Wed, Jan 16, 2019 at 04:20:03PM +0100, Valentin Vidic wrote:
> I think drbd always calls crm-fence-peer.sh when it becomes disconnected
> primary. In this case storage1 has closed the DRBD connection and
> storage2 has become a disconnected primary.
>
> Maybe the problem is the
On Wed, Jan 16, 2019 at 09:03:21AM -0600, Bryan K. Walton wrote:
> The exit code 4 would seem to suggest that storage1 should be fenced.
> But the switch ports connected to storage1 are still enabled.
>
> Am I misreading the logs here? This is a clean reboot, maybe fencing
> isn't supposed to hap
On Wed, Jan 16, 2019 at 12:41:11PM +0100, Valentin Vidic wrote:
> This is what pacemaker says about the resource restarts:
>
> Jan 16 11:19:08 node1 pacemaker-schedulerd[713]: notice: * Start dlm:1
> ( node2 )
> Jan 16 11:19:08 node1 pacemaker-scheduler
On Wed, Jan 16, 2019 at 12:16:04PM +, Andrew Price wrote:
> The only thing that stands out to me with this config is the lack of
> ordering constraint between dlm and lvmlockd. Not sure if that's the issue
> though.
They are both in the storage group, so the order should be dlm then lockd?
On Wed, Jan 16, 2019 at 12:28:59PM +0100, Valentin Vidic wrote:
> When node2 is set to standby, resources stop running there. However, when
> node2 is brought back online, it causes the resources on node1 to stop
> and then start again, which is a bit unexpected?
>
> Maybe the depende
Hi all,
I'm testing the following configuration with two nodes:
Clone: storage-clone
Meta Attrs: interleave=true target-role=Started
Group: storage
Resource: dlm (class=ocf provider=pacemaker type=controld)
Resource: lockd (class=ocf provider=heartbeat type=lvmlockd)
Clone: gfs2-clon
On Fri, Jan 11, 2019 at 12:42:02PM +0100, wf...@niif.hu wrote:
> I opened https://github.com/ClusterLabs/sbd/pull/62 with our current
> patches, but I'm just a middle man here. Valentin, do you agree to
> upstream these two remaining patches of yours?
Sure thing, merge anything you can...
--
Va
On Thu, Jan 03, 2019 at 04:56:26PM -0600, Ken Gaillot wrote:
> Right -- not only that, but corosync 1 (CentOS 6) and corosync 2
> (CentOS 7) are not compatible for running in the same cluster.
I suppose it is the same situation for upgrading from corosync 2
to corosync 3?
--
Valentin
On Tue, Nov 13, 2018 at 11:01:46AM -0600, Ken Gaillot wrote:
> Clone instances have a default stickiness of 1 (instead of the usual 0)
> so that they aren't needlessly shuffled around nodes every transition.
> You can temporarily set an explicit stickiness of 0 to let them
> rebalance, then unset i
On Tue, Nov 13, 2018 at 05:04:19PM +0100, Valentin Vidic wrote:
> Also it seems to require multicast, so better check for that too :)
And while the CLUSTERIP resource seems to work for me in a test
cluster, the following clone definition:
clone cip-clone cip \
meta clone-max=2 cl
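For context, a complete CLUSTERIP setup usually looks something like this
(the address and names are placeholders):

  crm configure primitive cip IPaddr2 \
      params ip=192.168.1.100 cidr_netmask=24 \
             clusterip_hash=sourceip-sourceport \
      op monitor interval=30s
  crm configure clone cip-clone cip \
      meta clone-max=2 clone-node-max=2 globally-unique=true interleave=true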
On Tue, Nov 13, 2018 at 04:06:34PM +0100, Valentin Vidic wrote:
> Could be some kind of ARP inspection going on in the networking equipment,
> so check switch logs if you have access to that.
Also it seems to require multicast, so better check for that too :)
--
Va
On Tue, Nov 13, 2018 at 09:06:56AM -0500, Daniel Ragle wrote:
> Thanks, finally getting back to this. Putting a tshark on both nodes and
> then restarting the VIP-clone resource shows the pings coming through for 12
> seconds, always on node2, then stop. I.E., before/after those 12 seconds
> nothin
On Fri, Oct 19, 2018 at 11:09:34AM +0200, Kristoffer Grönlund wrote:
> I wonder if perhaps there was a configuration change as well, since the
> return code seems to be configuration related. Maybe something changed
> in the build scripts that moved something around? Wild guess, but...
Seems to be
On Wed, Oct 17, 2018 at 12:03:18PM +0200, Oyvind Albrigtsen wrote:
> - apache: retry PID check.
I noticed that the ocft test started failing for apache in this
version. Not sure if the test is broken or the agent. Can you
check if the test still works for you? Restoring the previous
version of th
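If you want to try reproducing it, the run is roughly (assuming the ocft tool
shipped with resource-agents):

  ocft make apache
  ocft test apache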
On Wed, Oct 10, 2018 at 02:36:21PM +0200, Stefan K wrote:
> I think my config is correct, but it still fails with "This Target
> already exists in configFS" but "targetcli ls" shows nothing.
It seems to find something in /sys/kernel/config/target. Maybe it
was set up outside of pacemaker somehow?
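You can look at the configfs tree directly to see what is left over there,
e.g.:

  ls -lR /sys/kernel/config/target/
  targetcli ls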
On Thu, Oct 11, 2018 at 01:25:52PM -0400, Daniel Ragle wrote:
> For the 12 second window it *does* work in, it appears as though it works
> only on one of the two servers (and always the same one). My twelve seconds
> of pings runs continuously then stops; while attempts to hit the Web server
> wor
On Tue, Oct 09, 2018 at 12:07:38PM +0200, Oyvind Albrigtsen wrote:
> I've created a PR for the library detection and try/except imports:
> https://github.com/ClusterLabs/fence-agents/pull/242
Thanks, I will give it a try right away...
--
Valentin
On Tue, Oct 09, 2018 at 10:55:08AM +0200, Oyvind Albrigtsen wrote:
> It seems like the if-line should be updated to check for those 2
> libraries (from the imports in the agent).
Yes, that might work too.
Also, would it be possible to make the imports in the openstack agent
conditional so the metadata
On Tue, Oct 02, 2018 at 03:13:51PM +0200, Oyvind Albrigtsen wrote:
> ClusterLabs is happy to announce fence-agents v4.3.0.
>
> The source code is available at:
> https://github.com/ClusterLabs/fence-agents/releases/tag/v4.3.0
>
> The most significant enhancements in this release are:
> - new fenc
On Fri, Oct 05, 2018 at 11:34:10AM -0500, Ken Gaillot wrote:
> The next big challenge is that high availability is becoming a subset
> of the "orchestration" space in terms of how we fit into IT
> departments. Systemd and Kubernetes are the clear leaders in service
> orchestration today and likely
On Thu, Sep 06, 2018 at 04:47:32PM -0400, Digimer wrote:
> It depends on the hardware you have available. In your case, RPi has no
> IPMI or similar feature, so you'll need something external, like a
> switched PDU. I like the APC AP7900 (or your country's variant), which
> you can often get used f
On Tue, Sep 11, 2018 at 09:31:13AM -0400, Patrick Whitney wrote:
> But, when I invoke the "human" stonith power device (i.e. I turn the node
> off), the other node collapses...
>
> In the logs I supplied, I basically do this:
>
> 1. stonith fence (With fence scsi)
After fence_scsi finishes the n
On Tue, Sep 11, 2018 at 04:14:08PM +0300, Vladislav Bogdanov wrote:
> And that is not an easy task sometimes, because the main part of dlm runs in
> the kernel.
> In some circumstances the only option is to forcibly reset the node.
Exactly, killing the power on the node will stop the DLM code running in
t
On Tue, Sep 11, 2018 at 09:13:08AM -0400, Patrick Whitney wrote:
> So when the cluster suggests that DLM is shutdown on coro-test-1:
> Clone Set: dlm-clone [dlm]
> Started: [ coro-test-2 ]
> Stopped: [ coro-test-1 ]
>
> ... DLM isn't actually stopped on 1?
If you can connect to the node
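On the node itself something like this should show whether any lockspaces are
still active (assuming dlm_tool from the dlm package is available):

  dlm_tool status
  dlm_tool ls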
On Tue, Sep 11, 2018 at 09:02:06AM -0400, Patrick Whitney wrote:
> What I'm having trouble understanding is why dlm flattens the remaining
> "running" node when the already fenced node is shutdown... I'm having
> trouble understanding how power fencing would cause dlm to behave any
> differently t
On Thu, May 24, 2018 at 12:16:16AM -0600, Casey & Gina wrote:
> Tried that, it doesn't seem to do anything but prefix the lines with the pid:
>
> [pid 24923] sched_yield() = 0
> [pid 24923] sched_yield() = 0
> [pid 24923] sched_yield() = 0
We managed to t
On Wed, Jul 11, 2018 at 04:31:31PM -0600, Casey & Gina wrote:
> Forgive me for interjecting, but how did you upgrade on Ubuntu? I'm
> frustrated with limitations in 1.1.14 (particularly in PCS so not sure
> if it's relevant), and Ubuntu is ignoring my bug reports, so it would
> be great to upgrade
On Wed, Jul 11, 2018 at 08:01:46PM +0200, Salvatore D'angelo wrote:
> Yes, but doing what you suggested, the system finds that SysV is
> installed and tries to leverage the update-rc.d scripts, and the failure
> occurs:
>
> root@pg1:~# systemctl enable corosync
> corosync.service is not a native service
On Tue, Apr 03, 2018 at 04:48:00PM +0200, Stefan Friedel wrote:
> we've a running drbd - iscsi cluster (two nodes Debian stretch, pacemaker /
> corosync, res group w/ ip + iscsitarget/lio-t + iscsiluns + lvm etc. on top of
> drbd etc.). Everything is running fine - but we didn't manage to get CHAP
On Thu, Mar 22, 2018 at 03:36:55PM -0400, Alberto Mijares wrote:
> Straight to the question: how can I manually run a resource agent
> script (kamailio) simulating the pacemaker's environment without
> actually having pacemaker running?
You should be able to run it with something like:
# OCF_ROOT
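A fuller sketch (the parameter name is only an example, check the agent's
meta-data for the real ones):

  export OCF_ROOT=/usr/lib/ocf
  export OCF_RESKEY_conffile=/etc/kamailio/kamailio.cfg  # example parameter
  /usr/lib/ocf/resource.d/heartbeat/kamailio start
  /usr/lib/ocf/resource.d/heartbeat/kamailio monitor; echo $?
  # or let ocf-tester drive all the actions:
  ocf-tester -n kamailio-test -o conffile=/etc/kamailio/kamailio.cfg \
      /usr/lib/ocf/resource.d/heartbeat/kamailio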
On Mon, Mar 12, 2018 at 04:31:46PM +0100, Klaus Wenninger wrote:
> Nope. Whenever the cluster is completely down...
> Otherwise nodes would come up - if not seeing each other -
> happily with both starting all services because they don't
> know what already had been running on the other node.
> Tec
On Mon, Mar 12, 2018 at 01:58:21PM +0100, Klaus Wenninger wrote:
> But isn't dlm directly interfering with corosync so
> that it would get the quorum state from there?
> As you have 2-node set probably on a 2-node-cluster
> this would - after both nodes down - wait for all
> nodes up first.
Isn't
On Wed, Oct 11, 2017 at 02:36:24PM +0200, Valentin Vidic wrote:
> AFAICT, it found a better interface with that subnet and tried
> to use it instead of the one specified in the parameters :)
>
> But maybe IPaddr2 should just skip interface auto-detection
> if an explicit interfa
On Wed, Oct 11, 2017 at 01:29:40PM +0200, Stefan Krueger wrote:
> ohh damn.. thanks a lot for this hint.. I deleted all the IPs on enp4s0f0, and
> then it works..
> but could you please explain why it works now? why did it have a problem with
> these IPs?
AFAICT, it found a better interface with that s
On Wed, Oct 11, 2017 at 10:51:04AM +0200, Stefan Krueger wrote:
> primitive HA_IP-Serv1 IPaddr2 \
> params ip=172.16.101.70 cidr_netmask=16 \
> op monitor interval=20 timeout=30 on-fail=restart nic=bond0 \
> meta target-role=Started
There might be something wrong with the n
On Tue, Oct 10, 2017 at 11:26:24AM +0200, Václav Mach wrote:
> # The primary network interface
> allow-hotplug eth0
> iface eth0 inet dhcp
> # This is an autoconfigured IPv6 interface
> iface eth0 inet6 auto
allow-hotplug or dhcp could be causing problems. You can try
disabling corosync and pacem
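For example (just a guess at the cause, adjust to your setup):

  # make the interface a static "auto" one so it is fully configured
  # before corosync starts
  sed -i 's/^allow-hotplug eth0/auto eth0/' /etc/network/interfaces
  # or keep the cluster from starting at boot and start it by hand
  # once the network is up
  systemctl disable corosync pacemaker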
On Tue, Oct 10, 2017 at 10:35:17AM +0200, Václav Mach wrote:
> Oct 10 10:27:05 r1nren.et.cesnet.cz corosync[709]: [QB] Denied
> connection, is not ready (709-1337-18)
> Oct 10 10:27:06 r1nren.et.cesnet.cz corosync[709]: [QB] Denied
> connection, is not ready (709-1337-18)
> Oct 10 10:27
On Thu, Oct 05, 2017 at 08:55:59PM +0200, Jehan-Guillaume de Rorthais wrote:
> It doesn't seem impossible, however I'm not sure of the complexity around
> this.
>
> You would have to either hack PAF and detect failover/migration or create a
> new
> RA that would always be part of the transition
On Tue, Sep 12, 2017 at 04:48:19PM +0200, Jehan-Guillaume de Rorthais wrote:
> PostgreSQL Automatic Failover (PAF) v2.2.0 has been released on September
> 12th 2017 under the PostgreSQL licence.
>
> See: https://github.com/dalibo/PAF/releases/tag/v2.2.0
>
> PAF is a PostgreSQL resource agent for
On Mon, Sep 11, 2017 at 04:18:08PM +0200, Klaus Wenninger wrote:
> Just for my understanding: You are using watchdog-handling in corosync?
The corosync package in Debian gets built with --enable-watchdog, so by
default it takes /dev/watchdog during runtime. I don't think SUSE
or RedHat packages get buil
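You can check which process is holding the device with e.g.:

  fuser -v /dev/watchdog
  lsof /dev/watchdog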
On Sun, Sep 10, 2017 at 08:27:47AM +0200, Ferenc Wágner wrote:
> Confirmed: setting watchdog_device: off cluster wide got rid of the
> above warnings.
Interesting, what brand or version of IPMI has this problem?
--
Valentin
On Fri, Sep 08, 2017 at 09:39:26PM +0100, Andrew Cooper wrote:
> Yes. The internal mechanism of the host watchdog is to use one
> performance counter to count retired instructions and generate an NMI
> roughly once every half second (give or take C and P states).
>
> Separately, there is a one se
On Fri, Sep 08, 2017 at 12:57:12PM +, Mark Syms wrote:
> As we discussed regarding the handling of watchdog in XenServer, both
> guest and host, I've had a discussion with our subject matter expert
> (Andrew, cc'd) on this topic. The guest watchdogs are handled by a
> hardware timer in the hype
On Mon, Aug 28, 2017 at 04:10:50AM +0200, Oscar Segarra wrote:
> In Ceph, by design there is no single point of failure in terms of server
> roles, nevertheless, from the client point of view, it might exist.
>
> In my environment:
> Mon1: 192.168.100.101:6789
> Mon2: 192.168.100.102:6789
> Mon3:
On Mon, Jul 24, 2017 at 10:38:40AM -0500, Ken Gaillot wrote:
> Standby is not necessary, it's just a cautious step that allows the
> admin to verify that all resources moved off correctly. The restart that
> yum does should be sufficient for pacemaker to move everything.
>
> A restart shouldn't le
On Mon, Jul 24, 2017 at 11:01:26AM -0500, Dimitri Maziuk wrote:
> Lsof/fuser show the PID of the process holding FS open as "kernel".
That could be the NFS server running in the kernel.
--
Valentin
On Sun, Jul 23, 2017 at 07:27:03AM -0500, Dmitri Maziuk wrote:
> So yesterday I ran yum update that pulled in the new pacemaker and tried to
> restart it. The node went into its usual "can't unmount drbd because kernel
> is using it" and got stonith'ed in the middle of yum transaction. The end
> res
On Fri, Jun 30, 2017 at 12:46:29PM -0500, Ken Gaillot wrote:
> The challenge is that some properties are docker-specific and other
> container engines will have their own specific properties.
>
> We decided to go with a tag for each supported engine -- so if we add
> support for rkt, we'll add a
On Fri, Mar 31, 2017 at 05:43:02PM -0500, Ken Gaillot wrote:
> Here's an example of the CIB XML syntax (higher-level tools will likely
> provide a more convenient interface):
>
>
>
>
Would it be possible to make this a bit more generic like:
so we have support for other container engin
On Thu, Jan 26, 2017 at 09:31:23PM +0100, Valentin Vidic wrote:
> Guess you could create a Dummy resource and make INFINITY colocation
> constraints for the IPs so they follow Dummy as it moves between the
> nodes :)
In fact using resource sets this becomes one rule:
colocation ip
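Something along these lines (resource names are placeholders, "anchor" being
the Dummy resource, and the set semantics are worth double-checking with
crm_simulate):

  crm configure colocation ips-with-anchor inf: \
      ( vip4-1 vip4-2 vip6-1 vip6-2 ) anchor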
On Thu, Jan 26, 2017 at 12:10:24PM +0100, Arturo Borrero Gonzalez wrote:
> I have a rather simple 2 nodes active/active router using pacemaker+corosync.
>
> Why active-active? Well, one node holds the virtual IPv4 resources and
> the other node holds the virtual IPv6 resources.
> On failover, both
On Mon, Jul 25, 2016 at 07:58:51PM +0200, Thierry Boibary wrote:
> is "Pacemaker" available on Debian 8.1?
Only via jessie-backports, as you can see here:
https://packages.debian.org/search?keywords=pacemaker
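Roughly (the mirror URL is only an example):

  echo 'deb http://httpredir.debian.org/debian jessie-backports main' \
      > /etc/apt/sources.list.d/backports.list
  apt-get update
  apt-get install -t jessie-backports pacemaker crmsh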
--
Valentin
FAIL: test_run_all_workers (pcs.test.test_utils.RunParallelTest)
--
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/pcs/test/test_utils.py", line
1800, in test_run_all_workers
self.assertEqual(log, [
On Thu, Jun 30, 2016 at 01:27:25PM +0200, Tomas Jelinek wrote:
> It seems eventmachine can be safely dropped as all tests passed without it.
Great, thanks for confirming.
--
Valentin
On Wed, Jun 29, 2016 at 10:31:42AM +0200, Tomas Jelinek wrote:
> This should be replaceable by any agent which does not provide unfencing,
> i.e. it does not have on_target="1" automatic="1" attributes in <action name="on" />. You may need to experiment with a few agents to find one which
> works.
Just ch
On Tue, Jun 28, 2016 at 02:35:53PM +0200, Tomas Jelinek wrote:
> You are right. The right pacemaker (and corosync, resource agents...)
> version is needed for tests to pass. It's not an easy task to figure out
> what the right version is, though. For pcs 0.9.152 it's
> pacemaker-1.1.15-2.el7.
>
>
I'm trying to run pcs tests on Debian unstable, but there
are some strange failures like diffs failing due to an
additional space at the end of the line or just with
"Error: cannot load cluster status, xml does not conform to the schema"
Any idea what could be the issue here? I assume the tests
w
Hi,
Is it safe to drop eventmachine as a dependency? I see it's
only mentioned in the makefiles and not used by any of the
ruby code:
pcsd/Makefile: vendor/cache/eventmachine-1.2.0.1.gem \
pcsd/Gemfile:gem 'eventmachine'
pcsd/Gemfile.lock:eventmachine (1.2.0.1)
pcsd/Gemfile.lock: eventmachi
On Fri, Jan 22, 2016 at 07:57:52PM +0300, Vladislav Bogdanov wrote:
> Tried reverting this one and a51b2bb ("If an error occurs unlink the
> lock file and exit with status 1") one-by-one and both together, the
> same result.
>
> So problem seems to be somewhere deeper.
I've got the same fencing