>>> Gabriele Bulfon <gbul...@sonicle.com> wrote on 11.12.2020 at 15:51 in
>>> message <1053095478.6540.1607698288628@www>:
> I cannot use "wait_for_all: 0", because this would automatically move a
> powered-off node from UNCLEAN to OFFLINE and mount the ZFS pool (total
> risk!): I want to move it from UNCLEAN to OFFLINE manually, when I know
> that the 2nd node is actually off!
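
For reference, the setting under discussion lives in the quorum section of
corosync.conf; a minimal two-node sketch (values illustrative, your file may
differ):

  quorum {
      provider: corosync_votequorum
      two_node: 1
      # two_node enables wait_for_all by default; wait_for_all: 0 would let
      # a booting node gain quorum without ever having seen its peer, which
      # is exactly the automatic takeover described above as a total risk.
      wait_for_all: 1
  }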
Personally I think that when you have to confirm manually that a node is
down, you hardly need a cluster, because all actions would wait until the
node is no longer unclean. I wouldn't want to be alerted in the middle of
the night or at weekends just to confirm that there was some problem, when
the cluster could handle it automatically while I sleep.

> Actually with wait_for_all at its default (1) that was the case, so node1
> would wait for my intervention when booting while node2 is down.
> So what I think I need is some way to manually override quorum in such a
> case (node2 down for maintenance, node1 rebooted), so I would manually
> turn node2 from UNCLEAN to OFFLINE, manually override quorum, and have the
> zpool mounted and the NFS IP up.
>
> Any idea?
>
> Sonicle S.r.l. : http://www.sonicle.com
> Music: http://www.gabrielebulfon.com
> eXoplanets : https://gabrielebulfon.bandcamp.com/album/exoplanets
>
> ----------------------------------------------------------------------------
>
> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
> To: users@clusterlabs.org
> Date: 11 December 2020 11.35.44 CET
> Subject: [ClusterLabs] Antw: [EXT] Recovering from node failure
>
> Hi!
>
> Did you take care of the special "two node" settings (quorum, I mean)?
> When I use "crm_mon -1Arfj", I see something like
> " * Current DC: h19 (version
> 2.0.4+20200616.2deceaa3a-3.3.1-2.0.4+20200616.2deceaa3a) - partition with
> quorum"
>
> What do you see?
>
> Regards,
> Ulrich
>
>>>> Gabriele Bulfon <gbul...@sonicle.com> wrote on 11.12.2020 at 11:23 in
>>>> message <350849824.6300.1607682209284@www>:
>> Hi, I finally got stonith with IPMI working in my 2-node
>> XStreamOS/illumos storage cluster.
>> I have NFS IPs and the shared-storage zpool moving from one node to the
>> other, and stonith controlling IPMI power-off when something is not clear.
>>
>> What happens now is that if I shut down the 2nd node, I see the OFFLINE
>> status from node 1 and everything is up and running, and this is OK:
>>
>> Online: [ xstha1 ]
>> OFFLINE: [ xstha2 ]
>> Full list of resources:
>>  xstha1_san0_IP   (ocf::heartbeat:IPaddr):   Started xstha1
>>  xstha2_san0_IP   (ocf::heartbeat:IPaddr):   Started xstha1
>>  xstha1-stonith   (stonith:external/ipmi):   Started xstha1
>>  xstha2-stonith   (stonith:external/ipmi):   Started xstha1
>>  zpool_data       (ocf::heartbeat:ZFS):      Started xstha1
>>
>> But if I also reboot the 1st node, it comes up with node2 in the UNCLEAN
>> state and nothing running, so I run clearstate on node2, but the
>> resources are still not started:
>>
>> Online: [ xstha1 ]
>> OFFLINE: [ xstha2 ]
>> Full list of resources:
>>  xstha1_san0_IP   (ocf::heartbeat:IPaddr):   Stopped
>>  xstha2_san0_IP   (ocf::heartbeat:IPaddr):   Stopped
>>  xstha1-stonith   (stonith:external/ipmi):   Stopped
>>  xstha2-stonith   (stonith:external/ipmi):   Stopped
>>  zpool_data       (ocf::heartbeat:ZFS):      Stopped
>>
>> I tried restarting zpool_data and other resources:
>> # crm resource start zpool_data
>> but nothing happens!
>> How can I recover from this state? Node2 needs to stay down, but I want
>> node1 to work.
>> Thanks!
>> Gabriele
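
For completeness, a rough sketch of the manual sequence being asked for,
assuming crmsh and the node names from the status output above; this is
illustrative only, and the property change should be reverted once node2 is
back in service:

  # 1. Confirm that xstha2 really is powered off, so it can move from
  #    UNCLEAN to OFFLINE without waiting for stonith:
  crm node clearstate xstha2
  #    (equivalently: stonith_admin --confirm xstha2)

  # 2. Check whether node1 is still being held back by quorum/wait_for_all:
  corosync-quorumtool -s

  # 3. If so, let Pacemaker manage resources without quorum for the
  #    duration of the maintenance (revert the property afterwards!):
  crm configure property no-quorum-policy=ignore

  # 4. Verify that the zpool and the NFS IPs come up:
  crm_mon -1Arfj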

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/