Hi Ken, thanks for the answer and the explanation. So I can stop struggling to find a solution!
Your clarifications are very useful and appreciated. Many thanks, and have a good day.

Damiano

On Wed, 27 Jan 2021 at 20:03, Ken Gaillot <kgail...@redhat.com> wrote:
> On Wed, 2021-01-27 at 19:25 +0100, damiano giuliani wrote:
> > Hi Andrei, thanks for your help.
> > If one of the resources in my group fails, or the primary node goes
> > down (in my case acspcmk-02), the probe notices it and Pacemaker
> > tries to restart the whole resource group on the second node.
> > If the second node can't run one of my grouped resources, it tries
> > to stop them.
> >
> > I attached my cluster status: my primary node (acspcmk-02) failed
> > and the resource group tried to restart on acspcmk-01. I broke the
> > resource "lta-subscription-backend-ope-s3" on purpose, and as you
> > can see, some grouped resources are still started. I would like to
> > know how to achieve a condition where the resource group must start
> > every one of its resources properly, and otherwise stop the whole
> > group, leaving no services up and running.
>
> With a group, later members depend on earlier members. If an earlier
> member can't run, then no members after it can run.
>
> However, we can't make the dependency go in both directions. If an
> earlier member can't run unless a later member is active, and vice
> versa, then how can anything be started?
>
> By default, Pacemaker tries to recover failed resources on the same
> node, up to its migration-threshold (which defaults to a million
> times). Once a group member reaches its migration-threshold, Pacemaker
> will move the entire group to another node if one is available.
> However, if no node is available for the failed member, then it will
> just remain stopped (along with any later members in the group), and
> the earlier members will stay active where they are.
>
> I don't think there's any way to prevent earlier members from running
> if a later member has no available node.
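A note for readers finding this thread in the archives: the knobs Ken
mentions can be set and inspected from the command line. A minimal
sketch using the pcs CLI (an assumption, though a reasonable one since
pcsd is active on this cluster; the threshold value is only an example,
not taken from this cluster's actual configuration):

    # Move the whole group to another node after 3 failures of this
    # member on the current node, instead of the default of retrying
    # in place up to a million times
    pcs resource meta lta-subscription-backend-ope-s3 migration-threshold=3

    # Query the current fail count of that member on a given node
    crm_failcount --query --resource lta-subscription-backend-ope-s3 --node acspcmk-01

Resource and node names are taken from the status output quoted below.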
>
> > 2 nodes configured
> > 28 resources configured
> >
> > Online: [ acspcmk-01 ]
> > OFFLINE: [ acspcmk-02 ]
> >
> > Full list of resources:
> >
> >  Clone Set: lta-odata-frontend-ope-s1-clone [lta-odata-frontend-ope-s1]
> >      Started: [ acspcmk-01 ]
> >      Stopped: [ acspcmk-02 ]
> >  Clone Set: lta-odata-frontend-ope-s2-clone [lta-odata-frontend-ope-s2]
> >      Started: [ acspcmk-01 ]
> >      Stopped: [ acspcmk-02 ]
> >  Clone Set: lta-odata-frontend-ope-s3-clone [lta-odata-frontend-ope-s3]
> >      Started: [ acspcmk-01 ]
> >      Stopped: [ acspcmk-02 ]
> >  Clone Set: s1ltaestimationtime-clone [s1ltaestimationtime]
> >      Started: [ acspcmk-01 ]
> >      Stopped: [ acspcmk-02 ]
> >  Clone Set: s2ltaestimationtime-clone [s2ltaestimationtime]
> >      Started: [ acspcmk-01 ]
> >      Stopped: [ acspcmk-02 ]
> >  Clone Set: s3ltaestimationtime-clone [s3ltaestimationtime]
> >      Started: [ acspcmk-01 ]
> >      Stopped: [ acspcmk-02 ]
> >  Clone Set: openresty-clone [openresty]
> >      Started: [ acspcmk-01 ]
> >      Stopped: [ acspcmk-02 ]
> >  Resource Group: LTA_SINGLE_RESOURCES
> >      VIP       (ocf::heartbeat:IPaddr2):       Started acspcmk-01
> >      lta-subscription-backend-ope-s1   (systemd:lta-subscription-backend-ope-s1):      Started acspcmk-01
> >      lta-subscription-backend-ope-s2   (systemd:lta-subscription-backend-ope-s2):      Started acspcmk-01
> >      lta-subscription-backend-ope-s3   (systemd:lta-subscription-backend-ope-s3):      Stopped
> >      s1ltaquotaservice (systemd:s1ltaquotaservice):    Stopped
> >      s2ltaquotaservice (systemd:s2ltaquotaservice):    Stopped
> >      s3ltaquotaservice (systemd:s3ltaquotaservice):    Stopped
> >      s1ltarolling      (systemd:s1ltarolling): Stopped
> >      s2ltarolling      (systemd:s2ltarolling): Stopped
> >      s3ltarolling      (systemd:s3ltarolling): Stopped
> >      s1srvnotificationdispatcher       (systemd:s1srvnotificationdispatcher):  Stopped
> >      s2srvnotificationdispatcher       (systemd:s2srvnotificationdispatcher):  Stopped
> >      s3srvnotificationdispatcher       (systemd:s3srvnotificationdispatcher):  Stopped
> >
> > Failed Resource Actions:
> > * lta-subscription-backend-ope-s3_start_0 on acspcmk-01 'unknown error' (1):
> >     call=466, status=complete, exitreason='',
> >     last-rc-change='Wed Jan 27 13:00:21 2021', queued=0ms, exec=2128ms
> >
> > Daemon Status:
> >   corosync: active/disabled
> >   pacemaker: active/disabled
> >   pcsd: active/enabled
> >   sbd: active/enabled
> >
> > I hope I explained my problem as best I could.
> >
> > Thanks for your time and help.
> >
> > Good evening,
> >
> > Damiano
> >
> > On Wed, 27 Jan 2021 at 19:03, Andrei Borzenkov <arvidj...@gmail.com> wrote:
> > > On 27.01.2021 19:06, damiano giuliani wrote:
> > > > Hi all, I'm pretty new to clusters. I'm struggling to configure
> > > > a bunch of resources and test how they fail over. My need is to
> > > > start and manage a group of resources as one (to achieve this, a
> > > > resource group has been created). If one of them can't run and
> > > > keeps failing, the cluster should try to restart the resource
> > > > group on the secondary node, and if it can't run all the
> > > > resources together there, disable the whole resource group.
> > > > I would like to know if there is a way to set the cluster to
> > > > disable all the resources of the group (or the group itself) if
> > > > it can't run all the resources somewhere.
> > >
> > > That's what a Pacemaker group does. I am not sure what you mean by
> > > "disable all resources".
> > > If the resource fail count on a node exceeds the
> > > threshold, that node is banned from running the resource. If the
> > > resource has failed on every node, no node can run it until you
> > > clear the fail count.
> > >
> > > "Disabling a resource" in Pacemaker would mean setting its
> > > target-role to Stopped. That does not happen automatically (at
> > > least I am not aware of it).
> --
> Ken Gaillot <kgail...@redhat.com>
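For completeness, the two manual operations Andrei describes above map
to standard commands. Again a sketch with pcs, using the resource and
group names from the status output (generic commands, not something
prescribed elsewhere in this thread):

    # Clear the fail count so Pacemaker is allowed to try the
    # resource again
    pcs resource cleanup lta-subscription-backend-ope-s3

    # "Disable" in Pacemaker terms, i.e. set target-role=Stopped,
    # here for the whole group; enable reverses it
    pcs resource disable LTA_SINGLE_RESOURCES
    pcs resource enable LTA_SINGLE_RESOURCES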
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/