We have not found any documentation or any solutions for this issue. We recently upgraded to 3.7 and still see the same behavior. Our workaround has been to have CloudWatch watch the logs for any time a node (or master) is registered, and then re-add the logging fluentd label; if it's a master, it also sets SchedulingDisabled, because we see this behavior on our masters as well.
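For illustration, the re-label step such a hook could run might look like the sketch below. This is an assumption-laden sketch, not our exact setup: the function name, the `*master*` hostname match, and the hardcoded label are all placeholders.

```shell
# Hedged sketch of the re-label step a log-triggered hook could run after a
# node re-registers. emit_fixup prints the oc commands (review the output,
# then pipe to sh to apply); the *master* name match is an assumption.
emit_fixup() {
  node="$1"
  # Re-apply the label the logging daemonset's nodeSelector expects.
  printf 'oc label node/%s logging-infra-fluentd=true --overwrite\n' "$node"
  case "$node" in
    # Masters should go back to SchedulingDisabled after re-registration.
    *master*) printf 'oc adm manage-node %s --schedulable=false\n' "$node" ;;
  esac
}

emit_fixup ip-172-21-20-30.ec2.internal
emit_fixup master-1.example.com
```

Printing the commands instead of running them directly makes the hook easy to dry-run before wiring it to a real trigger.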
I'd like to see a resolution for these issues, and some documentation. We've struggled to get this working properly. Setting the command in the node service caused other issues, but it may have been misconfigured.

Thanks,
Todd

Today's Topics:

   3. Re: Looking for documentation on cloud provider delete node and register node (Mark McKinstry)

----------------------------------------------------------------------

------------------------------

Message: 3
Date: Thu, 15 Mar 2018 14:53:08 -0700
From: Mark McKinstry <mmcki...@redhat.com>
To: Clayton Coleman <ccole...@redhat.com>
Cc: "users@lists.openshift.redhat.com" <users@lists.openshift.redhat.com>, "Ernst, Chad" <chad_er...@unigroup.com>
Subject: Re: Looking for documentation on cloud provider delete node and register node
Message-ID: <CAGkJPmh6H56CzN-jSgcHNJgjjCvzMiM3MUqg-TwMmYH6sg=a...@mail.gmail.com>

Is there more info on this? I'm having this problem on OCP 3.7 right now too. If a node is rebooted, it comes back up but is missing the logging-infra-fluentd=true label.

On Thu, Dec 21, 2017 at 10:15 AM, Clayton Coleman <ccole...@redhat.com> wrote:

> There was an open bug on this previously - I'm having trouble finding it
> at the moment. The node may be racing with the cloud controller and then
> not updating the labels. One workaround is to simply add an "oc label
> node/$(hostname) ..." command to the origin-node service as a prestart
> command.
>
> On Dec 21, 2017, at 9:13 AM, Ernst, Chad <chad_er...@unigroup.com> wrote:
>
> Running Origin 3.6 on AWS, we've found that if our EC2 instances go down
> for any length of time and come back up (as opposed to the EC2 instance
> getting terminated), the nodes are automatically deleted from OpenShift and
> then re-registered after the EC2 instance is restarted.
> The activity is logged in /var/log/messages:
>
> Dec 20 21:59:30 ip-172-21-21-30 origin-master-controllers: I1220 21:59:30.297638 26242 nodecontroller.go:761] Deleting node (no longer present in cloud provider): ip-172-21-20-30.ec2.internal
> Dec 20 21:59:30 ip-172-21-21-30 origin-master-controllers: I1220 21:59:30.297662 26242 controller_utils.go:273] Recording Deleting Node ip-172-21-20-30.ec2.internal because it's not present according to cloud provider event message for node ip-172-21-20-30.ec2.internal
> Dec 20 21:59:30 ip-172-21-21-30 origin-master-controllers: I1220 21:59:30.297895 26242 event.go:217] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-172-21-20-30.ec2.internal", UID:"36c8dca4-e5c9-11e7-b2ce-0e69b80c212e", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DeletingNode' Node ip-172-21-20-30.ec2.internal event: Deleting Node ip-172-21-20-30.ec2.internal because it's not present according to cloud provider
>
> Dec 20 23:10:06 ip-172-21-21-30 origin-master-controllers: I1220 23:10:06.303567 26242 nodecontroller.go:616] NodeController observed a new Node: "ip-172-21-22-30.ec2.internal"
> Dec 20 23:10:06 ip-172-21-21-30 origin-master-controllers: I1220 23:10:06.303597 26242 controller_utils.go:273] Recording Registered Node ip-172-21-22-30.ec2.internal in NodeController event message for node ip-172-21-22-30.ec2.internal
> Dec 20 23:10:06 ip-172-21-21-30 origin-master-controllers: I1220 23:10:06.303899 26242 event.go:217] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-172-21-22-30.ec2.internal", UID:"e850129f-e5da-11e7-ac5e-027542a418ee", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node ip-172-21-22-30.ec2.internal event: Registered Node ip-172-21-22-30.ec2.internal in NodeController
>
> The issue we are running into is that when the nodes come back they don't have all of our labels
> on them. They don't get labelled to run the fluentd pods ("logging-infra-fluentd=true"), and my masters aren't set to "SchedulingDisabled".
>
> Can anybody point me to any documentation regarding the automatic registration of the node from the cloud provider, or does anyone know how to adjust the behavior when a node is re-registered so it can be tagged properly?
>
> Thanks
>
> Chad
>
> ########################################################################
> The information contained in this message, and any attachments thereto,
> is intended solely for the use of the addressee(s) and may contain
> confidential and/or privileged material. Any review, retransmission,
> dissemination, copying, or other use of the transmitted information is
> prohibited. If you received this in error, please contact the sender
> and delete the material from any computer. UNIGROUP.COM
> ########################################################################
>
> _______________________________________________
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users

--
Mark McKinstry
Senior Consultant, RHCA
Red Hat Consulting (West)
mmcki...@redhat.com
M: 510-646-1280
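Clayton's prestart-command workaround can be wired in as a systemd drop-in for the origin-node service. The sketch below is a hedged illustration under assumptions (the drop-in file name and the use of the `%H` hostname specifier are mine, not from the thread); it stages the file in a local directory so it is safe to try, and on a real node `DEST` would be `/etc/systemd/system/origin-node.service.d`, followed by `systemctl daemon-reload`.

```shell
# Hedged sketch: install the prestart re-label command as a systemd drop-in.
# DEST defaults to a local directory for safe experimentation; on a node it
# would be /etc/systemd/system/origin-node.service.d (then run:
# systemctl daemon-reload && systemctl restart origin-node).
DEST="${DEST:-./origin-node.service.d}"
mkdir -p "$DEST"
cat > "$DEST/10-relabel.conf" <<'EOF'
[Service]
# Leading "-": don't fail node startup if the API server isn't reachable yet.
# %H expands to the machine hostname.
ExecStartPre=-/usr/bin/oc label node/%H logging-infra-fluentd=true --overwrite
EOF
```

The `-` prefix on `ExecStartPre` keeps a failed label call (e.g. during a full-cluster cold start) from blocking the node service itself.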
------------------------------

Message: 4
Date: Thu, 15 Mar 2018 23:28:28 +0100
From: Bahhoo <bah...@gmail.com>
To: <rahul334...@gmail.com>
Cc: users <users@lists.openshift.redhat.com>
Subject: RE: Pods stuck on Terminating status
Message-ID: <5aaaf3c2.845d1c0a.57d7c.9...@mx.google.com>

Hi Rahul,

That won't do it either.

Thanks
Bahho

-----Original message-----
From: "Rahul Agarwal" <rahul334...@gmail.com>
Sent: 15.3.2018 22:26
To: "bahhooo" <bah...@gmail.com>
Cc: "users" <users@lists.openshift.redhat.com>
Subject: Re: Pods stuck on Terminating status

Hi Bahho

Try: oc delete all -l app=<app_name>

Thanks,
Rahul

On Thu, Mar 15, 2018 at 5:19 PM, bahhooo <bah...@gmail.com> wrote:

Hi all,

I have some zombie pods stuck in Terminating status on an OCP 3.7 HA cluster. oc delete with --grace-period=0 --force etc. won't work. A Docker restart or a server reboot won't help either. I also tried to find the pod's key in etcd in order to delete it manually, but I couldn't find it.

Is there a way to delete these pods?

Bahho

_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
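Not from the thread, but two last-resort steps that often clear pods stuck in Terminating are a forced delete and, if that fails, clearing `metadata.finalizers` so the API server can finish removing the object. A hedged sketch (the pod and project names are placeholders):

```shell
# Hedged sketch: print last-resort cleanup commands for a pod stuck in
# Terminating. Review the output, then pipe to sh to apply.
stuck_pod_cleanup() {
  pod="$1" ns="$2"
  # Skip the grace period entirely (already tried in the thread).
  printf 'oc delete pod %s -n %s --grace-period=0 --force\n' "$pod" "$ns"
  # If finalizers are what block deletion, clearing them usually lets the
  # object be removed -- but it also skips whatever cleanup they guarded.
  printf "oc patch pod %s -n %s -p '{\"metadata\":{\"finalizers\":null}}'\n" "$pod" "$ns"
}

stuck_pod_cleanup mypod myproject
```

Clearing finalizers bypasses cleanup logic, so it is a genuine last resort, not a routine fix.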
------------------------------

_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users

End of users Digest, Vol 68, Issue 29
*************************************