We have not found any documentation or any solutions for this issue. We recently upgraded to 3.7 and still see the same behavior. Our workaround has been to have CloudWatch watch the logs for any time a node (or master) is registered, and then re-add the logging fluentd label; if it's a master, it also sets SchedulingDisabled, because we see this behavior on our masters as well.
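For illustration, the re-label step such a hook could run might look like the sketch below. This is an assumption-laden sketch, not our exact setup: the function name, the `*master*` hostname match, and the hardcoded label are all placeholders.

```shell
# Hedged sketch of the re-label step a log-triggered hook could run after a
# node re-registers. emit_fixup prints the oc commands (review the output,
# then pipe to sh to apply); the *master* name match is an assumption.
emit_fixup() {
  node="$1"
  # Re-apply the label the logging daemonset's nodeSelector expects.
  printf 'oc label node/%s logging-infra-fluentd=true --overwrite\n' "$node"
  case "$node" in
    # Masters should go back to SchedulingDisabled after re-registration.
    *master*) printf 'oc adm manage-node %s --schedulable=false\n' "$node" ;;
  esac
}

emit_fixup ip-172-21-20-30.ec2.internal
emit_fixup master-1.example.com
```

Printing the commands instead of running them directly makes the hook easy to dry-run before wiring it to a real trigger.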
I'd like to see a resolution for these issues, and some documentation. We've struggled to get this working properly. Setting the command in the node service caused other issues, but it may have been misconfigured.

Thanks,
Todd

Today's Topics:

   3. Re: Looking for documentation on cloud provider delete node and register node (Mark McKinstry)

----------------------------------------------------------------------

------------------------------

Message: 3
Date: Thu, 15 Mar 2018 14:53:08 -0700
From: Mark McKinstry <mmcki...@redhat.com>
To: Clayton Coleman <ccole...@redhat.com>
Cc: "users@lists.openshift.redhat.com" <users@lists.openshift.redhat.com>, "Ernst, Chad" <chad_er...@unigroup.com>
Subject: Re: Looking for documentation on cloud provider delete node and register node
Message-ID: <CAGkJPmh6H56CzN-jSgcHNJgjjCvzMiM3MUqg-TwMmYH6sg=a...@mail.gmail.com>

Is there more info on this? I'm having this problem on OCP 3.7 right now too. If a node is rebooted, it comes back up but is missing the logging-infra-fluentd=true label.

On Thu, Dec 21, 2017 at 10:15 AM, Clayton Coleman <ccole...@redhat.com> wrote:

> There was an open bug on this previously - I'm having trouble finding it
> at the moment. The node may be racing with the cloud controller and then
> not updating the labels. One workaround is to simply add an "oc label
> node/$(hostname) ..." command to the origin-node service as a prestart
> command.
>
> On Dec 21, 2017, at 9:13 AM, Ernst, Chad <chad_er...@unigroup.com> wrote:
>
> Running Origin 3.6 on AWS, we've found that if our EC2 instances go down
> for any length of time and come back up (as opposed to the EC2 instance
> getting terminated), the nodes are automatically deleted from OpenShift and
> then re-registered after the EC2 instance is restarted.
> The activity is logged in /var/log/messages:
>
> Dec 20 21:59:30 ip-172-21-21-30 origin-master-controllers: I1220 21:59:30.297638 26242 nodecontroller.go:761] Deleting node (no longer present in cloud provider): ip-172-21-20-30.ec2.internal
> Dec 20 21:59:30 ip-172-21-21-30 origin-master-controllers: I1220 21:59:30.297662 26242 controller_utils.go:273] Recording Deleting Node ip-172-21-20-30.ec2.internal because it's not present according to cloud provider event message for node ip-172-21-20-30.ec2.internal
> Dec 20 21:59:30 ip-172-21-21-30 origin-master-controllers: I1220 21:59:30.297895 26242 event.go:217] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-172-21-20-30.ec2.internal", UID:"36c8dca4-e5c9-11e7-b2ce-0e69b80c212e", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'DeletingNode' Node ip-172-21-20-30.ec2.internal event: Deleting Node ip-172-21-20-30.ec2.internal because it's not present according to cloud provider
>
> Dec 20 23:10:06 ip-172-21-21-30 origin-master-controllers: I1220 23:10:06.303567 26242 nodecontroller.go:616] NodeController observed a new Node: "ip-172-21-22-30.ec2.internal"
> Dec 20 23:10:06 ip-172-21-21-30 origin-master-controllers: I1220 23:10:06.303597 26242 controller_utils.go:273] Recording Registered Node ip-172-21-22-30.ec2.internal in NodeController event message for node ip-172-21-22-30.ec2.internal
> Dec 20 23:10:06 ip-172-21-21-30 origin-master-controllers: I1220 23:10:06.303899 26242 event.go:217] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-172-21-22-30.ec2.internal", UID:"e850129f-e5da-11e7-ac5e-027542a418ee", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node ip-172-21-22-30.ec2.internal event: Registered Node ip-172-21-22-30.ec2.internal in NodeController
>
> The issue we are running into is that when the nodes come back they don't have all of our labels
> on them. They don't get labelled to run the fluentd pods ("logging-infra-fluentd=true"), and my masters aren't set to "SchedulingDisabled".
>
> Can anybody point me to any documentation regarding the automatic registration of the node from the cloud provider, or does anyone know how to adjust the behavior when a node is re-registered so it can be tagged properly?
>
> Thanks
>
> Chad
>
> ########################################################################
> The information contained in this message, and any attachments thereto,
> is intended solely for the use of the addressee(s) and may contain
> confidential and/or privileged material. Any review, retransmission,
> dissemination, copying, or other use of the transmitted information is
> prohibited. If you received this in error, please contact the sender
> and delete the material from any computer. UNIGROUP.COM
> ########################################################################
>
> _______________________________________________
> users mailing list
> users@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/users

--
Mark McKinstry
Senior Consultant, RHCA
Red Hat Consulting (West)
mmcki...@redhat.com
M: 510-646-1280
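Clayton's prestart-command workaround can be wired in as a systemd drop-in for the origin-node service. The sketch below is a hedged illustration under assumptions (the drop-in file name and the use of the `%H` hostname specifier are mine, not from the thread); it stages the file in a local directory so it is safe to try, and on a real node `DEST` would be `/etc/systemd/system/origin-node.service.d`, followed by `systemctl daemon-reload`.

```shell
# Hedged sketch: install the prestart re-label command as a systemd drop-in.
# DEST defaults to a local directory for safe experimentation; on a node it
# would be /etc/systemd/system/origin-node.service.d (then run:
# systemctl daemon-reload && systemctl restart origin-node).
DEST="${DEST:-./origin-node.service.d}"
mkdir -p "$DEST"
cat > "$DEST/10-relabel.conf" <<'EOF'
[Service]
# Leading "-": don't fail node startup if the API server isn't reachable yet.
# %H expands to the machine hostname.
ExecStartPre=-/usr/bin/oc label node/%H logging-infra-fluentd=true --overwrite
EOF
```

The `-` prefix on `ExecStartPre` keeps a failed label call (e.g. during a full-cluster cold start) from blocking the node service itself.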
------------------------------

Message: 4
Date: Thu, 15 Mar 2018 23:28:28 +0100
From: Bahhoo <bah...@gmail.com>
To: <rahul334...@gmail.com>
Cc: users <users@lists.openshift.redhat.com>
Subject: RE: Pods stuck on Terminating status
Message-ID: <5aaaf3c2.845d1c0a.57d7c.9...@mx.google.com>

Hi Rahul,

That won't do it either.

Thanks
Bahho

-----Original message-----
From: "Rahul Agarwal" <rahul334...@gmail.com>
Sent: 15.3.2018 22:26
To: "bahhooo" <bah...@gmail.com>
Cc: "users" <users@lists.openshift.redhat.com>
Subject: Re: Pods stuck on Terminating status

Hi Bahho

Try: oc delete all -l app=<app_name>

Thanks,
Rahul

On Thu, Mar 15, 2018 at 5:19 PM, bahhooo <bah...@gmail.com> wrote:

Hi all,

I have some zombie pods stuck in Terminating status on an OCP 3.7 HA cluster. oc delete with --grace-period=0 --force etc. won't work. A Docker restart or a server reboot won't help either. I also tried to find the pod's key in etcd in order to delete it manually, but I couldn't find it.

Is there a way to delete these pods?

Bahho

_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
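Not from the thread, but two last-resort steps that often clear pods stuck in Terminating are a forced delete and, if that fails, clearing `metadata.finalizers` so the API server can finish removing the object. A hedged sketch (the pod and project names are placeholders):

```shell
# Hedged sketch: print last-resort cleanup commands for a pod stuck in
# Terminating. Review the output, then pipe to sh to apply.
stuck_pod_cleanup() {
  pod="$1" ns="$2"
  # Skip the grace period entirely (already tried in the thread).
  printf 'oc delete pod %s -n %s --grace-period=0 --force\n' "$pod" "$ns"
  # If finalizers are what block deletion, clearing them usually lets the
  # object be removed -- but it also skips whatever cleanup they guarded.
  printf "oc patch pod %s -n %s -p '{\"metadata\":{\"finalizers\":null}}'\n" "$pod" "$ns"
}

stuck_pod_cleanup mypod myproject
```

Clearing finalizers bypasses cleanup logic, so it is a genuine last resort, not a routine fix.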
------------------------------

_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users

End of users Digest, Vol 68, Issue 29
*************************************