Any thoughts on how to fix this problem? In summary: the master and node
seem to start up OK, but the node does not join the cluster. More
details here:
https://github.com/OpenRiskNet/home/blob/master/openshift/simple-ansible.md
Tim
On 15/08/2017 15:16, Tim Bielawa wrote:
Discussion on-going in new github issue:
https://github.com/openshift/openshift-ansible/issues/5088
On Tue, Aug 15, 2017 at 8:44 AM, Tim Bielawa <tbiel...@redhat.com> wrote:
Tim,
Can you please provide more information? Your full inventory would
be very useful right now for debugging. Feel free to mask your
hostnames if you wish. What I need to see to debug this further
are all the parameters you're setting in the [OSEv3] section and
applying to each host in [masters] and [nodes].
You will find my GPG public key fingerprint in my signature if you
wish to encrypt the inventory file instead.
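For example, assuming your inventory is in a file named 'inventory'
and your keyring can fetch from a keyserver, something like:

gpg --recv-keys 0333AE37
gpg --armor --recipient 0333AE37 --encrypt inventory

would produce an inventory.asc you could attach.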
As for those two stalls you mentioned:
"Ensure OpenShift <THING> correctly rolls out (best-effort today)"
The delays you experienced are normal and expected. They are
typically caused by the pod images being downloaded to your
hosts. However, you showed your 'oc get nodes' output and I
noticed your master said "Ready,SchedulingDisabled". Because your
master is labeled 'SchedulingDisabled', it should *NOT* be running
any pods, which means it wasn't downloading pod images.
Can you please provide the following information:
* The output from `oc get all` on your master
* The output of `docker images` on your node *AND* your master
* Your complete inventory file. As I said before, feel free to
mask your hostnames or IPs if you prefer.
Your logs would also be helpful. Ensure you run ansible-playbook
with the -vv option for extra verbosity. You can capture the output in two ways:
1) If you run the install again you can set:
log_path = /tmp/ansible.log
in the [defaults] section of your ansible.cfg file.
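That is, your ansible.cfg would contain:

[defaults]
log_path = /tmp/ansible.log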
2) Alternatively you can capture the output of ansible using the
`tee` command like so:
ansible-playbook -vv -i <INVENTORY> ./playbooks/byo/config.yml | tee /tmp/ansible.log
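(If you run the playbook more than once, use `tee -a` to append to
the log instead of overwriting it.)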
Again, if you wish to keep this information private, my GPG key is
in my signature. Short ID is 0333AE37.
Thanks!
On Tue, Aug 15, 2017 at 6:15 AM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
Thanks for the response, and sorry for the delay on my end -
I've been away for a week.
I ran through the process again and got the same result. On
the node it looks like the openshift services are running OK:
systemctl list-units --all | grep -i origin
origin-node.service loaded active running OpenShift Node
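(You can also check the unit directly, e.g. with:

systemctl status origin-node.service

which shows its state plus the most recent journal lines.)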
But from the master the node has not joined the cluster:
oc get nodes
NAME                                                           STATUS                     AGE       VERSION
2c0e37ab-f41e-40f1-a466-a575c85823b6.priv.cloud.scaleway.com   Ready,SchedulingDisabled   26m       v1.6.1+5115d708d7
The install process seems to have gone OK. There were no
obvious errors, though it did twice stall at a point like this:
### TASK [openshift_hosted : Ensure OpenShift router correctly rolls out (best-effort today)] ******************
But after waiting for about 5-10 mins it continued.
There were a lot of 'skipping' messages during the install, but
no obvious errors. The output was huge and was not captured to a
file, so I'd have to run it again to get a full log.
Any thoughts as to what is wrong?
Tim
On 04/08/2017 16:07, Tim Bielawa wrote:
(reposting: forgot to reply-all the first time)
Just based on the number of tasks your summary says completed,
I am not sure your installation actually ran to completion.
I expect to see upwards of one to two thousand tasks.
A while back we changed node integration behavior so that a node
failing to provision does not stop your entire installation.
This is to ease the pain felt when provisioning large
(hundred-plus node) clusters.
<private node1 dns name> : ok=235 changed=56 unreachable=0 failed=0
That node did not fully install. Open a shell on that node
and check the OpenShift services. I'm willing to bet that

systemctl list-units --all | grep -i origin

would show the node service is not running. Find the name of
the node service and then examine the journal logs for that node:

journalctl -x -u <node-service-name>
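For an Origin install the node unit is typically
origin-node.service, so, assuming that default unit name, for example:

journalctl -x -u origin-node.service --no-pager | tail -n 200

should surface why the node service failed to start or register
with the master.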
I think we (the openshift-ansible team) will want to add
detection of failed node integrations to our error summary
report in the future. Would you mind opening an issue
for this on our GitHub page with this information?
Thanks!
On Sun, Jul 30, 2017 at 10:57 AM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
I'm trying to get to grips with the advanced (Ansible)
installer. Initially I'm trying to do something very simple:
fire up a cluster with one master and one node.
My inventory file looks like this:
[OSEv3:children]
masters
nodes
[OSEv3:vars]
ansible_ssh_user=root
openshift_hostname=<private master dns name>
openshift_master_cluster_hostname=<private master dns name>
openshift_master_cluster_public_hostname=<public master dns name>
openshift_disable_check=docker_storage,memory_availability
openshift_deployment_type=origin
[masters]
<private master dns name>
[etcd]
<private master dns name>
[nodes]
<private master dns name>
<private node1 dns name>
I run:
ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml
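(Run this way, without -i, ansible-playbook uses the default
inventory, typically /etc/ansible/hosts, so the file above is
assumed to live there or be passed explicitly with -i <path>.)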
and (after a long time) it completes, without any
noticeable errors:
...
PLAY RECAP *********************************************************************
<private node1 dns name> : ok=235 changed=56 unreachable=0 failed=0
<private master dns name> : ok=623 changed=166 unreachable=0 failed=0
localhost : ok=12 changed=0 unreachable=0 failed=0
Both nodes seem to have been set up OK.
But when I look on the master there is only the
master in the cluster, no second node:
oc get nodes
NAME STATUS AGE
<private master dns name> Ready,SchedulingDisabled 32m
and of course like this nothing can be scheduled.
Presumably the node should have been added to the cluster, so
any ideas what is going wrong here?
Thanks
Tim
--
Tim Bielawa, Sr. Software Engineer [ED-C137]
Cell: 919.332.6411 | IRC: tbielawa (#openshift)
1BA0 4FAB 4C13 FBA0 A036 4958 AD05 E75E 0333 AE37
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users