Any thoughts on how to fix this problem? In summary: the master and node
seem to start up OK, but the node does not join the cluster. More
details here:
https://github.com/OpenRiskNet/home/blob/master/openshift/simple-ansible.md
Tim
On 15/08/2017 15:16, Tim Bielawa wrote:
Discussion on-going in new github issue:
https://github.com/openshift/openshift-ansible/issues/5088
On Tue, Aug 15, 2017 at 8:44 AM, Tim Bielawa <tbiel...@redhat.com> wrote:
Tim,
Can you please provide more information? Your full inventory would
be very useful right now for debugging. Feel free to mask your
hostnames if you wish. What I need to see to debug this further
are all the parameters you're setting in the [OSEv3] section and
applying to each host in [masters] and [nodes].
You will find my GPG public key fingerprint in my signature if you
wish to encrypt the inventory file instead.
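For example, assuming your inventory is in a file named 'inventory'
and your keyring can fetch from a keyserver, something like:

gpg --recv-keys 0333AE37
gpg --armor --recipient 0333AE37 --encrypt inventory

would produce an inventory.asc you could attach.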
As for those two stalls you mentioned:
"Ensure OpenShift <THING> correctly rolls out (best-effort today)"
The delays you experienced are normal and expected. They are
typically caused by the pod images being downloaded to your
hosts. However, you showed your 'oc get nodes' output and I
noticed your master said "Ready,SchedulingDisabled". Because your
master is labeled 'SchedulingDisabled', it should *NOT* be running
any pods, which means it wasn't downloading pod images.
Can you please provide the following information:
* The output from `oc get all` on your master
* The output of `docker images` on your node *AND* your master
* Your complete inventory file. As I said before, feel free to
mask your hostnames or IPs if you prefer.
Your logs would also be helpful. Ensure you run ansible-playbook
with the -vv option for extra verbosity. You can capture the output in two ways:
1) If you run the install again you can set:
log_path = /tmp/ansible.log
in the [defaults] section of your ansible.cfg file.
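That is, your ansible.cfg would contain:

[defaults]
log_path = /tmp/ansible.log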
2) Alternatively you can capture the output of ansible using the
`tee` command like so:
ansible-playbook -vv -i <INVENTORY> ./playbooks/byo/config.yml | tee /tmp/ansible.log
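(If you run the playbook more than once, use `tee -a` to append to
the log instead of overwriting it.)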
Again, if you wish to keep this information private, my GPG key is
in my signature. Short ID is 0333AE37.
Thanks!
On Tue, Aug 15, 2017 at 6:15 AM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
Thanks for the response, and sorry for the delay on my end -
I've been away for a week.
I ran through the process again and got the same result. On
the node it looks like the openshift services are running OK:
systemctl list-units --all | grep -i origin
origin-node.service loaded active running OpenShift Node
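(You can also check the unit directly, e.g. with:

systemctl status origin-node.service

which shows its state plus the most recent journal lines.)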
But from the master the node has not joined the cluster:
oc get nodes
NAME                                                           STATUS                     AGE       VERSION
2c0e37ab-f41e-40f1-a466-a575c85823b6.priv.cloud.scaleway.com   Ready,SchedulingDisabled   26m       v1.6.1+5115d708d7
The install process seems to have gone OK. There were no
obvious errors, though it did twice stall at a point like this:
### TASK [openshift_hosted : Ensure OpenShift router correctly rolls out (best-effort today)] ******************
But after waiting for about 5-10 mins it continued.
There were a lot of 'skipping' messages during the install, but
no obvious errors. The output was huge and was not captured to a
file, so I'd have to run it again to get a full log.
Any thoughts as to what is wrong?
Tim
On 04/08/2017 16:07, Tim Bielawa wrote:
(reposting: forgot to reply-all the first time)
Just based on the number of tasks your summary says completed,
I am not sure your installation actually ran to completion.
I expect to see upwards of one to two thousand tasks.
A while back we changed node integration behavior so that a node
failing to provision does not stop your entire installation.
This is to ease the pain felt when provisioning large
(hundred-plus node) clusters.
<private node1 dns name> : ok=235 changed=56 unreachable=0 failed=0
That node did not fully install. Open a shell on that node
and check the OpenShift services. I'm willing to bet that

systemctl list-units --all | grep -i origin

would show the node service is not running. Find the name of
the node service and then examine the journal logs for that node:

journalctl -x -u <node-service-name>
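For an Origin install the node unit is typically
origin-node.service, so, assuming that default unit name, for example:

journalctl -x -u origin-node.service --no-pager | tail -n 200

should surface why the node service failed to start or register
with the master.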
I think we (the openshift-ansible team) will want to add
detection of failed node integrations to our error summary
report in the future. Would you mind opening an issue
for this on our GitHub page with this information?
Thanks!
On Sun, Jul 30, 2017 at 10:57 AM, Tim Dudgeon <tdudgeon...@gmail.com> wrote:
I'm trying to get to grips with the advanced (Ansible)
installer. Initially I'm trying to do something very simple:
fire up a cluster with one master and one node.
My inventory file looks like this:
[OSEv3:children]
masters
nodes
[OSEv3:vars]
ansible_ssh_user=root
openshift_hostname=<private master dns name>
openshift_master_cluster_hostname=<private master dns name>
openshift_master_cluster_public_hostname=<public master dns name>
openshift_disable_check=docker_storage,memory_availability
openshift_deployment_type=origin
[masters]
<private master dns name>
[etcd]
<private master dns name>
[nodes]
<private master dns name>
<private node1 dns name>
I run:
ansible-playbook ~/openshift-ansible/playbooks/byo/config.yml
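(Run this way, without -i, ansible-playbook uses the default
inventory, typically /etc/ansible/hosts, so the file above is
assumed to live there or be passed explicitly with -i <path>.)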
and (after a long time) it completes, without any
noticeable errors:
...
PLAY RECAP *********************************************************************
<private node1 dns name> : ok=235 changed=56 unreachable=0 failed=0
<private master dns name> : ok=623 changed=166 unreachable=0 failed=0
localhost : ok=12 changed=0 unreachable=0 failed=0
Both nodes seem to have been set up OK.
But when I look on the master there is only the
master in the cluster, no second node:
oc get nodes
NAME STATUS AGE
<private master dns name> Ready,SchedulingDisabled 32m
and of course like this nothing can be scheduled.
Presumably the node should have been added to the cluster, so
any ideas what is going wrong here?
Thanks
Tim
--
Tim Bielawa, Sr. Software Engineer [ED-C137]
Cell: 919.332.6411 | IRC: tbielawa (#openshift)
1BA0 4FAB 4C13 FBA0 A036 4958 AD05 E75E 0333 AE37
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users