Is there anything in /var/log/confluent/stderr or /var/log/confluent/trace? I
would also be tempted to see if 'confluent_selfcheck' has any suggestions. You
can also ssh into the node during that phase to confirm what it is doing while
it is seemingly hung, e.g. by looking at the output of 'ps axf'.
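
A minimal sketch of those checks as a script run on the Confluent server. The
show_log_tail helper and the NODE variable are hypothetical names used only for
illustration, and the ssh step assumes root login to the node works while the
installer is running:

```shell
#!/bin/sh
# Print the last lines of a log file, or note that it cannot be read.
# (show_log_tail is a hypothetical helper, not part of Confluent.)
show_log_tail() {
    if [ -r "$1" ]; then
        echo "== $1 =="
        tail -n 50 "$1"
    else
        echo "== $1 not readable =="
    fi
}

# The two log files mentioned above.
for f in /var/log/confluent/stderr /var/log/confluent/trace; do
    show_log_tail "$f"
done

# Run the self-check only if it is installed and on the PATH.
if command -v confluent_selfcheck >/dev/null 2>&1; then
    confluent_selfcheck
fi

# While the install appears hung, show the process tree on the node.
# NODE is an assumed environment variable holding the node name.
if [ -n "${NODE:-}" ]; then
    ssh "root@$NODE" ps axf
else
    echo "Set NODE=<nodename> to inspect the process tree on the node."
fi
```

For example, saved under a name of your choosing and run as
"NODE=<nodename> sh <script>" while post.sh appears stuck.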
________________________________
From: David Magda <[email protected]>
Sent: Wednesday, January 24, 2024 9:37 PM
To: [email protected] <[email protected]>
Subject: [External] [xcat-user] Ansible and Confluent
Hello,
I'm trying to get Ansible working with Confluent 3.8.0. (Using an older version
due to legacy OS reasons.)
In /var/lib/confluent/public/os/ I created a new profile called
ubuntu-22.04.3-x86_64-test1/, and this seems to work just fine: I took the
provided "autoinstall/user-data" file, added some partition stanzas, some
packages, etc.
Once I sorted out a 'basic' automated Ubuntu install I tried creating a
"ansible/post.d/01-packages.yaml" file within the profile directory with the
following contents:
"""
- name: install chrony
  apt:
    pkg:
      - chrony
"""
The Ubuntu (subiquity) installer seems to 'hang' at:
"""
start: subiquity/Late/run/command_1: /custom-installation/post.sh
"""
which probably corresponds to this part of the "user-data" file:
"""
late-commands:
  - chroot /target apt-get -y -q purge snapd modemmanager
  - /custom-installation/post.sh
"""
When the 'hang' occurs the following starts filling up the
"/var/log/httpd/ssl_access_log" file of the Confluent/xcat server:
"""
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
fe80::[EUI-64] - - [24/Jan/2024:11:15:08 -0500] "GET /confluent-api/self/remoteconfig/status HTTP/1.1" 200 -
"""
When I force a restart of the system/VM, it can boot off the disk, and goes
through the regular start-up process, including a bunch of cloud-init stuff.
Though after it runs "/etc/confluent/firstboot.sh", the "ssl_access_log" file
once again starts filling with the "remoteconfig/status" stuff per above.
Renaming "ansible/" to "ansible_off/" seems to make the problem go away.
Similar behaviour with Ubuntu 20.04.
I'm wondering what's going on with the 'hang' when "post.sh" is executed, and
the flooding after "firstboot.sh".
Regards,
David
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user