Hi Thomas,
I've found a pattern. When the compute node is simple enough it works,
probably because the data for resolv.conf is fetched directly from DHCP,
which should be configured correctly.
The issue is with the nodes that have custom network schemes, like bonds and
VLANs; something goes wrong during the confignetwork postscript. It's probably
due to a configuration mistake that I've made, but I don't know which one.
Regarding your questions:
1) It does not exist
[root@ceph01-ib0 ~]# systemctl status systemd-networkd
Unit systemd-networkd.service could not be found.
2) It's running
[root@ceph01-ib0 ~]# systemctl status NetworkManager
● NetworkManager.service - Network Manager
Loaded: loaded (/usr/lib/systemd/system/NetworkManager.service; enabled;
vendor preset: enabled)
Active: active (running) since Mon 2021-06-14 13:37:20 -03; 8min ago
Docs: man:NetworkManager(8)
Main PID: 2028 (NetworkManager)
Tasks: 3 (limit: 2464038)
Memory: 11.4M
CGroup: /system.slice/NetworkManager.service
└─2028 /usr/sbin/NetworkManager --no-daemon
3) It does not exist:
[root@ceph01-ib0 ~]# ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 65 Jun 14 13:37 /etc/resolv.conf
[root@ceph01-ib0 ~]# ls -l /run/systemd/resolv/resolv.conf
ls: cannot access '/run/systemd/resolv/resolv.conf': No such file or directory
I cannot find anything related to rc-manager; is this a systemd thing?
4) No it's not.
[root@ceph01-ib0 ~]# ls -l /etc/resolv.conf
-rw-r--r-- 1 root root 65 Jun 14 13:37 /etc/resolv.conf
5) Seems default to me
[root@ceph01-ib0 ~]# grep host /etc/nsswitch.conf
# Valid databases are: aliases, ethers, group, gshadow, hosts,
# myhostname Use systemd host names
hosts: files dns myhostname
That's it.
It's probably something wrong in the confignetwork script, but I'm not sure what.
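For reference, dns= and rc-manager= are keys in the [main] section of
NetworkManager.conf (see man NetworkManager.conf), so they belong to
NetworkManager rather than systemd. A minimal sketch for checking whether
either one is set; nm_dns_settings is a hypothetical helper name, and the
default directory assumes the stock /etc/NetworkManager layout (drop-ins under
/usr/lib or /run are not searched here):

```shell
#!/bin/sh
# nm_dns_settings is a hypothetical helper name, not part of NetworkManager.
# It prints every dns= or rc-manager= line found in NetworkManager.conf and
# in the conf.d drop-ins under the given directory
# (default: /etc/NetworkManager).
nm_dns_settings() {
    dir="${1:-/etc/NetworkManager}"
    for f in "$dir/NetworkManager.conf" "$dir"/conf.d/*.conf; do
        # Skip missing files and an unmatched conf.d glob.
        [ -f "$f" ] || continue
        # -H prefixes each match with the file it came from.
        grep -HE '^[[:space:]]*(dns|rc-manager)[[:space:]]*=' "$f"
    done
    return 0
}
```

Calling nm_dns_settings with no argument on the node and getting no output
would suggest neither key is set. NetworkManager --print-config shows the
merged configuration if that view is preferred.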
Thanks,
On 14 Jun 2021, at 07:57, Thomas HUMMEL
<[email protected]<mailto:[email protected]>> wrote:
On 14/06/2021 07:41, Vinícius Ferrão via xCAT-user wrote:
Hello,
For unknown reasons nodes that I've installed with rinstall (using stateful
method) didn't get the nameserver section in resolv.conf, basically leaving the
node without any name resolution.
Hello,
assuming it is not an xCAT bug, I would look at
1) if systemd-networkd is enabled
2) if NetworkManager is enabled
3) if 2), if it handles /etc/resolv.conf, by looking at its conf:
a) is dns= stated ?
b) is /etc/resolv.conf a symlink to /run/systemd/resolv/resolv.conf ?
c) is rc-manager stated ?
4) is /etc/resolv.conf a symlink to ../run/resolvconf/resolv.conf ?
5) the hosts line of /etc/nsswitch.conf
to figure out who manages /etc/resolv.conf
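The symlink checks in 3b and 4 come down to whether /etc/resolv.conf is a
symlink and where it points. A minimal sketch, assuming a POSIX shell with
readlink available; resolv_conf_kind is a hypothetical helper name:

```shell
#!/bin/sh
# resolv_conf_kind is a hypothetical helper, not a standard tool.
# Prints "symlink -> TARGET", "regular file", or "missing" for the given
# path (default: /etc/resolv.conf).
resolv_conf_kind() {
    path="${1:-/etc/resolv.conf}"
    if [ -L "$path" ]; then
        # Show the raw link target, even if it is dangling.
        echo "symlink -> $(readlink "$path")"
    elif [ -f "$path" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

Running resolv_conf_kind with no argument checks /etc/resolv.conf itself. A
regular file suggests some tool writes the file directly instead of linking
it to a generated copy.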
Hope it helps.
--
Thomas HUMMEL
rc-manager=
As specified in the documentation
https://xcat-docs.readthedocs.io/en/stable/advanced/domain_name_resolution/domain_name_resolution.html
it should be generated if nameservers and domain are provided in the site
table: The resolv.conf files for the compute nodes will be created
automatically using the domain and nameservers values set in the xCAT network
or site definition.
Both are defined, but resolv.conf still wasn't generated correctly.
[root@headnode ~]# lsdef -t site clustersite | egrep "nameserver|forward|domain"
domain=cluster.domain.tld
forwarders=1.1.1.1
nameservers=172.26.255.254
I even tried adding the nameservers to the network definition, but it was a no
go:
[root@headnode ~]# lsdef -t network management
Object name: management
gateway=<xcatmaster>
mask=255.255.0.0
mgtifname=bond0
mtu=1500
nameservers=172.26.255.254
net=172.26.0.0
tftpserver=<xcatmaster>
Is there anything that I can do to debug this?
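One way to narrow this down might be to re-run only the confignetwork
postscript on an affected node and then look at resolv.conf again. A minimal
sketch, assuming it runs on the management node with xCAT's updatenode in PATH
and passwordless ssh to the node; count_nameservers and recheck_node are
hypothetical helper names:

```shell
#!/bin/sh
# count_nameservers and recheck_node are hypothetical helpers,
# not xCAT commands.

# Read resolv.conf content on stdin and print how many nameserver
# lines it contains (commented-out lines are not counted).
count_nameservers() {
    # grep -c exits non-zero on a count of 0, so mask the exit status.
    grep -c '^[[:space:]]*nameserver[[:space:]]' || true
}

# Re-run one postscript on a node with updatenode -P, then count the
# nameserver lines in that node's /etc/resolv.conf.
recheck_node() {
    node="$1"
    script="${2:-confignetwork}"
    updatenode "$node" -P "$script"
    ssh "$node" cat /etc/resolv.conf | count_nameservers
}
```

For example, recheck_node ceph01 would re-run confignetwork on ceph01; a
result of 0 afterwards would point at the postscript rather than at the
installer.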
Thanks,
Vinícius.
PS: Here's full data from a given node and the networks.
[root@headnode ~]# lsdef ceph01
Object name: ceph01
arch=x86_64
bmc=172.25.254.1
bmcpassword=calvin
bmcusername=root
cons=ipmi
consoleenabled=1
currchain=boot
currstate=install ol8.4.0-x86_64-compute
groups=ceph,all
ip=172.26.254.1
mac=bc:97:e1:ea:08:b0
mgt=ipmi
netboot=xnba
nicdevices.bond0.123=bond0
nicdevices.bond0.1010=bond0
nicdevices.bond0=ens1f0np0|ens1f1np1
nichostnamesuffixes.bond0.1010=-ceph
nichostnamesuffixes.bond0.123=-cephsync
nicips.ib0=172.27.254.1
nicips.bond0=172.26.254.1
nicips.bond0.1010=10.0.10.21
nicips.bond0.123=192.168.168.21
nicnetworks.bond0.123=ceph-sync
nicnetworks.ib0=application
nicnetworks.bond0.1010=ceph
nicnetworks.bond0=management
nictypes.ib0=Infiniband
nictypes.ens1f0np0=ethernet
nictypes.bond0.1010=vlan
nictypes.bond0=bond
nictypes.ens1f1np1=ethernet
nictypes.bond0.123=vlan
os=ol8.4.0
postbootscripts=otherpkgs,confignics
postscripts=syslog,remoteshell,syncfiles,confignetwork,versatushpc/postinstall-ceph
profile=compute
provmethod=ol8.4.0-x86_64-install-ceph
serialport=0
serialspeed=115200
status=booted
statustime=06-14-2021 02:37:04
updatestatus=synced
updatestatustime=06-14-2021 02:01:55
[root@headnode ~]# lsdef -t network
application (network)
ceph (network)
ceph-sync (network)
libvirt (network)
management (network)
service (network)
site (network)
_______________________________________________
xCAT-user mailing list
[email protected]<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/xcat-user