On the second DHCP server, can you run `makedhcp -d 00:25:90:5a:eb:8a`?
We want to remove this MAC address from the DHCP lease file on the server
that offered 130.246.32.86.

Can you also check /var/log/console/proc01.log and search for what "Next
server" is set to?
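
A minimal sketch of that search, assuming the console log is plain text and
contains a line mentioning "Next server" (find_next_server is a made-up
helper name for illustration, not an xCAT command):

```shell
# Print every line mentioning "next server" (case-insensitive), prefixed
# with its line number, so the advertised boot server can be read off.
# find_next_server is a hypothetical helper name, not part of xCAT.
find_next_server() {
    grep -in 'next server' "$1"
}
```

Usage would be something like: find_next_server /var/log/console/proc01.log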


Thanks,
Casandra Qiu

...................................................................
Casandra Hong Qiu
Phone: (845) 433-9291, t/l 293-9291
Office: Building 8, 3-B-04
cxh...@us.ibm.com





From:   "Chiu, Peter (STFC,RAL,RALSP)" <peter.c...@stfc.ac.uk>
To:     xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date:   08/05/2020 04:04 PM
Subject:        [EXTERNAL] Re: [xcat-user] xCAT 2.16 Centos 7 clients PXE Boot
            Aborted after 3.10.0-1127.18.2 upgrade



Hello Casandra,

Thanks for your reply.

Below is the output from your suggestions.

We do have a second DHCP server but, because of the small number of compute
nodes, we have restricted our master server to serve only a few addresses:
130.246.32.141 to 130.246.32.155.
I have attached a copy of the dhcpd.conf here.

I entered the chdef command and restarted the client.
It still fails with PXE Boot aborted. I don't think it gets far enough to
allow me to ssh in.

Any further thoughts would be appreciated, thanks.

Regards,
Peter
      1.     xcatprobe xcatmn -i bond0

      [root@main ~]# netstat -nr    # to determine interface
      Kernel IP routing table
      Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
      0.0.0.0         130.246.32.254  0.0.0.0         UG        0 0          0 bond0
      130.246.32.0    0.0.0.0         255.255.252.0   U         0 0          0 bond0
      172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0

      [root@main ~]# xcatprobe xcatmn -i bond0
      [mn]: Checking all xCAT daemons are running... [ OK ]
      [mn]: Checking xcatd can receive command request... [ OK ]
      [mn]: Checking 'site' table is configured... [ OK ]
      [mn]: Checking provision network is configured... [ OK ]
      [mn]: Checking 'passwd' table is configured... [ OK ]
      [mn]: Checking important directories(installdir,tftpdir) are configured...[ OK ]
      [mn]: Checking SELinux is disabled... [ OK ]
      [mn]: Checking HTTP service is configured... [ OK ]
      [mn]: Checking TFTP service is configured... [ OK ]
      [mn]: Checking DNS service is configured... [ OK ]
      [mn]: Checking DHCP service is configured... [ OK ]
      [mn]: Checking NTP service is configured... [ OK ]
      [mn]: Checking rsyslog service is configured... [ OK ]
      [mn]: Checking firewall is disabled... [ OK ]
      [mn]: Checking minimum disk space for xCAT ['/var' needs 1GB;'/install'...[ OK ]
      [mn]: Checking Linux ulimits configuration... [ OK ]
      [mn]: Checking network kernel parameter configuration... [ OK ]
      [mn]: Checking xCAT daemon attributes configuration... [ OK ]
      [mn]: Checking xCAT log is stored in /var/log/xcat/cluster.log... [ OK ]
      [mn]: Checking xCAT management node IP: <130.246.32.140> is configured ...[ OK ]
      [mn]: Checking dhcpd.leases file is less than 100M... [ OK ]
      =================================== SUMMARY ===========================...
      [MN]: Checking on MN... [ OK ]
      [root@main ~]# df -hl /var
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/sdc7       207G   80G  117G  41% /

      2.     xcatprobe detect_dhcpd -i bond0 -m 00:25:90:5a:eb:8a

      [root@main ~]# xcatprobe detect_dhcpd -i bond0 -m 00:25:90:5a:eb:8a
      Start to detect DHCP, please wait 10 seconds [INFO]
      ++++++++++++++++++++++++++++++++++ [INFO]
      There are 2 servers replied to dhcp discover. [INFO]
          Server:130.246.32.140 assign IP [130.246.32.141]. The next server i...[INFO]
          Server:130.246.32.100 assign IP [130.246.32.86]. The next server is...[INFO]
      ++++++++++++++++++++++++++++++++++ [INFO]
      [root@main ~]#
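
      Two servers answering the same DHCP discover is the thing to watch
      here. A small sketch that pulls the responders out of saved
      detect_dhcpd output, assuming the "Server:<ip> assign IP [<ip>]" line
      format shown above (dhcp_responders is a made-up helper name):

```shell
# Read detect_dhcpd output on stdin and print "<server> <offered-ip>" for
# each responder. Assumes the "Server:<ip> assign IP [<ip>]" format above.
# dhcp_responders is a hypothetical helper name, not an xCAT command.
dhcp_responders() {
    sed -n 's/.*Server:\([0-9.]*\) assign IP \[\([0-9.]*\)].*/\1 \2/p'
}
```

      Piping the xcatprobe output through dhcp_responders gives one line per
      server; more than one line means another server is also answering for
      this MAC address.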

      3.     The client host is registered in DNS:
      [root@main ~]# nslookup proc01
      Server:         130.246.188.240
      Address:        130.246.188.240#53

      Name:   proc01.bnsc.rl.ac.uk
      Address: 130.246.32.141

      4.     Did try chdef -t site clustersite xcatdebugmode=2

      [root@main ~]# chdef -t site clustersite xcatdebugmode=2
      1 object definitions have been created or modified.

      Restarted the client proc01, but it still fails with PXE Boot aborted.
      The client does not respond to ping.

      [root@main ~]# ping 130.246.32.141
      --- 130.246.32.141 ping statistics ---
      3 packets transmitted, 0 received, 100% packet loss, time 2000ms

From: Casandra H Qiu <cxh...@us.ibm.com>
Sent: 05 August 2020 20:15
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Subject: Re: [xcat-user] xCAT 2.16 Centos 7 clients PXE Boot Aborted after
3.10.0-1127.18.2 upgrade



Maybe try some verification commands:
xcatprobe xcatmn -i <provision interface>
xcatprobe detect_dhcpd -i <provision interface> -m <CN's MAC address>
    <<<<------ this one is important: make sure no other server serves this MAC address
nslookup <name of CN>

You can also turn on debug mode; you may then be able to ssh to the CN in
anaconda mode to do more debugging:
chdef -t site clustersite xcatdebugmode=2

Thanks,
Casandra Qiu




From: "Chiu, Peter (STFC,RAL,RALSP)" <peter.c...@stfc.ac.uk>
To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
Date: 08/05/2020 05:12 AM
Subject: [EXTERNAL] [xcat-user] xCAT 2.16 Centos 7 clients PXE Boot Aborted
after 3.10.0-1127.18.2 upgrade




Hello all,

Having enjoyed smooth running of xCAT 2.16 on a small cluster of 4 compute
nodes for about 18 months, I have hit a problem with the last CentOS 7
system update, from 3.10.0-1127.13.1 to 3.10.0-1127.18.2, which left all
the clients failing to boot with this error:

CLIENT MAC ADDR: 00 25 90 5A EB BA GUID: 00000000 0025905AEBBA
CLIENT IP: 130.246.32.141 MASK: 255.255.252.0 DHCP IP: 130.246.32.140
GATEWAY IP: 130.246.32.254

PXE Boot aborted. Booting to next device...
PXE-M0F: Exiting Intel Boot Agent

No such problem with the previous CentOS 7 updates.
The last successful update, to 3.10.0-1127.13.1 on 24 June, went through okay.

I have attempted a number of checks and recoveries but no joy:
a. confirmed the master accepts DHCP requests, with DHCPACK records in
/var/log/messages.
b. confirmed the master serves tftp downloads to a different system.
c. confirmed the master serves http downloads of the kernel and ramdisk files.
d. power-cycled the master
e. power-cycled the clients
f. manually ran lsdef -t osimage, chdef -t osimage, genimage, packimage,
nodeset

Unfortunately the above fault persists on all four compute nodes, which are
still down.

I think I have run out of ideas.
Before giving up and making a fresh xCAT installation, I wonder if anyone
can offer some clues for troubleshooting PXE aborted errors.

Many thanks.
Peter Chiu
STFC RAL Space, UK
==============================================================================

Here are some details on the systems:

Master node: main.bnsc.rl.ac.uk 130.246.32.140/22 gateway 130.246.32.254
Compute node1: proc01.bnsc.rl.ac.uk 130.246.32.141/22 00:25:90:5a:eb:8a
Operating system: CentOS Linux release 7.8.2003 (Core)
xCAT: # rpm -qf /opt/xcat/sbin/xcatd
xCAT-server-2.16-snap202006161607.noarch
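
As a quick cross-check of the /22 prefix above against a dotted netmask, here
is a small arithmetic sketch (prefix_to_mask is an illustrative helper name,
not part of xCAT; it assumes a shell with 64-bit arithmetic):

```shell
# Convert a CIDR prefix length (0-32) to a dotted-quad netmask.
# prefix_to_mask is a hypothetical helper name used for illustration.
prefix_to_mask() {
    local prefix mask
    prefix="$1"
    # Keep the top <prefix> bits of a 32-bit word set.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
}
```

Running prefix_to_mask 22 prints 255.255.252.0, which can be compared with
the MASK value the client shows on its PXE screen.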

Checks:

a. DHCP records in master /var/log/messages, no error.

The master server has picked up the dhcp requests, and offered the address.
But no further communication afterwards.

Aug 4 15:00:27 main dhcpd: DHCPDISCOVER from 00:25:90:5a:eb:8a via bond0
Aug 4 15:00:27 main dhcpd: DHCPOFFER on 130.246.32.141 to 00:25:90:5a:eb:8a via bond0
Aug 4 15:00:29 main dhcpd: Dynamic and static leases present for 130.246.32.141.
Aug 4 15:00:29 main dhcpd: Remove host declaration proc01 or remove 130.246.32.141
Aug 4 15:00:29 main dhcpd: from the dynamic address pool for bond0
Aug 4 15:00:29 main dhcpd: DHCPREQUEST for 130.246.32.141 (130.246.32.140) from 00:25:90:5a:eb:8a via bond0
Aug 4 15:00:29 main dhcpd: DHCPACK on 130.246.32.141 to 00:25:90:5a:eb:8a via bond0
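
The dhcpd warning above says 130.246.32.141 is both a static host entry and
part of a dynamic pool. A rough sketch for checking whether an address falls
inside a range; it assumes the dynamic range is 130.246.32.141 to
130.246.32.155 as described elsewhere in this thread (the dhcpd.conf itself
is not reproduced, so treat the range as an assumption), and the helper
names are made up:

```shell
# Convert a dotted-quad IPv4 address to an integer.
# (Hypothetical helper; assumes octets without leading zeros.)
ip_to_int() {
    local IFS=.
    # Split the address on dots into $1..$4.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit 0) if address $1 lies within the inclusive range $2..$3.
ip_in_range() {
    local ip lo hi
    ip=$(ip_to_int "$1")
    lo=$(ip_to_int "$2")
    hi=$(ip_to_int "$3")
    [ "$ip" -ge "$lo" ] && [ "$ip" -le "$hi" ]
}
```

For example, ip_in_range 130.246.32.141 130.246.32.141 130.246.32.155
succeeds, which would be consistent with the overlap dhcpd is warning about.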

b. /var/log/xcat/cluster.log
No errors, just a record of a new image produced.

Aug 4 14:24:54 main xcat[28101]: INFO xCAT: Allowing lsdef -t site -o clustersite -i installdir for root from localhost
Aug 4 14:24:54 main xcat[28103]: INFO xCAT: Allowing genimage -i eth0 -n dca,ixgbe,igb,e1000e,e1000,tg3 -o centos7.6 -p compute --tempfile /tmp/xcat_genimage.28086 for root from localhost
Aug 4 14:27:29 main xcat[25483]: INFO xCAT: Allowing packimage centos7.6-x86_64-netboot-compute for root from localhost
Aug 4 14:27:30 main xcat[25499]: INFO xCAT: Allowing ilitefile centos7.6-x86_64-statelite-compute for root from localhost
Aug 4 14:30:07 main xcat[26073]: INFO xCAT: Allowing nodeset to compute osimage=centos7.6-x86_64-netboot-compute for root from localhost
Aug 4 14:34:33 main xcat[26958]: INFO xCAT: Allowing rpower to compute reset for root from localhost
Aug 4 14:34:33 main xcat[26959]: INFO xcat.updatestatus - proc03: changing status=powering-on
Aug 4 14:34:33 main xcat[26959]: INFO xcat.updatestatus - proc04: changing status=powering-on
Aug 4 14:34:33 main xcat[26959]: INFO xcat.updatestatus - proc01: changing status=powering-on
Aug 4 14:34:33 main xcat[26959]: INFO xcat.updatestatus - proc02: changing status=powering-on

c. Check dhcp lease file for the files to be downloaded:
less /var/lib/dhcpd/dhcpd.leases
host proc01.bnsc.rl.ac.uk {
  deleted;
}
host proc04.bnsc.rl.ac.uk {
  deleted;
}
host proc01 {
  dynamic;
  hardware ethernet 00:25:90:5a:eb:8a;
  uid 00:25:90:5a:eb:8a;
  fixed-address 130.246.32.141;
  supersede server.ddns-hostname = "proc01";
  supersede host-name = "proc01";
  if option user-class-identifier = "xNBA" and option client-architecture = 00:00 {
    supersede server.always-broadcast = 01;
    supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/proc01";;
  } elsif option user-class-identifier = "xNBA" and option client-architecture = 00:09 {
    supersede server.filename = "http://${next-server}:80/tftpboot/xcat/xnba/nodes/proc01.uefi";;
  } elsif option client-architecture = 00:07 {
    supersede server.filename = "xcat/xnba.efi";
  } elsif option client-architecture = 00:00 {
    supersede server.filename = "xcat/xnba.kpxe";
  } else {
    supersede server.filename = "";
  }
}
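
To compare the lease file against the MAC address the client reports, the
active host entries can be summarized. A minimal awk sketch, assuming the
host-block layout shown above (lease_hosts is a hypothetical helper name):

```shell
# Print "<host> <mac> <fixed-address>" for each host block in a dhcpd.leases
# file that carries a hardware address. Blocks marked "deleted;" have no
# hardware line and are therefore skipped.
lease_hosts() {
    awk '
        $1 == "host"                         { name = $2; mac = ""; ip = "" }
        $1 == "hardware" && $2 == "ethernet" { mac = $3; sub(/;$/, "", mac) }
        $1 == "fixed-address"                { ip = $2;  sub(/;$/, "", ip) }
        # A line holding only a closing brace ends a block; report once.
        /^[ \t]*[}][ \t]*$/ && name != "" && mac != "" {
            print name, mac, ip
            name = ""
        }
    ' "$1"
}
```

Usage would be something like: lease_hosts /var/lib/dhcpd/dhcpd.leases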

I followed this list to download the files on a separate CentOS server.

d. tftp 130.246.32.140
[root@cds1 xcat]# tftp 130.246.32.140
tftp> get xcat/xnba.kpxe
tftp> get xcat/xnba.efi
tftp> get yaboot
tftp> get xcat/xnba/nets/130.246.32.0_22
tftp> get xcat/xnba/nets/130.246.32.0_22.uefi
tftp> quit
[root@cds1 xcat]# ls
130.246.32.0_22 130.246.32.0_22.uefi elilo.efi xnba.efi xnba.kpxe yaboot
[root@cds1 xcat]# ls -ls
total 536
4 -rw-r--r-- 1 root root 252 Aug 4 09:46 130.246.32.0_22
4 -rw-r--r-- 1 root root 116 Aug 4 09:46 130.246.32.0_22.uefi
0 -rw-r--r-- 1 root root 0 Aug 4 09:45 elilo.efi
140 -rw-r--r-- 1 root root 139169 Aug 4 09:45 xnba.efi
80 -rw-r--r-- 1 root root 74786 Aug 4 09:45 xnba.kpxe
308 -rw-r--r-- 1 root root 310187 Aug 4 09:46 yaboot

e. use wget to download the node start up file
wget http://130.246.32.140:80/tftpboot/xcat/xnba/nodes/proc01

[root@cds1 xcat]# wget http://130.246.32.140:80/tftpboot/xcat/xnba/nodes/proc01
--2020-08-04 11:57:18--  http://130.246.32.140/tftpboot/xcat/xnba/nodes/proc01
Connecting to 130.246.32.140:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 528
Saving to: `proc01'

100%[======================================>] 528 --.-K/s in 0s

2020-08-04 11:57:18 (85.2 MB/s) - `proc01' saved [528/528]

f. This file in turn contains the instructions to download the kernel and
ramdisk
[root@cds1 xcat]# less proc01
#!gpxe
#netboot centos7.6-x86_64-compute
imgfetch -n kernel http://${next-server}:80/tftpboot/xcat/osimage/centos7.6-x86_64-netboot-compute/kernel
imgload kernel
imgargs kernel imgurl=http://130.246.32.140:80//install/netboot/centos7.6/x86_64/compute/rootimg.cpio.gz XCAT=130.246.32.140:3001 NODE=proc01 FC=yes XCATHTTPPORT=80 netdev=eth0 selinux=0 biosdevname=0 net.ifnames=0 BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}:80/tftpboot/xcat/osimage/centos7.6-x86_64-netboot-compute/initrd-stateless.gz
imgexec kernel

Both the kernel and the ramdisk can also be downloaded using wget.
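
The manual check of each file can be scripted. A small sketch that lists
the http URLs referenced by the node boot script, assuming ${next-server}
resolves to the master at 130.246.32.140 (that value is an assumption, and
boot_urls is a made-up helper name):

```shell
# List the http:// URLs referenced by an xNBA/gPXE boot script, replacing
# the ${next-server} placeholder with the given server address.
# boot_urls is a hypothetical helper name, not an xCAT command.
boot_urls() {
    local script next_server
    script="$1"
    next_server="$2"
    grep -o 'http://[^ ]*' "$script" | sed "s/[$]{next-server}/$next_server/g"
}
```

Each URL printed by boot_urls proc01 130.246.32.140 can then be passed to
wget --spider to confirm the file is reachable without saving it.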


_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


 [attachment "dhcpd.conf" deleted by Casandra H Qiu/Poughkeepsie/IBM]