Josh, I don't doubt that you're on to something. But if this is the case, it means my systems are missing some files, namely:
/tftpboot/xcat/nbk.x86_64
/tftpboot/xcat/nbfs.x86_64.gz

Can you tell me what RPM installed those files on your system? They don't exist on mine, and even a 'yum provides' doesn't find them.

On 01/21/2014 11:51 AM, Josh Nielsen wrote:
> Hi Jonathan,
>
> It is my understanding, from extensive debugging and notes that I have
> taken about the xCAT netbooting process in the past, that xCAT uses a
> two-stage image deployment method. It will first come up with a more
> "generic" boot image (normally xnba or sometimes yaboot). When that image
> contacts the xCAT headnode (or the node handling DHCP requests), the
> headnode recognizes the current image on the client that is sending
> requests to DHCP for further boot instructions, and tells the client to
> load another image based on the subnet and image type it is currently
> using. For example my headnode's /etc/dhcpd.conf file has an entry that
> looks like this:
>
> shared-network eth0 {
>   subnet 10.20.0.0 netmask 255.255.0.0 {
>     max-lease-time 43200;
>     min-lease-time 43200;
>     default-lease-time 43200;
>     next-server 10.20.0.1;
>     option log-servers 10.20.0.1;
>     option ntp-servers 10.20.0.1;
>     option domain-name "xxxxxxxxx";
>     option domain-name-servers 10.20.0.1;
>     if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>       always-broadcast on;
>       filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16";
>     } else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
>       filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16.uefi";
>     } else if option client-architecture = 00:00 { #x86
>       filename "xcat/xnba.kpxe";
>     } else if option vendor-class-identifier = "Etherboot-5.4" { #x86
>       filename "xcat/xnba.kpxe";
>     } else if option client-architecture = 00:07 { #x86_64 uefi
>       filename "xcat/xnba.efi";
>     } else if option client-architecture = 00:09 { #x86_64 uefi alternative id
>       filename "xcat/xnba.efi";
>     } else if option client-architecture = 00:02 { #ia64
>       filename "elilo.efi";
>     } else if substring(filename,0,1) = null { #otherwise, provide yaboot if the client isn't specific
>       filename "/yaboot";
>     }
>     range dynamic-bootp 10.20.200.254 10.20.254.254;
>   } # 10.20.0.0/255.255.0.0 subnet_end
>
> So if it boots with the xNBA image it then directs it to
> http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16 which has the
> genesis boot instructions in it:
>
> #!gpxe
> imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64 quiet xcatd=10.20.0.1:3001 BOOTIF=01-${netX/machyp}
> imgfetch -n nbfs http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.gz
> imgload kernel
> imgexec kernel
>
> So first it boots with xnba (first stage of boot), it contacts the DHCP
> server which gives it a "next-server" option of itself (saying to the
> client: request the next image from me - the headnode - again), and then
> gives it a boot file with instructions for the next image, then it
> executes it and finally loads genesis. You will also notice that the
> very last option (if it matches nothing else) is yaboot, which is
> another generic image, which will in turn probably request the next
> image. Try watching your log for the tftp daemon messages to see what is
> being sent.
>
> I wonder if you are having problems at the first-stage DHCP redirect,
> though. Check your option statements in /etc/dhcpd.conf to see
> where it is directing xNBA images.
>
> Regards,
> Josh Nielsen
>
>
> On Tue, Jan 21, 2014 at 10:26 AM, Jonathan Mills <jonmi...@renci.org> wrote:
>
> Wang,
>
> Thank you for your response. I did some digging and here is what I
> found.
>
> cat /tftpboot/xcat/xnba/nets/10.100.0.0_24
> #!gpxe
> imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64 quiet xcatd=10.100.0.1:3001 BOOTIF=01-${netX/machyp}
> imgfetch -n nbfs http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.lzma
> imgload kernel
> imgexec kernel
>
> cat /tftpboot/pxelinux.cfg/0A6400
> DEFAULT xCAT
> LABEL xCAT
>   KERNEL xcat/nbk.x86_64
>   APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001
>
> So, clearly, those things don't match up. That strikes me as an xCAT
> issue, but never mind. I manually modified /tftpboot/pxelinux.cfg/0A6400
> to make it look like:
>
> DEFAULT xCAT
> LABEL xCAT
>   KERNEL xcat/genesis.kernel.x86_64
>   APPEND initrd=xcat/genesis.fs.x86_64.lzma quiet xcatd=10.100.0.1:3001 BOOTIF=eth0
>
> (It is safe, in this case, to designate BOOTIF as 'eth0' -- with Cisco
> UCS hardware, and using vNICs, the first interface will always show up
> in Linux as eth0 -- at least, that is my experience).
>
> After this change, I was indeed able to PXE boot the first node, and I
> was hopeful that node discovery would then take place. However, this
> still did not occur. On console, I dug into the running genesis image
> on the first node, and I found that it had no ethernet interfaces
> whatsoever, because the genesis kernel has no driver support for Cisco
> UCS hardware.
>
> For example, this is the ethtool output of a Cisco UCS vNIC:
>
> [root@ncsu-hn nets]# ethtool -i eth0
> driver: enic
> version: 2.1.1.39
> firmware-version: 2.0(4b)
> bus-info: 0000:06:00.0
> supports-statistics: yes
> supports-test: no
> supports-eeprom-access: no
> supports-register-dump: no
> supports-priv-flags: no
>
> You can see it requires the 'enic' kernel module, usually located at:
> /lib/modules/`uname -r`/kernel/drivers/net/enic/enic.ko
>
> This module isn't found within the genesis image, so the node PXE boots,
> and then can do no more. Node discovery fails.
>
> On 01/20/2014 09:19 PM, Xiao Peng Wang wrote:
> > xCAT uses genesis (an xCAT-customized PXE tool) to drive the
> > discovery process. The configuration for genesis is put in
> > /tftpboot/xcat/xnba/nets/ for a specific network. Could you check
> > that the xnba configuration file for your deployment network has been
> > put in /tftpboot/xcat/xnba/nets/?
> >
> > A prerequisite for booting genesis is that the node has a
> > dynamic IP address. Did you configure the dynamic IP range for your
> > deployment network? Could you take a look at your syslog to see whether
> > the node has sent out a DHCP request and what your DHCP server replied?
> >
> > Thanks
> > Best Regards
> > ----------------------------------------------------------------------
> > Wang Xiaopeng (王晓朋)
> > IBM China System Technology Laboratory
> > Tel: 86-10-82453455
> > Email: w...@cn.ibm.com
> > Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
> > Haidian District Beijing P.R.China 100193
> >
> > From: Jonathan Mills <jonmi...@renci.org>
> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>,
> > Date: 2014/01/19 06:24
> > Subject: [xcat-user] Frustrating time with sequential node discovery
> >
> > ------------------------------------------------------------------------
> >
> > I'm running xCAT 2.8.3 and CentOS 6.4 atop Cisco UCS-C hardware. I'm
> > attempting to do a sequential nodediscovery. I've pre-populated the
> > nodelist table with the nodenames, so I shouldn't need to do anything
> > more than
> >
> > nodediscoverystart noderange=node[1-15]
> >
> > However, none of the nodes ever gets discovered.
> >
> > Digging deeper, it seems that none of them ever successfully PXE boot at
> > all. They should be PXE booting off of the genesis netboot image and
> > speaking back to the xcatmaster, correct?
> >
> > When I run 'mknb x86_64', it populates /tftpboot/pxelinux.cfg with
> > entries to non-existent netboot images.
> > Watch:
> >
> > [root@ncsu-hn ~]# rpm -qf /opt/xcat/sbin/mknb
> > xCAT-client-2.8.3-snap201311122316.noarch
> > [root@ncsu-hn ~]# mknb x86_64
> > Creating genesis.fs.x86_64.lzma in /tftpboot/xcat
> > [root@ncsu-hn ~]# cd /tftpboot/pxelinux.cfg/
> > [root@ncsu-hn pxelinux.cfg]# ls
> > 0A6400 0A6500 0A6600 7F 98300D 98300DE6 98300DE7 C0A86B
> > [root@ncsu-hn pxelinux.cfg]# cat *
> > DEFAULT xCAT
> > LABEL xCAT
> >   KERNEL xcat/nbk.x86_64
> >   APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001
> > DEFAULT xCAT
> > LABEL xCAT
> >   KERNEL xcat/nbk.x86_64
> >   APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.101.0.1:3001
> > DEFAULT xCAT
> > LABEL xCAT
> >   KERNEL xcat/nbk.x86_64
> >   APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.102.0.1:3001
> > DEFAULT xCAT
> > LABEL xCAT
> >   KERNEL xcat/nbk.x86_64
> >   APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=127.0.0.1:3001
> > DEFAULT xCAT
> > LABEL xCAT
> >   KERNEL xcat/nbk.x86_64
> >   APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.3:3001
> > DEFAULT xCAT
> > LABEL xCAT
> >   KERNEL xcat/nbk.x86_64
> >   APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.230:3001
> > DEFAULT xCAT
> > LABEL xCAT
> >   KERNEL xcat/nbk.x86_64
> >   APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.231:3001
> > DEFAULT xCAT
> > LABEL xCAT
> >   KERNEL xcat/nbk.x86_64
> >   APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=192.168.107.10:3001
> > [root@ncsu-hn pxelinux.cfg]# cd ../xcat/
> > [root@ncsu-hn xcat]# ls -la
> > total 21528
> > drwxr-xr-x  4 root root     4096 Jan 17 13:06 .
> > drwxr-xr-x. 7 root root     4096 Jan 18 22:02 ..
> > -rwxr-xr-x  1 root root   242929 Jan 15  2012 elilo-x64.efi
> > -rw-r--r--  1 root root 17573621 Jan 18 22:03 genesis.fs.x86_64.lzma
> > -rwxr-xr-x  1 root root  3986608 Aug  9 06:29 genesis.kernel.x86_64
> > drwxr-xr-x  3 root root     4096 Jan 17 13:06 osimage
> > drwxr-xr-x  3 root root     4096 Dec 23 07:42 xnba
> > -rw-r--r--  1 root root   139200 Oct 28 16:16 xnba.efi
> > -rw-r--r--  1 root root    74792 Oct 28 16:16 xnba.kpxe
> >
> > As you can see....it ought to be netbooting the genesis kernel, but
> > instead all my pxelinux.cfg/* files are instructing clients to boot the
> > non-existent "nbk.x86_64" image.
> >
> > Your advice is appreciated.
> >
> > --
> > Jonathan Mills
> > Systems Administrator
> > Renaissance Computing Institute
> > UNC-Chapel Hill
> >
> > ------------------------------------------------------------------------------
> > CenturyLink Cloud: The Leader in Enterprise Cloud Services.
> > Learn Why More Businesses Are Choosing CenturyLink Cloud For
> > Critical Workloads, Development Environments & Everything In Between.
> > Get a Quote or Start a Free Trial Today.
> > http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
> > _______________________________________________
> > xCAT-user mailing list
> > xCAT-user@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/xcat-user
>
> --
> Jonathan Mills
> Systems Administrator
> Renaissance Computing Institute
> UNC-Chapel Hill

--
Jonathan Mills
Systems Administrator
Renaissance Computing Institute
UNC-Chapel Hill
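
For anyone hitting the same mismatch, here is a rough sketch (not part of xCAT) of how one might check whether the files named in a pxelinux.cfg entry actually exist under the TFTP root. The function names referenced_boot_files and missing_boot_files are hypothetical, and the sketch assumes that KERNEL and initrd= paths are relative to the TFTP root, as they appear to be in the listings in this thread.

```python
import os


def referenced_boot_files(config_text):
    """Collect the paths named by KERNEL and initrd= in a PXELINUX config.

    Hypothetical helper for illustration; it only understands the two
    directives that appear in the configs quoted in this thread.
    """
    paths = []
    for line in config_text.splitlines():
        words = line.split()
        if not words:
            continue
        keyword = words[0].upper()  # PXELINUX keywords are case-insensitive
        if keyword == "KERNEL" and len(words) > 1:
            paths.append(words[1])
        elif keyword == "APPEND":
            for word in words[1:]:
                if word.startswith("initrd="):
                    # initrd= may carry several comma-separated files
                    names = word[len("initrd="):].split(",")
                    paths.extend(name for name in names if name)
    return paths


def missing_boot_files(config_text, tftp_root):
    """Return the referenced paths that are not regular files under tftp_root.

    Assumes the paths are relative to tftp_root.
    """
    return [
        path
        for path in referenced_boot_files(config_text)
        if not os.path.isfile(os.path.join(tftp_root, path))
    ]


if __name__ == "__main__":
    # Config text copied from the 0A6400 entry quoted above.
    sample = (
        "DEFAULT xCAT\n"
        "LABEL xCAT\n"
        "  KERNEL xcat/nbk.x86_64\n"
        "  APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001\n"
    )
    for path in missing_boot_files(sample, "/tftpboot"):
        print("missing:", path)
```

Run on the head node, this would list any kernel or initrd that a config entry points to but that is absent from /tftpboot; to check every entry, read each file in /tftpboot/pxelinux.cfg and pass its contents to missing_boot_files.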