Josh,

I don't doubt that you're on to something.  But if this is the case, it 
means my systems are missing some files, namely:

/tftpboot/xcat/nbk.x86_64
/tftpboot/xcat/nbfs.x86_64.gz

Can you tell me what RPM installed those files on your system?  They 
don't exist on mine, and even a 'yum provides' doesn't find them.


On 01/21/2014 11:51 AM, Josh Nielsen wrote:
> Hi Jonathan,
>
> It is my understanding, from extensive debugging and from notes I have
> taken about the xCAT netbooting process in the past, that xCAT uses a
> two-stage image deployment method. The client first comes up with a more
> "generic" boot image (normally xnba, sometimes yaboot). When that image
> contacts the xCAT headnode (or whichever node handles DHCP requests) for
> further boot instructions, the headnode recognizes which image the
> client is currently running and tells it to load another image based on
> the subnet and image type. For example, my headnode's /etc/dhcpd.conf
> file has an entry that looks like this:
>
> shared-network eth0 {
>    subnet 10.20.0.0 netmask 255.255.0.0 {
>      max-lease-time 43200;
>      min-lease-time 43200;
>      default-lease-time 43200;
>      next-server  10.20.0.1;
>      option log-servers 10.20.0.1;
>      option ntp-servers 10.20.0.1;
>      option domain-name "xxxxxxxxx";
>      option domain-name-servers  10.20.0.1;
>      if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
>         always-broadcast on;
>         filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16";;
>      } else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
>         filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16.uefi";;
>      } else if option client-architecture = 00:00  { #x86
>        filename "xcat/xnba.kpxe";
>      } else if option vendor-class-identifier = "Etherboot-5.4"  { #x86
>        filename "xcat/xnba.kpxe";
>      } else if option client-architecture = 00:07 { #x86_64 uefi
>         filename "xcat/xnba.efi";
>      } else if option client-architecture = 00:09 { #x86_64 uefi alternative id
>         filename "xcat/xnba.efi";
>      } else if option client-architecture = 00:02 { #ia64
>         filename "elilo.efi";
>      } else if substring(filename,0,1) = null { #otherwise, provide yaboot if the client isn't specific
>         filename "/yaboot";
>      }
>      range dynamic-bootp 10.20.200.254 10.20.254.254;
>    } # 10.20.0.0/255.255.0.0 subnet_end
>
> So if the client boots with the xNBA image, DHCP directs it to
> http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16, which contains
> the genesis boot instructions:
>
> #!gpxe
> imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64 quiet xcatd=10.20.0.1:3001 BOOTIF=01-${netX/machyp}
> imgfetch -n nbfs http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.gz
> imgload kernel
> imgexec kernel
>
> So first the client boots xnba (the first stage). xnba contacts the DHCP
> server, which gives it a "next-server" option pointing at itself
> (telling the client: request the next image from me, the headnode,
> again) along with a boot file containing instructions for the next
> image. The client executes those instructions and finally loads genesis.
> You will also notice that the very last option (used if nothing else
> matches) is yaboot, another generic image, which will in turn probably
> request the next image. Try watching your log for the TFTP daemon's
> messages to see what is being sent.
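To make the two stages concrete, the conditional chain in the dhcpd.conf excerpt above can be modelled as a plain lookup. This is a minimal Python sketch, not how dhcpd itself evaluates its configuration: the function name `select_boot_file` is hypothetical, architecture codes are the hex strings used in the config, only the branches in the excerpt are covered, and the final yaboot branch is simplified to a catch-all.

```python
from typing import Optional

# The URL the xNBA branches return in the dhcpd.conf excerpt above.
XNBA_NETS_URL = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16"


def select_boot_file(user_class: Optional[str], arch: Optional[str],
                     vendor_class: Optional[str] = None) -> str:
    """Hypothetical model of the excerpt's if/else chain, checked in the same order."""
    if user_class == "xNBA" and arch == "00:00":
        return XNBA_NETS_URL               # stage 2 (BIOS): genesis boot script
    if user_class == "xNBA" and arch == "00:09":
        return XNBA_NETS_URL + ".uefi"     # stage 2 (UEFI)
    if arch == "00:00":
        return "xcat/xnba.kpxe"            # stage 1 (BIOS): chain-load xNBA over TFTP
    if vendor_class == "Etherboot-5.4":
        return "xcat/xnba.kpxe"
    if arch in ("00:07", "00:09"):
        return "xcat/xnba.efi"             # stage 1 (x86_64 UEFI)
    if arch == "00:02":
        return "elilo.efi"                 # ia64
    # Simplification: the real branch only applies when no filename was set yet.
    return "/yaboot"


# Stage 1: the NIC's PXE ROM asks without the xNBA user class ...
print(select_boot_file(user_class=None, arch="00:00"))
# -> xcat/xnba.kpxe
# Stage 2: xNBA, now running, asks again and identifies itself.
print(select_boot_file(user_class="xNBA", arch="00:00"))
# -> http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16
```

Because the first match wins, an xNBA client is caught by the first two branches before the plain architecture checks can send it back to xnba.kpxe.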
>
> I wonder whether your problem is at the first-stage DHCP redirection,
> though. Check the option statements in your /etc/dhcpd.conf to see
> where it is directing xNBA clients.
>
> Regards,
> Josh Nielsen
>
>
> On Tue, Jan 21, 2014 at 10:26 AM, Jonathan Mills <jonmi...@renci.org> wrote:
>
>     Wang,
>
>     Thank you for your response.  I did some digging and here is what I
>     found.
>
>     cat /tftpboot/xcat/xnba/nets/10.100.0.0_24
>     #!gpxe
>     imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64 quiet xcatd=10.100.0.1:3001 BOOTIF=01-${netX/machyp}
>     imgfetch -n nbfs http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.lzma
>     imgload kernel
>     imgexec kernel
>
>
>
>     cat /tftpboot/pxelinux.cfg/0A6400
>     DEFAULT xCAT
>         LABEL xCAT
>         KERNEL xcat/nbk.x86_64
>         APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001
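The name 0A6400 is the prefix 10.100.0 written in hex. PXELINUX looks for a config file named after the client's IPv4 address as eight uppercase hex digits and, failing that, drops one trailing digit at a time before falling back to `default`, so 0A6400 applies to every client in 10.100.0.0/24. A minimal Python sketch of that part of the search order follows (the earlier UUID and per-MAC lookups are left out). The helper names and the client address 10.100.0.25 are hypothetical; the directory listing is the one from the `ls` of /tftpboot/pxelinux.cfg later in this thread.

```python
import ipaddress
from typing import List


def pxelinux_candidates(ip: str) -> List[str]:
    """IP-derived pxelinux.cfg names, most specific first, then 'default'.

    Only the hex-IP part of the PXELINUX search order is modelled; the UUID
    and 01-<mac> lookups that PXELINUX tries first are omitted.
    """
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    return [hex_ip[:n] for n in range(len(hex_ip), 0, -1)] + ["default"]


def first_match(ip: str, existing: List[str]) -> str:
    """Return the first candidate present in the directory listing, or ''."""
    for name in pxelinux_candidates(ip):
        if name in existing:
            return name
    return ""


# Files listed in /tftpboot/pxelinux.cfg on the headnode in this thread.
listing = ["0A6400", "0A6500", "0A6600", "7F", "98300D",
           "98300DE6", "98300DE7", "C0A86B"]
print(pxelinux_candidates("10.100.0.25")[:3])  # ['0A640019', '0A64001', '0A6400']
print(first_match("10.100.0.25", listing))      # 0A6400
```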
>
>
>
>     So, clearly, those things don't match up. That strikes me as an xCAT
>     issue, but never mind. I manually modified /tftpboot/pxelinux.cfg/0A6400
>     to look like this:
>
>     DEFAULT xCAT
>         LABEL xCAT
>         KERNEL xcat/genesis.kernel.x86_64
>         APPEND initrd=xcat/genesis.fs.x86_64.lzma quiet xcatd=10.100.0.1:3001 BOOTIF=eth0
>
>
>     (It is safe, in this case, to designate BOOTIF as 'eth0' -- with Cisco
>     UCS hardware, and using vNICs, the first interface will always show up
>     in Linux as eth0 -- at least, that is my experience).
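For comparison, the xnba script above passes `BOOTIF=01-${netX/machyp}`: `01-` (the ARP hardware type for Ethernet) followed by the MAC address with hyphens, which is also the form PXELINUX emits with `IPAPPEND 2`. If a MAC-based value were wanted in a hand-edited pxelinux entry instead of `eth0`, it could be built as in this minimal Python sketch. The helper name and the MAC address are hypothetical, and whether genesis handles both forms the same way is not something this thread establishes.

```python
import re


def bootif_from_mac(mac: str) -> str:
    """Return '01-' plus the MAC with hyphens, lowercased (hypothetical helper).

    Accepts colon- or hyphen-separated input with six two-digit hex octets.
    """
    octets = re.split(r"[:-]", mac.strip())
    if len(octets) != 6 or not all(re.fullmatch(r"[0-9A-Fa-f]{2}", o) for o in octets):
        raise ValueError("not a 6-octet MAC address: %r" % mac)
    return "01-" + "-".join(o.lower() for o in octets)


# Hypothetical MAC address, for illustration only.
print("BOOTIF=" + bootif_from_mac("00:25:B5:00:00:1F"))  # BOOTIF=01-00-25-b5-00-00-1f
```

The same `01-...` string is also the per-MAC file name PXELINUX looks for in pxelinux.cfg before it tries the hex-IP names.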
>
>     After this change, I was indeed able to PXE boot the first node, and I
>     was hopeful that node discovery would then take place. However, it
>     still did not. On the console, I dug into the running genesis image
>     on the first node and found that it had no Ethernet interfaces
>     whatsoever, because the genesis kernel has no driver support for the
>     Cisco UCS hardware.
>
>     For example, this is the ethtool output of a Cisco UCS vNIC:
>
>     [root@ncsu-hn nets]# ethtool -i eth0
>     driver: enic
>     version: 2.1.1.39
>     firmware-version: 2.0(4b)
>     bus-info: 0000:06:00.0
>     supports-statistics: yes
>     supports-test: no
>     supports-eeprom-access: no
>     supports-register-dump: no
>     supports-priv-flags: no
>
>
>     You can see it requires the 'enic' kernel module, usually located at:
>     /lib/modules/`uname -r`/kernel/drivers/net/enic/enic.ko
>
>     This module isn't found within the genesis image, so the node PXE boots,
>     and then can do no more.  Node discovery fails.
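One way to confirm the missing-driver diagnosis is to take the driver name from `ethtool -i` and look for the matching `.ko` file in the image. This is a minimal Python sketch. It assumes the genesis root filesystem has already been unpacked into a directory (unpacking genesis.fs.x86_64.lzma is not covered here) and that modules are stored uncompressed as `<driver>.ko`. The helper names and the /tmp/genesis-root path are hypothetical.

```python
import os
from typing import Dict, List


def parse_ethtool_info(text: str) -> Dict[str, str]:
    """Turn 'ethtool -i' output ("key: value" lines) into a dict."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def find_module(root: str, driver: str) -> List[str]:
    """Paths under root whose file name is exactly '<driver>.ko'."""
    wanted = driver + ".ko"
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if wanted in filenames:
            hits.append(os.path.join(dirpath, wanted))
    return sorted(hits)


sample = "driver: enic\nversion: 2.1.1.39\nbus-info: 0000:06:00.0\n"
driver = parse_ethtool_info(sample)["driver"]  # 'enic'
# "/tmp/genesis-root" is a hypothetical directory holding the unpacked image.
if not find_module("/tmp/genesis-root", driver):
    print(driver + ".ko not found in the image")
```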
>
>     On 01/20/2014 09:19 PM, Xiao Peng Wang wrote:
>      > xCAT uses genesis (an xCAT-customized PXE boot environment) to carry
>      > out the discovery process. The genesis configuration for a specific
>      > network is put in /tftpboot/xcat/xnba/nets/. Could you check whether
>      > the xnba configuration file for your deployment network has been put
>      > in /tftpboot/xcat/xnba/nets/?
>      >
>      > A prerequisite for booting genesis is that the node gets a dynamic
>      > IP address. Did you configure the dynamic IP range for your
>      > deployment network? Could you take a look at your syslog to see
>      > whether the node has sent out DHCP requests and what your DHCP
>      > server replied to them?
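The syslog check suggested above can be scripted. This minimal Python sketch pairs ISC dhcpd's `DHCPDISCOVER from <mac>` and `DHCPOFFER on <ip> to <mac>` messages and reports clients that asked but never received an offer; a missing dynamic range typically shows up as dhcpd logging "no free leases" and sending no offer. The function name and the sample log lines are hypothetical, and the MAC pattern accepts one or two hex digits per octet in case your dhcpd version drops leading zeros. On CentOS 6 the messages would normally be in /var/log/messages, but check where your syslog sends dhcpd output.

```python
import re
from typing import Dict, Iterable, List

# ISC dhcpd messages of interest (syslog prefixes before them are ignored):
#   DHCPDISCOVER from <mac> via <iface>
#   DHCPOFFER on <ip> to <mac> [(<hostname>)] via <iface>
MAC = r"((?:[0-9A-Fa-f]{1,2}:){5}[0-9A-Fa-f]{1,2})"
DISCOVER = re.compile(r"DHCPDISCOVER from " + MAC)
OFFER = re.compile(r"DHCPOFFER on \S+ to " + MAC)


def unanswered_discovers(lines: Iterable[str]) -> List[str]:
    """MACs seen in DHCPDISCOVER messages that never appear in a DHCPOFFER."""
    offered: Dict[str, bool] = {}
    for line in lines:
        m = DISCOVER.search(line)
        if m:
            offered.setdefault(m.group(1).lower(), False)
            continue
        m = OFFER.search(line)
        if m:
            offered[m.group(1).lower()] = True
    return sorted(mac for mac, got_offer in offered.items() if not got_offer)


# Hypothetical log lines, for illustration only.
log = [
    "Jan 21 10:00:01 hn dhcpd: DHCPDISCOVER from 00:25:b5:00:00:01 via eth0",
    "Jan 21 10:00:01 hn dhcpd: DHCPOFFER on 10.100.0.200 to 00:25:b5:00:00:01 via eth0",
    "Jan 21 10:00:05 hn dhcpd: DHCPDISCOVER from 00:25:b5:00:00:02 via eth0: "
    "network 10.100.0.0/24: no free leases",
]
print(unanswered_discovers(log))  # ['00:25:b5:00:00:02']
```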
>      >
>      > Thanks
>      > Best Regards
>      >
>     ----------------------------------------------------------------------
>      > Wang Xiaopeng (王晓朋)
>      > IBM China System Technology Laboratory
>      > Tel: 86-10-82453455
>      > Email: w...@cn.ibm.com
>      > Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
>      > Haidian District Beijing P.R.China 100193
>      >
>      > From: Jonathan Mills <jonmi...@renci.org>
>      > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>      > Date: 2014/01/19 06:24
>      > Subject: [xcat-user] Frustrating time with sequential node discovery
>      >
>      >
>     ------------------------------------------------------------------------
>      >
>      >
>      >
>      > I'm running xCAT 2.8.3 and CentOS 6.4 atop Cisco UCS-C hardware.
>      > I'm attempting a sequential node discovery. I've pre-populated the
>      > nodelist table with the node names, so I shouldn't need to do
>      > anything more than
>      >
>      > nodediscoverstart noderange=node[1-15]
>      >
>      > However, none of the nodes ever gets discovered.
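Sequential discovery roughly works by handing out the names in the noderange, in order, to undiscovered nodes as they report in. This minimal Python sketch illustrates that idea under stated assumptions: `expand_simple_noderange` and `assign_in_order` are hypothetical helpers, only the plain `prefix[a-b]` form without zero padding is handled (xCAT's real noderange syntax is much richer), and the MAC addresses are made up.

```python
import re
from typing import Dict, List


def expand_simple_noderange(noderange: str) -> List[str]:
    """Expand 'prefix[a-b]' (no zero padding) into a list of names.

    Hypothetical helper: xCAT's real noderange syntax (commas, groups,
    regular expressions, padding) is far richer than this.
    """
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\]", noderange)
    if not m:
        return [noderange]
    prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return ["%s%d" % (prefix, i) for i in range(lo, hi + 1)]


def assign_in_order(names: List[str], macs_in_arrival_order: List[str]) -> Dict[str, str]:
    """Give each distinct MAC, in the order first seen, the next unused name."""
    distinct = list(dict.fromkeys(mac.lower() for mac in macs_in_arrival_order))
    return dict(zip(distinct, names))


names = expand_simple_noderange("node[1-15]")
print(len(names), names[0], names[-1])  # 15 node1 node15
# Hypothetical MACs, in the order the nodes happened to report in.
print(assign_in_order(names, ["00:25:b5:00:00:0a", "00:25:b5:00:00:03"]))
# {'00:25:b5:00:00:0a': 'node1', '00:25:b5:00:00:03': 'node2'}
```

The point of the model is that nothing gets named until a node actually boots genesis and reports in, which is why a PXE failure leaves every name in the range unassigned.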
>      >
>      > Digging deeper, it seems that none of them ever successfully PXE boots
>      > at all. They should be PXE booting off of the genesis netboot image and
>      > speaking back to the xcatmaster, correct?
>      >
>      > When I run 'mknb x86_64', it populates /tftpboot/pxelinux.cfg with
>      > entries pointing to non-existent netboot images. Watch:
>      >
>      > [root@ncsu-hn ~]# rpm -qf /opt/xcat/sbin/mknb
>      > xCAT-client-2.8.3-snap201311122316.noarch
>      > [root@ncsu-hn ~]# mknb x86_64
>      > Creating genesis.fs.x86_64.lzma in /tftpboot/xcat
>      > [root@ncsu-hn ~]# cd /tftpboot/pxelinux.cfg/
>      > [root@ncsu-hn pxelinux.cfg]# ls
>      > 0A6400  0A6500  0A6600  7F  98300D  98300DE6  98300DE7  C0A86B
>      > [root@ncsu-hn pxelinux.cfg]# cat *
>      > DEFAULT xCAT
>      >    LABEL xCAT
>      >    KERNEL xcat/nbk.x86_64
>      >    APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.100.0.1:3001
>      > DEFAULT xCAT
>      >    LABEL xCAT
>      >    KERNEL xcat/nbk.x86_64
>      >    APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.101.0.1:3001
>      > DEFAULT xCAT
>      >    LABEL xCAT
>      >    KERNEL xcat/nbk.x86_64
>      >    APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=10.102.0.1:3001
>      > DEFAULT xCAT
>      >    LABEL xCAT
>      >    KERNEL xcat/nbk.x86_64
>      >    APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=127.0.0.1:3001
>      > DEFAULT xCAT
>      >    LABEL xCAT
>      >    KERNEL xcat/nbk.x86_64
>      >    APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.3:3001
>      > DEFAULT xCAT
>      >    LABEL xCAT
>      >    KERNEL xcat/nbk.x86_64
>      >    APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.230:3001
>      > DEFAULT xCAT
>      >    LABEL xCAT
>      >    KERNEL xcat/nbk.x86_64
>      >    APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=152.48.13.231:3001
>      > DEFAULT xCAT
>      >    LABEL xCAT
>      >    KERNEL xcat/nbk.x86_64
>      >    APPEND initrd=xcat/nbfs.x86_64.gz quiet xcatd=192.168.107.10:3001
>      > [root@ncsu-hn pxelinux.cfg]# cd ../xcat/
>      > [root@ncsu-hn xcat]# ls -la
>      > total 21528
>      > drwxr-xr-x  4 root root     4096 Jan 17 13:06 .
>      > drwxr-xr-x. 7 root root     4096 Jan 18 22:02 ..
>      > -rwxr-xr-x  1 root root   242929 Jan 15  2012 elilo-x64.efi
>      > -rw-r--r--  1 root root 17573621 Jan 18 22:03 genesis.fs.x86_64.lzma
>      > -rwxr-xr-x  1 root root  3986608 Aug  9 06:29 genesis.kernel.x86_64
>      > drwxr-xr-x  3 root root     4096 Jan 17 13:06 osimage
>      > drwxr-xr-x  3 root root     4096 Dec 23 07:42 xnba
>      > -rw-r--r--  1 root root   139200 Oct 28 16:16 xnba.efi
>      > -rw-r--r--  1 root root    74792 Oct 28 16:16 xnba.kpxe
>      >
>      >
>      >
>      > As you can see, it ought to be netbooting the genesis kernel, but
>      > instead all my pxelinux.cfg/* files are instructing clients to boot
>      > the non-existent "nbk.x86_64" image.
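A quick way to list every dangling reference is to scan each file in pxelinux.cfg for `KERNEL` lines and `initrd=` arguments and check whether the target exists. This is a minimal Python sketch, assuming the paths are relative to the TFTP root (taken here to be /tftpboot, where pxelinux.0 is assumed to be served from) and that each `initrd=` names a single file. The function names are hypothetical.

```python
import os
import re
from typing import List, Tuple

# "KERNEL <path>" lines and "initrd=<path>" arguments; KERNEL is case-insensitive.
KERNEL_RE = re.compile(r"^\s*KERNEL\s+(\S+)", re.IGNORECASE | re.MULTILINE)
INITRD_RE = re.compile(r"\binitrd=([^\s,]+)")  # first file of a comma list only


def referenced_files(config_text: str) -> List[str]:
    """Kernel and initrd paths referenced by one pxelinux config file."""
    return KERNEL_RE.findall(config_text) + INITRD_RE.findall(config_text)


def missing_boot_files(tftp_root: str) -> List[Tuple[str, str]]:
    """(config file, referenced path) pairs whose target does not exist.

    Assumes referenced paths are relative to tftp_root.
    """
    cfg_dir = os.path.join(tftp_root, "pxelinux.cfg")
    missing = []
    for name in sorted(os.listdir(cfg_dir)):
        path = os.path.join(cfg_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as fh:
            for ref in referenced_files(fh.read()):
                if not os.path.exists(os.path.join(tftp_root, ref.lstrip("/"))):
                    missing.append((name, ref))
    return missing


# On the headnode this would be called as missing_boot_files("/tftpboot").
```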
>      >
>      > Your advice is appreciated.
>      >
>      > --
>      > Jonathan Mills
>      > Systems Administrator
>      > Renaissance Computing Institute
>      > UNC-Chapel Hill
>      >
>      >
>     
> ------------------------------------------------------------------------------
>      > CenturyLink Cloud: The Leader in Enterprise Cloud Services.
>      > Learn Why More Businesses Are Choosing CenturyLink Cloud For
>      > Critical Workloads, Development Environments & Everything In Between.
>      > Get a Quote or Start a Free Trial Today.
>      >
>     
> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
>      > _______________________________________________
>      > xCAT-user mailing list
>      > xCAT-user@lists.sourceforge.net
>     <mailto:xCAT-user@lists.sourceforge.net>
>      > https://lists.sourceforge.net/lists/listinfo/xcat-user
>      >
>      >
>
>     --
>     Jonathan Mills
>     Systems Administrator
>     Renaissance Computing Institute
>     UNC-Chapel Hill
>
>
>

-- 
Jonathan Mills
Systems Administrator
Renaissance Computing Institute
UNC-Chapel Hill
