Russell,
      Thanks a lot for all the detailed information. From the xcat.log and the 
fact that updatenode produced the correct result, it looks like the 
correct packages for IB were installed, but the modules were not loaded 
when the confignics postscript was running. 
I checked the documentation at 
https://sourceforge.net/p/xcat/wiki/Managing_the_Mellanox_Infiniband_Network1/ 
and it looks like you need to add another postscript in order to have the 
Mellanox driver installed. Please check the section "Mellanox IB Drives 
Installation".
You need to add mlnxofed_ib_install to the postscripts list, before confignics.

    Another option to try is to move confignics from postscripts to 
postbootscripts.
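
A minimal sketch of the first suggestion, assuming the postscripts list shown in the lsdef output for node1 further down the thread. The insert_before helper is hypothetical (it is not part of xCAT), and the chdef command is only printed, not executed, so it can be reviewed first. Whether mlnxofed_ib_install needs extra arguments depends on the xCAT release, so treat the bare name as an assumption.

```shell
#!/bin/bash
# Hypothetical helper: insert a name into a comma-separated list
# immediately before an existing entry. If that entry is not in the
# list, the new name is appended at the end instead.
insert_before() {
    local list="$1" new="$2" anchor="$3"
    local out="" item found=0
    local IFS=','
    for item in $list; do
        if [ "$item" = "$anchor" ] && [ "$found" -eq 0 ]; then
            out="${out:+$out,}$new"
            found=1
        fi
        out="${out:+$out,}$item"
    done
    if [ "$found" -eq 0 ]; then
        out="${out:+$out,}$new"
    fi
    printf '%s\n' "$out"
}

# postscripts value taken from the lsdef output for node1 in this thread
current="remoteshell,syncfiles,confignics,hardeths"
updated=$(insert_before "$current" "mlnxofed_ib_install" "confignics")

# Print the command for review rather than running it.
echo "chdef node1 postscripts=$updated"
```

For the second suggestion, the same kind of chdef call would drop confignics from postscripts and add it to postbootscripts; where it should sit relative to otherpkgs is a choice this thread does not settle.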

Hope it helps,

Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692 
External: [email protected], 845-433-5692

"I never worry about the future. It comes soon enough." --- Albert 
Einstein 



From:   "Russell Auld" <[email protected]>
To:     "'xCAT Users Mailing list'" <[email protected]>
Date:   11/18/2014 04:57 PM
Subject:        Re: [xcat-user] using gateway=<xcatmaster> in networks 
table



I noticed that code in the script as well.
I reran the script via “updatenode node1 -P confignics” and it works as 
expected when you do that: the GATEWAY line is omitted from the ifcfg-ib0 
file.
However, a fresh (stateful) installation of the OS results in the file 
getting the incorrect gateway value.
I’ve posted the xcat.log below. You can see that there’s some kind of 
trouble when the configib script runs.
It looks like some of the IB modules aren’t initially loaded, so when the 
script brings down the driver stack and starts it, there are errors 
related to kernel modules. This looks like it’s injecting characters into 
the script and that’s causing the script to fail.
 
Thu Nov 13 23:25:45 EST 2014 Running postscript: remoteshell
 
Stopping sshd: [FAILED]
Generating SSH1 RSA host key: [  OK  ]
Starting sshd: [  OK  ]
Thu Nov 13 23:25:47 EST 2014 Running postscript: syncfiles
Thu Nov 13 23:25:49 EST 2014 Running postscript: confignics
confignics on node1: config install nic:0, remove: 0, iba ports: 
NICIPS=ib0!198.18.9.55
confignics on node1: executed script: configib for nics: ib0, ports: 
nic_ibnics=ib0 nic_ibaports=
Low level hardware support loaded:
               mlx4_ib 
 
Upper layer protocol modules:
               ib_ipoib 
 
User space access modules:
               none found
 
Connection management modules:
               ib_cm 
 
Configured IPoIB interfaces: ib0 ib1
 
Currently active IPoIB interfaces: ib0 ib1 
Unloading OpenIB kernel modules: [  OK  ]
Loading OpenIB kernel modules:FATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_coreFATAL: Could not load 
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
 
Failed to load module ib_core [FAILED]
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
/xcatpost/configib: line 486: syntax error near unexpected token `done'
/xcatpost/configib: line 486: `   done # end for nicip'
Thu Nov 13 23:25:52 EST 2014 Running postscript: hardeths
…
 
My packages list contains “@infiniband” – I also initially tried a shorter 
list of known required IB packages. In each case, the result was the same. 
If you notice, there are no user-space modules loaded when the postscript 
runs.
When the node boots later, the modules are loaded as shown below:
 
[root@node1 ~]# service rdma status
Low level hardware support loaded:
        mlx4_ib
 
Upper layer protocol modules:
        ib_ipoib
 
User space access modules:
        rdma_ucm ib_ucm ib_uverbs ib_umad
 
Connection management modules:
        rdma_cm ib_cm iw_cm
 
Configured IPoIB interfaces: none
Currently active IPoIB interfaces: none
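
The repeated errors in the xcat.log above all name /lib/modules/2.6.32-358.el6.x86_64/modules.dep. A minimal sketch for checking whether that file exists for the running kernel at the moment a postscript executes is below. The has_modules_dep helper is hypothetical, and this only narrows down the symptom; it does not confirm why the file was missing during the install.

```shell
#!/bin/bash
# Hypothetical helper: succeed if modules.dep exists for the given kernel
# release. The optional second argument is an alternate root directory
# (empty means the live filesystem).
has_modules_dep() {
    local kver="$1" root="${2:-}"
    [ -f "$root/lib/modules/$kver/modules.dep" ]
}

kver=$(uname -r)
if has_modules_dep "$kver"; then
    echo "modules.dep present for running kernel $kver"
else
    echo "modules.dep missing for running kernel $kver; module trees found:"
    # List whatever kernel module trees are installed, if any.
    ls /lib/modules 2>/dev/null || true
fi
```

Running this from a test postscript placed just before confignics would show whether the running kernel and the installed module tree match at that point in the install.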
 
 
From: Ling Gao [mailto:[email protected]] 
Sent: Thursday, November 13, 2014 12:13 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] using gateway=<xcatmaster> in networks table
 
Hmm, the /install/postscripts/configib file has the following lines, which 
should prevent "gateway" from being written to the configuration file if 
it is <xcatmaster>.   
       if [ "$gateway" == "<xcatmaster>" ]; then 
           gateway='' 
       fi 
I checked the configuration for node1 and it seems all correct. 
Can you download the latest configib script from the following link and 
copy it to the /install/postscripts directory on your management node and 
service nodes (if any), and then run "updatenode node1 confignics"?  See 
if it works for you. 
http://sourceforge.net/p/xcat/xcat-core/ci/master/tree/xCAT/postscripts/configib
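
The guard quoted above can be exercised on its own to see what it does with different gateway values. This is a minimal sketch, not the actual configib code: the emit_gateway_line function is hypothetical, and 198.18.8.1 is a made-up address used only for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for the part of a NIC config script that writes
# the GATEWAY line of an ifcfg file.
emit_gateway_line() {
    local gateway="$1"
    # Same comparison as the configib excerpt: the literal placeholder
    # is treated as "no gateway".
    if [ "$gateway" = "<xcatmaster>" ]; then
        gateway=''
    fi
    # Only write the line when a usable value remains.
    if [ -n "$gateway" ]; then
        printf 'GATEWAY=%s\n' "$gateway"
    fi
}

emit_gateway_line '<xcatmaster>'   # prints nothing
emit_gateway_line '198.18.8.1'     # prints GATEWAY=198.18.8.1
```

The comparison is an exact string match, so any value that differs from the literal <xcatmaster> (for example an escaped or otherwise altered form) is written out unchanged.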
 


Thanks, 

Ling

Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692 
External: [email protected], 845-433-5692

"I never worry about the future. It comes soon enough." --- Albert 
Einstein 



From:        "Russell Auld" <[email protected]> 
To:        "'xCAT Users Mailing list'" <[email protected]> 
Date:        11/12/2014 10:24 PM 
Subject:        Re: [xcat-user] using gateway=<xcatmaster> in networks 
table 




[root@head ~]# lsdef node1
Object name: node1
   arch=x86_64
   bmc=node1-c
   bmcport=0
   currchain=boot
   currstate=boot
   groups=compute,all,ipmi,oa10
   initrd=xcat/osimage/rhel-x86_64-server-6-2014_q3/initrd.img
   installnic=eth0
   ip=10.30.116.254
   kcmdline=quiet repo=http://10.30.116.91:80/install/rhels6.4/x86_64 
ks=http://10.30.116.91:80/install/autoinst/node1 ksdevice=eth0
   kernel=xcat/osimage/rhel-x86_64-server-6-2014_q3/vmlinuz
   mac=D4:85:64:CC:3B:00
   mgt=ipmi
   netboot=pxe
   nfsserver=10.30.116.91
   nichostnamesuffixes.ib0=-ib
   nicips.ib0=198.18.8.254
   nicnetworks.ib0=infiniband
   nictypes.ib0=Infiniband
   os=rhels6.4
   otherinterfaces=node1-c:198.18.12.254
   postbootscripts=otherpkgs,panasas
   postscripts=remoteshell,syncfiles,confignics,hardeths
   primarynic=mac
   profile=compute
   provmethod=rhel-x86_64-server-6-2014_q3
   status=booted
   statustime=11-12-2014 08:48:26
   updatestatus=synced
   updatestatustime=11-12-2014 11:09:21

From: Ling Gao [mailto:[email protected]] 
Sent: Wednesday, November 12, 2014 2:45 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] using gateway=<xcatmaster> in networks table

Russell, 
   Can you show the node definition for the node you have deployed with 
this problem? 
lsdef <node> 

Thanks, 

Ling 


Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692 
External: [email protected], 845-433-5692

"I never worry about the future. It comes soon enough." --- Albert 
Einstein 



From:        "Russell Auld" <[email protected]> 
To:        "'xCAT Users Mailing list'" <[email protected]> 
Date:        11/11/2014 09:21 PM 
Subject:        [xcat-user] using gateway=<xcatmaster> in networks table 
________________________________________



I have <xcatmaster> set as the value for gateway 
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,comments,disable
 

"infiniband","198.18.8.0","255.255.252.0","ib0","<xcatmaster>",,"198.18.8.91",,,,,,,,,,,,
 

When the node is provisioned, the ‘ifcfg-ib0’ file contains: 
DEVICE=ib0 
NM_CONTROLLED=no 
BOOTPROTO=none 
ONBOOT=yes 
IPADDR=198.18.8.254 
PREFIX=22 
GATEWAY=&lt;xcatmaster&gt; 
IPADDR=198.18.8.254 
NETMASK=255.255.252.0 
Is this a bug? The documentation says that <xcatmaster> is a valid value 
for ‘gateway’ 
[root@head autoinst]# lsxcatd -a 
Version 2.8.4 (git commit ded873f998a889c91a169d3f870efdbebfb66243, built 
Thu May 29 23:32:02 EDT 2014) 
This is a Management Node 
dbengine=SQLite
_______________________________________________

xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user

