Russell,
Thanks a lot for all the detailed information. From the xcat.log and the
fact that updatenode produced the correct result, it looks like the
correct packages for IB were installed, but the module was not loaded
when the postscript confignics was running.
I checked the documentation
https://sourceforge.net/p/xcat/wiki/Managing_the_Mellanox_Infiniband_Network1/
and it looks like you need to add another postscript in order to have the
Mellanox driver installed. Please check the section "Mellanox IB Drives
Installation".
You need to add mlnxofed_ib_install to the postscripts before confignics.
Another thing to try is to move confignics from postscripts to
postbootscripts.
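The two options above could be applied roughly as follows. This is only a sketch: it assumes the node is node1 and that its current postscripts list is the one shown in the lsdef output further down the thread. insert_before is a hypothetical helper, not part of xCAT, and the chdef commands are only printed so they can be reviewed before being run. Any extra setup that mlnxofed_ib_install needs is described in the documentation section mentioned above and is not covered here.

```shell
#!/bin/bash
# insert_before LIST NEW TARGET
# Hypothetical helper (not an xCAT command): prints the comma-separated LIST
# with NEW inserted immediately before TARGET. LIST is printed unchanged if
# TARGET is not in it.
insert_before() {
    local list=$1 new=$2 target=$3 item out=""
    local IFS=','
    for item in $list; do
        if [ "$item" = "$target" ]; then
            out="${out:+$out,}$new"
        fi
        out="${out:+$out,}$item"
    done
    printf '%s\n' "$out"
}

# Assumed current value, copied from the lsdef output in this thread.
current="remoteshell,syncfiles,confignics,hardeths"
updated=$(insert_before "$current" mlnxofed_ib_install confignics)

# Option 1: run the Mellanox driver install before confignics.
echo "chdef -t node -o node1 postscripts=$updated"

# Option 2: instead, move confignics from postscripts to postbootscripts.
echo "chdef -t node -o node1 -m postscripts=confignics"
echo "chdef -t node -o node1 -p postbootscripts=confignics"
```

Note that chdef -p appends to the end of the existing list, so with option 2 confignics would run after the postbootscripts already defined for the node.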
Hope it helps,
Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692
External: [email protected], 845-433-5692
"I never worry about the future. It comes soon enough." --- Albert
Einstein
From: "Russell Auld" <[email protected]>
To: "'xCAT Users Mailing list'" <[email protected]>
Date: 11/18/2014 04:57 PM
Subject: Re: [xcat-user] using gateway=<xcatmaster> in networks table
I noticed that code in the script as well.
I reran the script via “updatenode node1 -P confignics” and it works as
expected when you do that; the GATEWAY line is omitted from the ifcfg-ib0
file.
However, a fresh installation of the OS (stateful) results in the file
getting the incorrect gateway value.
I’ve posted the xcat.log below. You can see that there’s some kind of
trouble when the configib script runs.
It looks like some of the IB modules aren’t initially loaded, so when the
script brings down the driver stack and starts it, there are errors
related to kernel modules. This looks like it’s injecting characters into
the script and that’s causing the script to fail.
Thu Nov 13 23:25:45 EST 2014 Running postscript: remoteshell
Stopping sshd: [FAILED]
Generating SSH1 RSA host key: [  OK  ]
Starting sshd: [  OK  ]
Thu Nov 13 23:25:47 EST 2014 Running postscript: syncfiles
Thu Nov 13 23:25:49 EST 2014 Running postscript: confignics
confignics on node1: config install nic:0, remove: 0, iba ports:
NICIPS=ib0!198.18.9.55
confignics on node1: executed script: configib for nics: ib0, ports:
nic_ibnics=ib0 nic_ibaports=
Low level hardware support loaded:
mlx4_ib
Upper layer protocol modules:
ib_ipoib
User space access modules:
none found
Connection management modules:
ib_cm
Configured IPoIB interfaces: ib0 ib1
Currently active IPoIB interfaces: ib0 ib1
Unloading OpenIB kernel modules: [  OK  ]
Loading OpenIB kernel modules:FATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_coreFATAL: Could not load
/lib/modules/2.6.32-358.el6.x86_64/modules.dep: No such file or directory
Failed to load module ib_core [FAILED]
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
/xcatpost/configib: line 486: syntax error near unexpected token `done'
/xcatpost/configib: line 486: ` done # end for nicip'
Thu Nov 13 23:25:52 EST 2014 Running postscript: hardeths
…
My packages list contains “@infiniband” – I also initially tried a shorter
list of known required IB packages. In each case, the result was the same.
If you notice, there are no user-space modules loaded when the postscript
runs.
When the node boots later, the modules are loaded as shown below:
[root@node1 ~]# service rdma status
Low level hardware support loaded:
mlx4_ib
Upper layer protocol modules:
ib_ipoib
User space access modules:
rdma_ucm ib_ucm ib_uverbs ib_umad
Connection management modules:
rdma_cm ib_cm iw_cm
Configured IPoIB interfaces: none
Currently active IPoIB interfaces: none
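The difference between the two status listings (during the postscript and after boot) can be checked mechanically. A rough sketch, assuming output in the format shown above; user_space_modules is a hypothetical helper that prints the line following the "User space access modules:" heading.

```shell
#!/bin/bash
# Hypothetical helper: reads "service rdma status"-style text on stdin and
# prints the module list that follows "User space access modules:".
# Prints nothing if the heading is missing or is the last line.
user_space_modules() {
    awk '/^[ \t]*User space access modules:/ {
             if ((getline line) > 0) {
                 sub(/^[ \t]+/, "", line)   # drop any leading indentation
                 print line
             }
             exit
         }'
}

# Possible use on a node:
#   service rdma status | user_space_modules
# A result of "none found" means the user-space modules were not loaded
# at the time the command ran.
```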
From: Ling Gao [mailto:[email protected]]
Sent: Thursday, November 13, 2014 12:13 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] using gateway=<xcatmaster> in networks table
Hmm, in the /install/postscripts/configib file, there are the following
lines that should prevent "gateway" from being written to the configuration
file if it is <xcatmaster>.
if [ "$gateway" == "<xcatmaster>" ]; then
    gateway=''
fi
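To illustrate the intended effect of that check, here is a minimal sketch. It is not the actual configib code: emit_gateway is a hypothetical function that prints a GATEWAY= line only when a value remains after the <xcatmaster> placeholder has been cleared, and 198.18.8.1 is a made-up address used only for the example.

```shell
#!/bin/bash
# Hypothetical illustration, not taken from configib.
emit_gateway() {
    local gateway=$1
    # Same test as quoted above: treat the placeholder as "no gateway".
    if [ "$gateway" == "<xcatmaster>" ]; then
        gateway=''
    fi
    # Only write the line when a real value remains.
    if [ -n "$gateway" ]; then
        echo "GATEWAY=$gateway"
    fi
    return 0
}

emit_gateway "<xcatmaster>"   # prints nothing
emit_gateway "198.18.8.1"     # prints GATEWAY=198.18.8.1
```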
I checked the configuration for node1 and it seems all correct.
Can you download the latest configib script from the following link and
copy it to the /install/postscripts directory on your management node and
service nodes (if any), and then run "updatenode node1 confignics"? See
if it works for you.
http://sourceforge.net/p/xcat/xcat-core/ci/master/tree/xCAT/postscripts/configib
Thanks,
Ling
From: "Russell Auld" <[email protected]>
To: "'xCAT Users Mailing list'" <[email protected]>
Date: 11/12/2014 10:24 PM
Subject: Re: [xcat-user] using gateway=<xcatmaster> in networks table
[root@head ~]# lsdef node1
Object name: node1
arch=x86_64
bmc=node1-c
bmcport=0
currchain=boot
currstate=boot
groups=compute,all,ipmi,oa10
initrd=xcat/osimage/rhel-x86_64-server-6-2014_q3/initrd.img
installnic=eth0
ip=10.30.116.254
kcmdline=quiet repo=http://10.30.116.91:80/install/rhels6.4/x86_64
ks=http://10.30.116.91:80/install/autoinst/node1 ksdevice=eth0
kernel=xcat/osimage/rhel-x86_64-server-6-2014_q3/vmlinuz
mac=D4:85:64:CC:3B:00
mgt=ipmi
netboot=pxe
nfsserver=10.30.116.91
nichostnamesuffixes.ib0=-ib
nicips.ib0=198.18.8.254
nicnetworks.ib0=infiniband
nictypes.ib0=Infiniband
os=rhels6.4
otherinterfaces=node1-c:198.18.12.254
postbootscripts=otherpkgs,panasas
postscripts=remoteshell,syncfiles,confignics,hardeths
primarynic=mac
profile=compute
provmethod=rhel-x86_64-server-6-2014_q3
status=booted
statustime=11-12-2014 08:48:26
updatestatus=synced
updatestatustime=11-12-2014 11:09:21
From: Ling Gao [mailto:[email protected]]
Sent: Wednesday, November 12, 2014 2:45 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] using gateway=<xcatmaster> in networks table
Russell,
Can you show the node definition for the node you have deployed with
this problem?
lsdef <node>
Thanks,
Ling
From: "Russell Auld" <[email protected]>
To: "'xCAT Users Mailing list'" <[email protected]>
Date: 11/11/2014 09:21 PM
Subject: [xcat-user] using gateway=<xcatmaster> in networks table
________________________________________
I have <xcatmaster> set as the value for gateway in the networks table:
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,comments,disable
"infiniband","198.18.8.0","255.255.252.0","ib0","<xcatmaster>",,"198.18.8.91",,,,,,,,,,,,
When the node is provisioned, the ‘ifcfg-ib0’ file contains:
DEVICE=ib0
NM_CONTROLLED=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=198.18.8.254
PREFIX=22
GATEWAY=<xcatmaster>
IPADDR=198.18.8.254
NETMASK=255.255.252.0
Is this a bug? The documentation says that <xcatmaster> is a valid value
for ‘gateway’
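A quick way to spot this symptom on a deployed node is to look for the unresolved placeholder in the interface file. A small sketch; has_placeholder_gateway is a hypothetical helper, and the path in the usage comment assumes the usual RHEL 6 location for ifcfg files.

```shell
#!/bin/bash
# Hypothetical check: succeeds (exit status 0) if FILE contains a GATEWAY
# line that still holds the literal <xcatmaster> placeholder.
has_placeholder_gateway() {
    grep -q '^GATEWAY=<xcatmaster>$' "$1"
}

# Possible use on a node (path is an assumption):
#   has_placeholder_gateway /etc/sysconfig/network-scripts/ifcfg-ib0 \
#       && echo "ifcfg-ib0 still contains the <xcatmaster> placeholder"
```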
[root@head autoinst]# lsxcatd -a
Version 2.8.4 (git commit ded873f998a889c91a169d3f870efdbebfb66243, built
Thu May 29 23:32:02 EDT 2014)
This is a Management Node
dbengine=SQLite
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user