Hello,
Before going into more debugging, I checked whether the problem was still
there. This time, even though nodestat did not report anything about the
rpms being installed, I looked at /tftpboot/xcat/xnba/nodes/n105 instead of
overwriting it with a nodeset, and I saw that it had been changed to boot
from local disk.
Sorry for the misleading information. I must have corrected my mistake (I
don't know which one) somewhere, and I thought the problem was still there
because nodestat did not change during the rpm installation. n105 now
gets installed properly and reboots to local disk. Thank you for your
help.
I still do not understand why nodestat is not updated while the rpms are
being installed. I guess it has to do with foo.awk and bar.awk.
I used another shell to check /tmp/foo.awk and /tmp/bar.awk during the
rpm install.
I found that there was no /mnt/var/log/Yast2/y2logRPM (actually not a
single file in the Yast2 directory).
/tmp/bar.awk returns an error message: fatal must supply a remote port to
'/inet'
I guess that in the following lines of /install/autoinst/n105, xcatdport
should be changed to xcatiport. (I tried modifying
/opt/xcat/share/xcat/install/scripts/pre.sles line 17, but it did not
solve my problem; maybe it was improperly done.)
cat >/tmp/bar.awk <<EOF
#!$AWK -f
BEGIN {
xcatdport = "3002"
xcatdhost = "172.29.101.2"
ns = "/inet/tcp/0/" xcatdhost "/" xcatiport
...
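The mismatch in the excerpt above can be reproduced without any network access. The sketch below is only an illustration, not the script that xCAT generates: it prints the socket string instead of opening it, and the function names show_ns_buggy and show_ns_fixed are made up for this example. An uninitialized awk variable expands to the empty string, so a string built from xcatiport ends with no port at all, which would be consistent with the "must supply a remote port" error.

```shell
#!/bin/sh
# Illustration only: build the /inet string the way the excerpt does,
# but print it instead of opening the socket.

# As in the excerpt: xcatdport is assigned, but xcatiport is the name used.
show_ns_buggy() {
    awk 'BEGIN {
        xcatdport = "3002"
        xcatdhost = "172.29.101.2"
        ns = "/inet/tcp/0/" xcatdhost "/" xcatiport   # xcatiport was never set
        print ns
    }'
}

# Same string with one consistent variable name.
show_ns_fixed() {
    awk 'BEGIN {
        xcatiport = "3002"
        xcatdhost = "172.29.101.2"
        ns = "/inet/tcp/0/" xcatdhost "/" xcatiport
        print ns
    }'
}

show_ns_buggy   # prints /inet/tcp/0/172.29.101.2/
show_ns_fixed   # prints /inet/tcp/0/172.29.101.2/3002
```

Either renaming the assignment or renaming the reference would make the two names agree; which of the two the template intends is not something this sketch can tell.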
Thank you for your help.
Best regards,
Antoine Tabary
Certified HPC I/T Specialist
0233AA
ITS
17 Avenue De L'europe
Bois Colombes Cedex, 92275
France
e-mail: [email protected]
From: Linda Mellor <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 16/11/2011 14:34
Subject: Re: [xcat-user] Node keeps reinstalling
When the node is installing, the script that should change the state from
network boot to local disk boot is /xcatpost/updateflag.awk. This will
change the node "status" attribute to booting, and will change the
/tftpboot/etc/n105 file to fail the network boot so that the node will
then attempt the local disk boot.
One experiment would be: when you run through the install, run
"nodeset n105 boot" to stop the recursive install, like you did before.
After the node has booted up, do the following:
on the MN:
    nodeset n105 install   ### this will reset the /tftpboot/etc file for the node
    chdef n105 status=TEST
on n105:
    /xcatpost/updateflag.awk 172.29.101.2 3002
on the MN:
    lsdef n105 -i status,statustime   ### you should see status changed to "booting" and the time value updated
    cat /tftpboot/etc/n105   ### should look something like:
        #boot
        timeout=5
        bye
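The status check in the experiment above could be scripted along these lines. This is a minimal sketch and not part of xCAT: the function name status_from_lsdef is hypothetical, the sample text is canned rather than real output, and the only assumption about lsdef is that it prints the attribute as a "status=..." line, as in the lsdef output quoted elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read lsdef output on stdin and print the value of the
# "status" attribute (prints nothing if there is no such line).
# The pattern requires "status=" exactly, so "statustime=" is not matched.
status_from_lsdef() {
    sed -n 's/^[[:space:]]*status=//p' | head -n 1
}

# Canned sample text, not real output. On the MN one would pipe the real
# command instead:
#   lsdef n105 -i status,statustime | status_from_lsdef
sample='Object name: n105
    status=booting
    statustime=(some timestamp)'

if [ "$(printf '%s\n' "$sample" | status_from_lsdef)" = "booting" ]; then
    echo "status is booting"
else
    echo "status was not updated"
fi
```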
If all of that is correct, then we need to look harder at your
/install/autoinst/n105 file to see what else might be failing during the
"post" section. Maybe put some breadcrumb logging in there to get a sense
of what is not working. Since you are seeing /xcatpost and
/tmp/mypostscript getting created with valid information, communication
with the xCAT management node and the xcatd daemon seems to be working at
least somewhat.
Linda
From: Lissa Valletta/Poughkeepsie/IBM@IBMUS
To: xCAT Users Mailing list <[email protected]>
Cc: xCAT Users Mailing list <[email protected]>
Date: 11/16/2011 06:51 AM
Subject: Re: [xcat-user] Node keeps reinstalling
Could you also give me the output of lsxcatd -a on the Management Node, and
of tabdump site? Are you using Service Nodes? Also, is this your first
install of xCAT and your first attempt at installing nodes? Did you
upgrade from a previous release?
Give me the output of rpm -qa | grep xCAT on the MN and, if you have
Service Nodes, do the same there.
Thanks!
Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102
From: Antoine Tabary <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 11/16/2011 04:12 AM
Subject: Re: [xcat-user] Node keeps reinstalling
Hello,
Some more information.
1/ I was wrong when I said postscripts were not run, they are run.
2/ During install nodestat keeps saying "installing prep" which confirms
that for one reason or another the node does not communicate its status to
xcatd
Here is /tmp/mypostscript on n105 :
n105:~ # cat /tmp/mypostscript
BLADEMAXP='64'
export BLADEMAXP
DOMAIN='dsi.upmc.fr'
export DOMAIN
FSPTIMEOUT='0'
export FSPTIMEOUT
INSTALLDIR='/install'
export INSTALLDIR
IPMIMAXP='64'
export IPMIMAXP
IPMIRETRIES='3'
export IPMIRETRIES
IPMITIMEOUT='2'
export IPMITIMEOUT
CONSOLEONDEMAND='yes'
export CONSOLEONDEMAND
SITEMASTER=172.29.101.2
export SITEMASTER
MASTER=172.29.101.2
export MASTER
FORWARDERS='134.157.1.23,134.157.0.29'
export FORWARDERS
NAMESERVERS='172.29.101.2'
export NAMESERVERS
MAXSSH='8'
export MAXSSH
PPCMAXP='64'
export PPCMAXP
PPCRETRY='3'
export PPCRETRY
PPCTIMEOUT='0'
export PPCTIMEOUT
SHAREDTFTP='1'
export SHAREDTFTP
SNSYNCFILEDIR='/var/xcat/syncfiles'
export SNSYNCFILEDIR
TFTPDIR='/tftpboot'
export TFTPDIR
XCATDPORT='3001'
export XCATDPORT
XCATIPORT='3002'
export XCATIPORT
XCATCONFDIR='/etc/xcat'
export XCATCONFDIR
TIMEZONE='Europe/Paris'
export TIMEZONE
USENMAPFROMMN='no'
export USENMAPFROMMN
ENABLEASMI='no'
export ENABLEASMI
DB2INSTALLLOC='/mntdb2'
export DB2INSTALLLOC
DATABASELOC='/var/lib'
export DATABASELOC
SSHBETWEENNODES='ALLGROUPS'
export SSHBETWEENNODES
DNSHANDLER='ddns'
export DNSHANDLER
VSFTP='y'
export VSFTP
CLEANUPXCATPOST='no'
export CLEANUPXCATPOST
ENABLESSHBETWEENNODES=YES
export ENABLESSHBETWEENNODES
NODE=n105
export NODE
NFSSERVER=172.29.101.2
export NFSSERVER
PRIMARYNIC=eth0
export PRIMARYNIC
OSVER=sles11.1
export OSVER
ARCH=x86_64
export ARCH
PROFILE=compute
export PROFILE
PATH=`dirname $0`:$PATH
export PATH
NODESETSTATE='boot'
export NODESETSTATE
UPDATENODE=0
export UPDATENODE
NTYPE=compute
export NTYPE
MACADDRESS='E4:1F:13:4D:35:88'
export MACADDRESS
MONSERVER=chou-mgmt
export MONSERVER
MONMASTER=172.29.101.2
export MONMASTER
OSPKGS='@base,@x11,xntp,rsync'
export OSPKGS
# postscripts-start-here
syslog
remoteshell
syncfiles
fsmnt
gpfsinst
# postscripts-end-here
n105:~ #
And here is lsdef (I have done a "nodeset n105 boot" while n105 was
installing rpms to stop n105 from always reinstalling):
chou-mgmt:~ # lsdef n105
Object name: n105
arch=x86_64
bmc=n105-bmc
bmcport=0
currchain=boot
currstate=boot
groups=ipminocons,frame1,compute,all
initrd=xcat/sles11.1/x86_64/initrd
kcmdline=autoyast=http://172.29.101.2/install/autoinst/n105 install=http://172.29.101.2/install/sles11.1/x86_64/1 netdevice=eth0
kernel=xcat/sles11.1/x86_64/linux
mac=E4:1F:13:4D:35:88
mgt=ipmi
netboot=xnba
nfsserver=172.29.101.2
os=sles11.1
postbootscripts=otherpkgs,dhcpno
postscripts=syslog,remoteshell,syncfiles,fsmnt,gpfsinst
primarynic=eth0
profile=compute
provmethod=install
status=booted
statustime=11-16-2011 09:44:29
switch=switch2
switchport=5
chou-mgmt:~ #
Thank you for your help. Best regards,
Antoine Tabary
From: Lissa Valletta <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Cc: xCAT Users Mailing list <[email protected]>
Date: 15/11/2011 19:58
Subject: Re: [xcat-user] Node keeps reinstalling
Could you give me the entire contents of the /tmp/mypostscript file from
the node?
Also give me the output of lsdef <nodename>.
Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102
From: Antoine Tabary <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 11/15/2011 11:10 AM
Subject: Re: [xcat-user] Node keeps reinstalling
Hello,
The MASTER variable is OK, but the block:
# subroutine used to run postscripts
...
# subroutine end
is missing.
Any idea?
Thank you for your help. Best regards,
Antoine Tabary
From: Lissa Valletta <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Cc: xCAT Users Mailing list <[email protected]>
Date: 15/11/2011 13:39
Subject: Re: [xcat-user] Node keeps reinstalling
Make sure that the site.master attribute is an IP address or hostname that
will be resolvable during the install. The usual problem is that the node
cannot contact the MN to tell it that it is finished.
On the node you can look at the /tmp/mypostscript file.
It should look something like the following. The last line is important,
because that is the "we are finished" line. But it cannot be sent to the
MN unless $MASTER is an address that is resolvable on the node at that
point.
If this file is empty, it is a pretty good sign that the node could not
contact the MN after install.
# subroutine used to run postscripts
.
.
.
}
# subroutine end
BLADEMAXP='64'
export BLADEMAXP
DOMAIN='cluster.com'
export DOMAIN
FSPTIMEOUT='0'
export FSPTIMEOUT
.
.
MASTER=10.16.0.103
export MASTER
.
# postscripts-start-here
run_ps remoteshell
run_ps syncfiles
run_ps syslog
run_ps new_set
run_ps setbootfromnet
# postscripts-end-here
# postbootscripts-start-here
run_ps otherpkgs
# postbootscripts-end-here
updateflag.awk $MASTER 3002 "installstatus booted"
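Following the description above, a quick way to test a /tmp/mypostscript file is to check that it is non-empty and that its last non-blank line is the updateflag.awk call. The sketch below is an assumption-based helper, not an xCAT tool: the function name ends_with_updateflag is hypothetical, and the demonstration runs against a small temporary file rather than a real node.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the given file is non-empty and its
# last non-blank line starts with "updateflag.awk".
ends_with_updateflag() {
    [ -s "$1" ] || return 1
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    case "$last" in
        updateflag.awk*) return 0 ;;
        *)               return 1 ;;
    esac
}

# Demonstration on a temporary file. On the node one would run, for example:
#   ends_with_updateflag /tmp/mypostscript && echo "final status line present"
tmp=$(mktemp)
printf '%s\n' 'MASTER=10.16.0.103' 'export MASTER' \
    'updateflag.awk $MASTER 3002 "installstatus booted"' > "$tmp"
ends_with_updateflag "$tmp" && echo "final status line present"
rm -f "$tmp"
```

This only confirms that the line is present. Whether $MASTER is resolvable from the node at that point still has to be checked on the node itself.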
Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102
From: Antoine Tabary <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 11/15/2011 05:45 AM
Subject: [xcat-user] Node keeps reinstalling
Hello,
We are trying to install an iDataplex cluster with SLES11 nodes.
The /tftpboot/xcat/xnba/nodes/n001 file is not reset during installation,
so the node keeps reinstalling itself unless we type "nodeset n101 boot"
before it reboots. Any suggestion on what we have not configured?
It does not run postscripts or postbootscripts either.
We are running xCAT 2.6.8 with xcat-dep-201111041626.
Thank you for your help. Best regards,
Antoine Tabary
Sauf indication contraire ci-dessus:/ Unless stated otherwise above:
Compagnie IBM France
Siège Social : 17 avenue de l'Europe, 92275 Bois-Colombes Cedex
RCS Nanterre 552 118 465
Forme Sociale : S.A.S.
Capital Social : 639.291.962.10 €
SIREN/SIRET : 552 118 465 03644 - Code NAF 6202A
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
