Russell,
A couple of things to try:
1. xcatmaster=evxcat: you do not need to set xcatmaster for a node if
there is no service node involved. It defaults to site.master. I
usually set site.master to an IP address that faces the node.
2. Regarding the pping error (I am not sure why pping is called): by
default pping calls nmap, which sometimes times out and reports an
error on slow networks. You can set a site table attribute to make it
use fping instead: site.usefping=1. Make sure fping is installed on
the management node.
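For reference, suggestion 2 could be applied on the management node roughly as follows. This is a sketch under stated assumptions, not a verified procedure: the usefping key is taken from the message above, chtab and tabdump are the usual xCAT table commands, and the run and enable_fping helpers with their DRY_RUN variable are hypothetical names added here so the commands can be previewed before they touch the site table.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for suggestion 2: make pping use fping instead of nmap.
enable_fping() {
    # fping must already be installed on the management node.
    if [ "${DRY_RUN:-0}" != "1" ] && ! command -v fping >/dev/null 2>&1; then
        echo "fping not found; install it on the management node first" >&2
        return 1
    fi
    run chtab key=usefping site.value=1   # site.usefping=1, as suggested above
    run tabdump site                      # dump the site table to check the value
}
```

Running "DRY_RUN=1 enable_fping" only prints the two commands; without DRY_RUN they are executed, so it should be run as root on the management node.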
Ling
Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692
External: [email protected], 845-433-5692
"I never worry about the future. It comes soon enough." --- Albert
Einstein
From: Russell Jones <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 02/05/2015 02:03 PM
Subject: Re: [xcat-user] xCAT 2.9 Syncfiles error
------------------------------------------------------------------------
Here you go. There is no service node, just the management node (evxcat).
[root@evxcat ~]# lsdef n1
Object name: n1
arch=x86_64
currstate=netboot centos6.6-x86_64-v6.6.0-dl-compute-apu
groups=all,compute,v6.6.0-dl-compute-apu
initrd=xcat/osimage/v6.6.0-dl-compute-apu/initrd-stateless.gz
ip=172.21.1.2
kcmdline=imgurl=http://evxcat:80//install/netboot/centos6.6/x86_64/v6.6.0-dl-compute-apu/rootimg.gz
XCAT=evxcat:3001 NODE=n1 FC=0 ifname=eth0:74:D4:35:9B:CF:FE netdev=eth0
kernel=xcat/osimage/v6.6.0-dl-compute-apu/kernel
mac=74:D4:35:9B:CF:FE
mgt=ipmi
netboot=xnba
nichostnamesuffixes.eth0=-eth0
nichostnamesuffixes.ib0=-ib0
nicips.ib0=172.40.130.1
nicips.eth0=172.21.1.2
nicnetworks.ib0=172_40_0_0_Infiniband
nicnetworks.eth0=172_21_1_0_APU
nictypes.ib0=Infiniband
nictypes.eth0=Ethernet
os=centos6.6
postscripts=remoteshell
profile=v6.6.0-dl-compute-apu
provmethod=v6.6.0-dl-compute-apu
status=booted
statustime=02-04-2015 13:43:56
updatestatus=synced
updatestatustime=02-04-2015 13:32:41
xcatmaster=evxcat
[root@evxcat ~]# lsdef -t osimage v6.6.0-dl-compute-apu
Object name: v6.6.0-dl-compute-apu
groups=all
imagetype=linux
nodebootif=eth0
osarch=x86_64
osdistroname=centos6.6-x86_64
osname=Linux
osvers=centos6.6
otherpkgdir=/install/custom/v6.6.0-dl-compute-apu/otherpkgs
otherpkglist=/install/custom/v6.6.0-dl-compute-apu/v6.6.0-dl-compute-apu.otherpkgs.pkglist
permission=755
pkgdir=/install/centos6.6/x86_64
pkglist=/install/custom/v6.6.0-dl-compute-apu/v6.6.0-dl-compute-apu.pkglist
postbootscripts=otherpkgs
postinstall=/install/custom/v6.6.0-dl-compute-apu/v6.6.0-dl-compute-apu.postinstall
postscripts=confignics,syslog,syncfiles,setupntp,addsiteyum,mount_lustre2.sh,mount_home.sh,enable_slurm.sh
profile=v6.6.0-dl-compute-apu
provmethod=netboot
rootimgdir=/install/netboot/centos6.6/x86_64/v6.6.0-dl-compute-apu
synclists=/install/custom/v6.6.0-dl-compute-apu/v6.6.0-dl-compute-apu.synclist
On 05.02.2015 12:52, Ling Gao wrote:
Russell,
Could you send me the output of lsdef n1? Is there a service node
involved in the cluster?
Thanks,
Ling
From: Russell Jones <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 02/05/2015 10:44 AM
Subject: Re: [xcat-user] xCAT 2.9 Syncfiles error
------------------------------------------------------------------------
Hi Ling,
I'm confused too! Here are answers to your questions, thanks for your
help!
> Is this diskless or diskful deployment? When you say "reboot", did
> you mean redeploy?
This is a diskless node netbooting a CentOS 6.6 x64 image.
> Could you grep /var/log/messages for "Allowing xdcp" and show me the
> output.
That message seems to show up only when doing a packimage. When the
node is booting it shows "Allowing syncfiles". Here's the output when
that is happening. Note that this is recorded on the management node,
as I have the syslog postscript run on the node before syncfiles.
Also note that it seems to have tried to run syncfiles several times
before showing the error message. When this error does not occur,
syncfiles is run only once, so there must be a retry in the code
somewhere. The time is also off on n1 until the setupntp script runs.
I've since reordered the postscripts so that setupntp runs first.
However, across several reboots before the reordering, syncfiles ran
properly while the time was still off, so I don't believe this is a
time-related issue:
Feb 4 19:27:26 n1 xCAT: Install: syslog setup
Feb 4 19:27:26 n1 xCAT: Performing syncfiles postscript
Feb 4 19:27:26 n1 xCAT: ./syncfiles: the OS name = Linux
Feb 4 13:27:25 evxcat xCAT[20079]: xCAT: Allowing syncfiles from n1
Feb 4 19:27:27 n1 sshd[4392]: Accepted publickey for root from
172.21.0.1 port 41221 ssh2
Feb 4 19:27:27 n1 sshd[4392]: pam_unix(sshd:session): session opened
for user root by (uid=0)
Feb 4 19:27:27 n1 sshd[4392]: Received disconnect from 172.21.0.1:
11: disconnected by user
<snip>
Feb 4 19:27:29 n1 kernel: ADDRCONF(NETDEV_CHANGE): ib0: link becomes
ready
Feb 4 19:27:40 n1 logger[4528]: openibd: Set node_desc for mlx5_0: n1
HCA-1
Feb 4 19:27:40 n1 kernel: ib0: no IPv6 routers present
Feb 4 13:27:43 evxcat xCAT[20181]: xCAT: Allowing syncfiles from n1
Feb 4 19:27:45 n1 sshd[4532]: Accepted publickey for root from
172.21.0.1 port 41233 ssh2
Feb 4 19:27:45 n1 sshd[4532]: pam_unix(sshd:session): session opened
for user root by (uid=0)
Feb 4 19:27:45 n1 sshd[4532]: Received disconnect from 172.21.0.1:
11: disconnected by user
Feb 4 19:27:45 n1 sshd[4532]: pam_unix(sshd:session): session closed
for user root
<snip>
Feb 4 13:28:04 evxcat xCAT[20262]: xCAT: Allowing syncfiles from n1
Feb 4 19:28:06 n1 sshd[4640]: Accepted publickey for root from
172.21.0.1 port 41241 ssh2
Feb 4 19:28:06 n1 sshd[4640]: pam_unix(sshd:session): session opened
for user root by (uid=0)
Feb 4 19:28:06 n1 sshd[4640]: Received disconnect from 172.21.0.1:
11: disconnected by user
Feb 4 19:28:06 n1 sshd[4640]: pam_unix(sshd:session): session closed
for user root
<snip>
Feb 4 13:28:21 evxcat xCAT[20341]: xCAT: Allowing syncfiles from n1
Feb 4 19:28:22 n1 sshd[4748]: Accepted publickey for root from
172.21.0.1 port 41249 ssh2
Feb 4 19:28:22 n1 sshd[4748]: pam_unix(sshd:session): session opened
for user root by (uid=0)
Feb 4 19:28:22 n1 sshd[4748]: Received disconnect from 172.21.0.1:
11: disconnected by user
Feb 4 19:28:22 n1 sshd[4748]: pam_unix(sshd:session): session closed
for user root
<snip>
Feb 4 13:28:41 evxcat xCAT[20420]: xCAT: Allowing syncfiles from n1
Feb 4 19:28:43 n1 sshd[4863]: Accepted publickey for root from
172.21.0.1 port 41257 ssh2
Feb 4 19:28:43 n1 sshd[4863]: pam_unix(sshd:session): session opened
for user root by (uid=0)
Feb 4 19:28:43 n1 sshd[4863]: Received disconnect from 172.21.0.1:
11: disconnected by user
Feb 4 19:28:43 n1 sshd[4863]: pam_unix(sshd:session): session closed
for user root
<snip>
Feb 4 13:29:02 evxcat xCAT[20507]: xCAT: Allowing syncfiles from n1
Feb 4 19:29:04 n1 sshd[4972]: Accepted publickey for root from
172.21.0.1 port 41271 ssh2
Feb 4 19:29:04 n1 sshd[4972]: pam_unix(sshd:session): session opened
for user root by (uid=0)
Feb 4 19:29:04 n1 sshd[4972]: Received disconnect from 172.21.0.1:
11: disconnected by user
Feb 4 19:29:04 n1 sshd[4972]: pam_unix(sshd:session): session closed
for user root
<snip>
Feb 4 19:29:05 n1 xCAT: ./syncfiles: Perform Syncing File action
encountered error
Feb 4 19:29:05 n1 xcat: Install: Setup NTP
Feb 4 19:29:06 n1 xcat: ntpdate -t5 172.21.0.1
Feb 4 13:29:06 n1 ntpd[5141]: ntpd [email protected] Sat Nov 23
18:21:48 UTC 2013 (1)
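The retry pattern in the log above can be summarised with a small filter. This is a minimal sketch, assuming syslog lines in the format shown here; the "Allowing syncfiles from <node>" text is copied from this log, while the function name sync_attempts is made up for illustration. It reads log text on stdin and prints how many syncfiles requests the management node accepted from the given node, followed by the timestamp of each one.

```shell
# Count "Allowing syncfiles from NODE" entries on stdin and list their times.
sync_attempts() {
    node="$1"
    awk -v node="$node" '
        # The node name must be the last field, so n1 does not match n10.
        /xCAT: Allowing syncfiles from / && $NF == node {
            n++
            times = times " " $3    # third field is the HH:MM:SS timestamp
        }
        END { printf "%d%s\n", n, times }
    '
}
```

For example, "sync_attempts n1 < /var/log/messages" on the management node. Applied to the excerpt above it reports six requests, at 13:27:25, 13:27:43, 13:28:04, 13:28:21, 13:28:41 and 13:29:02, which is 17 to 21 seconds between attempts.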
> “Error from pping” can only happen when calling xdcp with -v option.
> But I checked the code for updatenode -F and did not see xdcp was
> called with a -v option, so I am confused.
Strange, but that's exactly what happened! Here's a direct copy/paste
from my terminal:
[root@evxcat ~]# updatenode n1 -F
Error from pping
File synchronization has completed for nodes.
[root@evxcat ~]# updatenode n1 -F
File synchronization has completed for nodes.
[root@evxcat postscripts]# pping n1
n1: ping
> BTW, the server side error is usually logged in /var/log/messages.
> Could you check the time stamps of the messages and see what errors
> it had during the deployment time.
There are no other errors shown during deployment, just this one from
syncfiles. Everything also runs fine, including syncfiles, even when
it shows this error. All of my files are synced, including those that
are being appended. The node comes up and is 100% healthy.
Thanks for the help!
On 05.02.2015 08:42, Ling Gao wrote:
BTW, the server side error is usually logged in /var/log/messages.
Could you check the time stamps of the messages and see what errors
it had during the deployment time.
Thanks,
Ling
From: Ling Gao/Poughkeepsie/IBM
To: xCAT Users Mailing list <[email protected]>
Date: 02/05/2015 09:34 AM
Subject: Re: [xcat-user] xCAT 2.9 Syncfiles error
------------------------------------------------------------------------
Hi Russell,
Thanks for the debugging.
Is this diskless or diskful deployment? When you say "reboot", did
you mean redeploy?
Could you grep /var/log/messages for "Allowing xdcp" and show me the
output. “Error from pping” can only happen when calling xdcp with
-v option. But I checked the code for updatenode -F and did not see
xdcp was called with a -v option, so I am confused.
Thanks,
Ling
From: Russell Jones <[email protected]>
To: [email protected]
Date: 02/04/2015 02:55 PM
Subject: Re: [xcat-user] xCAT 2.9 Syncfiles error
------------------------------------------------------------------------
Some further debugging: I ran "updatenode n1 -F" to try a manual
syncfiles on the node that showed this error during boot. It
completed, but also reported "Error from pping". I then ran pping and
updatenode -F again and was unable to reproduce the pping error.
Further reboots of n1 do not result in any syncfiles errors.
Really difficult to track this down. Any insight would be appreciated!
On 2/4/2015 1:30 PM, Russell Jones wrote:
Hi Ling,
Unfortunately I am running into this issue again, seemingly at random.
Can we revisit this? Are there areas within the syncfiles scripts
where I can get it to tell me *what* the error is that it is
encountering as opposed to just saying "Perform Syncing File action
encountered error" ?
Thanks!
On 1/28/2015 3:34 PM, Ling Gao wrote:
Hi Russell,
Sure, that is good news. I tried it on an RH6.4 node and it worked
fine.
Thanks,
Ling
From: Russell Jones <[email protected]>
To: [email protected]
Date: 01/28/2015 04:25 PM
Subject: Re: [xcat-user] xCAT 2.9 Syncfiles error
------------------------------------------------------------------------
Hi Ling,
Thanks for looking at this! I was troubleshooting this issue some more
this morning and something, unbeknownst to me, fixed the issue. I have
no idea what changed but it seems to be resolved now....
Sure wish I knew what was breaking it! :-)
On 1/28/2015 2:49 PM, Ling Gao wrote:
Russell,
Can you comment out line 106, "return 1", in
/opt/xcat/lib/perl/xCAT_plugin/syncfiles.pm, restart xcatd, and try
again?
Thanks,
Ling
From: Russell Jones <[email protected]>
To: <[email protected]>
Date: 01/27/2015 05:31 PM
Subject: [xcat-user] xCAT 2.9 Syncfiles error
------------------------------------------------------------------------
xCAT 2.9
Hi all,
I have a CentOS 6.6 image with the following contents in the synclist
file for the osimage:
/install/custom/v6.6.0-dl-compute-apu/sync/etc/modprobe.d/lustre.conf
-> /etc/modprobe.d/lustre.conf
APPEND:
/install/custom/v6.6.0-dl-compute-apu/sync/etc/fstab -> /etc/fstab
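A quick sanity check for a synclist like this one is to confirm that every source file it names exists on the management node. The sketch below assumes the "source -> destination" line form shown above, with each entry on a single line in the real file; the function name check_synclist_sources is made up for illustration and is not part of xCAT. Clause lines such as APPEND: are skipped because they contain no " -> ".

```shell
# Read a synclist on stdin and report whether each source path exists.
check_synclist_sources() {
    # Keep "src [src ...] -> dest" lines, skip comments, print one source per line.
    awk -F ' -> ' 'NF == 2 && $1 !~ /^#/ {
        n = split($1, src, " ")
        for (i = 1; i <= n; i++) print src[i]
    }' |
    while IFS= read -r path; do
        if [ -e "$path" ]; then
            echo "ok $path"
        else
            echo "MISSING $path"
        fi
    done
}
```

For example: check_synclist_sources < /install/custom/v6.6.0-dl-compute-apu/v6.6.0-dl-compute-apu.synclist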
When my compute nodes boot up and run the syncfiles postscript, it
takes a couple of minutes, which seems far too long for syncing two
text files. I get "syncfiles exited with code 0" in the xcat.log file
on the node, while in /var/log/messages on the management node I get:
Jan 27 16:07:57 evxcat xCAT[27685]: xCAT: Allowing syncfiles from n0
Jan 27 16:08:09 evxcat xCAT[27753]: xCAT: Allowing syncfiles from n0
Jan 27 16:08:24 evxcat xCAT[27805]: xCAT: Allowing syncfiles from n0
Jan 27 16:08:45 evxcat xCAT[27858]: xCAT: Allowing syncfiles from n0
Jan 27 16:09:00 evxcat xCAT[27909]: xCAT: Allowing syncfiles from n0
Jan 27 16:09:16 evxcat xCAT[27960]: xCAT: Allowing syncfiles from n0
Jan 27 22:09:18 n0 xCAT: ./syncfiles: Perform Syncing File action
encountered error
Looking at the end result, the files are synced fine and my /etc/fstab
is appended properly.
So it seems the syncfiles script can't make up its mind whether it
exited with an error or not (code 0 usually = good!).
Any ideas what the issue could be?
------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user