Hi Ling, 

I'm confused too! Here are answers to your questions. Thanks for your
help!

> Is this diskless or diskful deployment? When you say "reboot", did you
> mean redeploy?

This is a diskless node
netbooting a CentOS 6.6 x64 image.

> Could you grep /var/log/messages for "Allowing xdcp" and show me the
> output.

That message seems to show up only when doing a packimage. When the node
is booting it shows "Allowing syncfiles". Here's the output while that is
happening. Note that the n1 entries are recorded on the management node,
as I have the syslog postscript run on the node before syncfiles. Also
note that it seems to have tried to run syncfiles several times before
showing the error message. When this error does not occur, syncfiles is
run only once, so there must be a retry in the code somewhere. The time
is also off on n1 until the setupntp script has run. I've since reordered
the postscripts so that setupntp runs first; however, rebooting the node
several times before the reordering had syncfiles run properly while the
time was still off, so I don't believe this is a time-related issue:

Feb 4 19:27:26 n1 xCAT: Install: syslog setup
Feb 4 19:27:26 n1 xCAT: Performing syncfiles postscript
Feb 4 19:27:26 n1 xCAT: ./syncfiles: the OS name = Linux
Feb 4 13:27:25 evxcat xCAT[20079]: xCAT: Allowing syncfiles from n1
Feb 4 19:27:27 n1 sshd[4392]: Accepted publickey for root from 172.21.0.1 port 41221 ssh2
Feb 4 19:27:27 n1 sshd[4392]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb 4 19:27:27 n1 sshd[4392]: Received disconnect from 172.21.0.1: 11: disconnected by user
<snip>
Feb 4 19:27:29 n1 kernel: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
Feb 4 19:27:40 n1 logger[4528]: openibd: Set node_desc for mlx5_0: n1 HCA-1
Feb 4 19:27:40 n1 kernel: ib0: no IPv6 routers present
Feb 4 13:27:43 evxcat xCAT[20181]: xCAT: Allowing syncfiles from n1
Feb 4 19:27:45 n1 sshd[4532]: Accepted publickey for root from 172.21.0.1 port 41233 ssh2
Feb 4 19:27:45 n1 sshd[4532]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb 4 19:27:45 n1 sshd[4532]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb 4 19:27:45 n1 sshd[4532]: pam_unix(sshd:session): session closed for user root
<snip>
Feb 4 13:28:04 evxcat xCAT[20262]: xCAT: Allowing syncfiles from n1
Feb 4 19:28:06 n1 sshd[4640]: Accepted publickey for root from 172.21.0.1 port 41241 ssh2
Feb 4 19:28:06 n1 sshd[4640]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb 4 19:28:06 n1 sshd[4640]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb 4 19:28:06 n1 sshd[4640]: pam_unix(sshd:session): session closed for user root
<snip>
Feb 4 13:28:21 evxcat xCAT[20341]: xCAT: Allowing syncfiles from n1
Feb 4 19:28:22 n1 sshd[4748]: Accepted publickey for root from 172.21.0.1 port 41249 ssh2
Feb 4 19:28:22 n1 sshd[4748]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb 4 19:28:22 n1 sshd[4748]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb 4 19:28:22 n1 sshd[4748]: pam_unix(sshd:session): session closed for user root
<snip>
Feb 4 13:28:41 evxcat xCAT[20420]: xCAT: Allowing syncfiles from n1
Feb 4 19:28:43 n1 sshd[4863]: Accepted publickey for root from 172.21.0.1 port 41257 ssh2
Feb 4 19:28:43 n1 sshd[4863]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb 4 19:28:43 n1 sshd[4863]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb 4 19:28:43 n1 sshd[4863]: pam_unix(sshd:session): session closed for user root
<snip>
Feb 4 13:29:02 evxcat xCAT[20507]: xCAT: Allowing syncfiles from n1
Feb 4 19:29:04 n1 sshd[4972]: Accepted publickey for root from 172.21.0.1 port 41271 ssh2
Feb 4 19:29:04 n1 sshd[4972]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb 4 19:29:04 n1 sshd[4972]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb 4 19:29:04 n1 sshd[4972]: pam_unix(sshd:session): session closed for user root
<snip>
Feb 4 19:29:05 n1 xCAT: ./syncfiles: Perform Syncing File action encountered error
Feb 4 19:29:05 n1 xcat: Install: Setup NTP
Feb 4 19:29:06 n1 xcat: ntpdate -t5 172.21.0.1
Feb 4 13:29:06 n1 ntpd[5141]: ntpd 4.2.6p5@1.2349-o Sat Nov 23 18:21:48 UTC 2013 (1)
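
For anyone who wants to check the retry pattern on their own management
node, here is a rough Python sketch (it is not part of xCAT) that counts
the "Allowing syncfiles" entries for a node and reports the gaps between
them. The regular expression, the function names and the hard-coded year
are my own assumptions, based only on the log lines above; syslog
timestamps carry no year, and the %b month parsing assumes an English
locale.

```python
import re
from datetime import datetime

# Matches management-node lines such as:
#   Feb 4 13:27:25 evxcat xCAT[20079]: xCAT: Allowing syncfiles from n1
# (pattern is an assumption derived from the excerpt, not from xCAT source)
ALLOW_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+xCAT\[\d+\]:\s+"
    r"xCAT: Allowing syncfiles from (?P<node>\S+)"
)


def syncfiles_attempts(lines, node, year=2015):
    """Return one datetime per 'Allowing syncfiles' entry logged for node."""
    stamps = []
    for line in lines:
        m = ALLOW_RE.match(line.strip())
        if m and m.group("node") == node:
            # Collapse repeated spaces, then add the year syslog leaves out.
            ts = " ".join(m.group("ts").split())
            stamps.append(datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S"))
    return stamps


def gaps_seconds(stamps):
    """Seconds between each consecutive pair of attempts."""
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Management-node entries copied from the excerpt above.
sample = [
    "Feb 4 13:27:25 evxcat xCAT[20079]: xCAT: Allowing syncfiles from n1",
    "Feb 4 13:27:43 evxcat xCAT[20181]: xCAT: Allowing syncfiles from n1",
    "Feb 4 13:28:04 evxcat xCAT[20262]: xCAT: Allowing syncfiles from n1",
    "Feb 4 13:28:21 evxcat xCAT[20341]: xCAT: Allowing syncfiles from n1",
    "Feb 4 13:28:41 evxcat xCAT[20420]: xCAT: Allowing syncfiles from n1",
    "Feb 4 13:29:02 evxcat xCAT[20507]: xCAT: Allowing syncfiles from n1",
]

attempts = syncfiles_attempts(sample, "n1")
print(len(attempts), "attempts, gaps in seconds:", gaps_seconds(attempts))
```

On the excerpt above this reports six attempts, spaced between 17 and 21
seconds apart. To run it against the live log, pass an open file handle
on /var/log/messages in place of the sample list.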

> "Error from pping" can only happen when calling xdcp with -v option. But
> I checked the code for updatenode -F and did not see xdcp was called
> with a -v option, so I am confused.

Strange, but that's exactly what happened! Here's a
direct copy/paste from my terminal:

[root@evxcat ~]# updatenode n1 -F
Error from pping
File synchronization has completed for nodes.

[root@evxcat ~]# updatenode n1 -F
File synchronization has completed for nodes.

[root@evxcat postscripts]# pping n1
n1: ping

> BTW, the server side error is usually logged in /var/log/messages. Could
> you check the time stamps of the messages and see what errors it had
> during the deployment time.

There are no other errors shown during deployment, just this one from
syncfiles. Everything also runs fine, including syncfiles, even when it
shows this error. All of my files are synced, including those that are
being appended. The node comes up and is 100% healthy.

Thanks for the help! 

On 05.02.2015 08:42, Ling Gao wrote:


> BTW, the server side error is usually logged in /var/log/messages. Could
> you check the time stamps of the messages and see what errors it had
> during the deployment time.
> 
> Thanks, 
> 
> Ling 
> 
> Ling Gao
> Poughkeepsie Unix Development Lab
> IBM Systems and Technology Group
> Internal: T/L 293-5692
> External: ling...@us.ibm.com, 845-433-5692
>
> "I never worry about the future. It comes soon enough." --- Albert Einstein
>
> From: Ling Gao/Poughkeepsie/IBM
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Date: 02/05/2015 09:34 AM
> Subject: Re: [xcat-user] xCAT 2.9 Syncfiles error
> 
> -------------------------
> 
> Hi Russell,
> Thanks for the debugging.
> Is this diskless or diskful deployment? When you say "reboot", did you
> mean redeploy?
> Could you grep /var/log/messages for "Allowing xdcp" and show me the
> output. "Error from pping" can only happen when calling xdcp with -v
> option. But I checked the code for updatenode -F and did not see xdcp
> was called with a -v option, so I am confused.
> 
> Thanks, 
> 
> Ling 
> 
> 
> From: Russell Jones <russell-l...@jonesmail.me>
> To: xcat-user@lists.sourceforge.net
> Date: 02/04/2015 02:55 PM
> Subject: Re: [xcat-user] xCAT 2.9 Syncfiles error
>
> -------------------------
> 
> Some further debugging: I ran "updatenode n1 -F" to try a manual
> syncfiles on the node that showed this error during boot and it
> completed, but also stated "Error from pping". I ran pping and
> updatenode -F again, and I am unable to reproduce that error from
> pping. Further reboots of n1 do not result in any syncfiles errors.
>
> Really difficult to track this down. Any insight would be appreciated!
> 
> On 2/4/2015 1:30 PM, Russell Jones wrote:
> Hi Ling,
>
> Unfortunately I am running into this issue again, seemingly at random.
> Can we revisit this? Are there areas within the syncfiles scripts where
> I can get it to tell me *what* the error is that it is encountering as
> opposed to just saying "Perform Syncing File action encountered error"?
> 
> Thanks!
> 
> On 1/28/2015 3:34 PM, Ling Gao wrote:
> Hi Russell,
> Sure, that is good news. I tried on a RH6.4 node and it worked fine.
> 
> Thanks, 
> 
> Ling
> 
> 
> From: Russell Jones <russell-l...@jonesmail.me>
> To: xcat-user@lists.sourceforge.net
> Date: 01/28/2015 04:25 PM
> Subject: Re: [xcat-user] xCAT 2.9 Syncfiles error
>
> -------------------------
> 
> Hi Ling,
> 
> Thanks for looking at this! I was troubleshooting this issue some more
> this morning and something, unbeknownst to me, fixed the issue. I have
> no idea what changed but it seems to be resolved now....
>
> Sure wish I knew what was breaking it! :-)
> 
> On 1/28/2015 2:49 PM, Ling Gao wrote:
> Russell,
> Can you comment out line 106 "return 1" from
> /opt/xcat/lib/perl/xCAT_plugin/syncfiles.pm and try again? Please
> restart xcatd.
> 
> Thanks, 
> 
> Ling 
> 
> 
> From: Russell Jones <russell-l...@jonesmail.me>
> To: <xcat-user@lists.sourceforge.net>
> Date: 01/27/2015 05:31 PM
> Subject: [xcat-user] xCAT 2.9 Syncfiles error
>
> -------------------------
> 
> xCAT 2.9 
> 
> Hi all, 
> 
> I have a CentOS 6.6 image with the following contents in the synclist
> file for the osimage:
>
> /install/custom/v6.6.0-dl-compute-apu/sync/etc/modprobe.d/lustre.conf -> /etc/modprobe.d/lustre.conf
> APPEND:
> /install/custom/v6.6.0-dl-compute-apu/sync/etc/fstab -> /etc/fstab
> 
> When my compute nodes boot up and execute the script it takes a couple
> of minutes, which seems way too long for syncing 2 text files. I get a
> "syncfiles exited with code 0" in the xcat.log file on the node, while
> in /var/log/messages on the management node I get:
>
> Jan 27 16:07:57 evxcat xCAT[27685]: xCAT: Allowing syncfiles from n0
> Jan 27 16:08:09 evxcat xCAT[27753]: xCAT: Allowing syncfiles from n0
> Jan 27 16:08:24 evxcat xCAT[27805]: xCAT: Allowing syncfiles from n0
> Jan 27 16:08:45 evxcat xCAT[27858]: xCAT: Allowing syncfiles from n0
> Jan 27 16:09:00 evxcat xCAT[27909]: xCAT: Allowing syncfiles from n0
> Jan 27 16:09:16 evxcat xCAT[27960]: xCAT: Allowing syncfiles from n0
> Jan 27 22:09:18 n0 xCAT: ./syncfiles: Perform Syncing File action encountered error
>
> Looking at the end result, the files are synced fine, and my /etc/fstab
> is appended properly.
>
> So it seems the syncfiles script can't make up its mind whether it
> exited with an error or not (code 0 usually = good!).
>
> Any ideas what the issue could be?
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/ [1]
> _______________________________________________
> xCAT-user mailing list
> xCAT-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xcat-user [2]




Links:
------
[1] http://goparallel.sourceforge.net/
[2] https://lists.sourceforge.net/lists/listinfo/xcat-user
