Russell,
    Could you send me the output of lsdef n1? Is there a service node 
involved in the cluster?

Thanks,

Ling

Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692 
External: ling...@us.ibm.com, 845-433-5692

"I never worry about the future. It comes soon enough." --- Albert 
Einstein 



From:   Russell Jones <russell-l...@jonesmail.me>
To:     xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date:   02/05/2015 10:44 AM
Subject:        Re: [xcat-user] xCAT 2.9 Syncfiles error



Hi Ling,
I'm confused too! Here are answers to your questions, thanks for your 
help!
> Is this diskless or diskful deployment? When you say "reboot", did you
> mean redeploy?
This is a diskless node netbooting a CentOS 6.6 x64 image.
> Could you grep /var/log/messages for "Allowing xdcp" and show me the
> output.
 
That message seems to show up only when doing a packimage. When the node 
is booting it shows "Allowing syncfiles" instead. Here's the output while 
that is happening. Note that this is recorded on the management node, as I 
have the syslog postscript run on the node before syncfiles. Also note 
that it seems to have tried to run syncfiles several times before showing 
the error message. When this error does not occur, syncfiles is only run 
once, so there must be a retry in the code somewhere. Time is also off on 
n1 until the setupntp script has run. I've since reordered the postscripts 
to have setupntp run first; however, rebooting the node several times 
before reordering them had syncfiles run properly while the time was still 
off, so I don't believe this is a time-related issue:
Feb  4 19:27:26 n1 xCAT: Install: syslog setup
Feb  4 19:27:26 n1 xCAT: Performing syncfiles postscript
Feb  4 19:27:26 n1 xCAT: ./syncfiles: the OS name = Linux
Feb  4 13:27:25 evxcat xCAT[20079]: xCAT: Allowing syncfiles from n1
Feb  4 19:27:27 n1 sshd[4392]: Accepted publickey for root from 172.21.0.1 port 41221 ssh2
Feb  4 19:27:27 n1 sshd[4392]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb  4 19:27:27 n1 sshd[4392]: Received disconnect from 172.21.0.1: 11: disconnected by user
<snip>
Feb  4 19:27:29 n1 kernel: ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
Feb  4 19:27:40 n1 logger[4528]: openibd: Set node_desc for mlx5_0: n1 HCA-1
Feb  4 19:27:40 n1 kernel: ib0: no IPv6 routers present
Feb  4 13:27:43 evxcat xCAT[20181]: xCAT: Allowing syncfiles from n1
Feb  4 19:27:45 n1 sshd[4532]: Accepted publickey for root from 172.21.0.1 port 41233 ssh2
Feb  4 19:27:45 n1 sshd[4532]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb  4 19:27:45 n1 sshd[4532]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb  4 19:27:45 n1 sshd[4532]: pam_unix(sshd:session): session closed for user root
<snip>
Feb  4 13:28:04 evxcat xCAT[20262]: xCAT: Allowing syncfiles from n1
Feb  4 19:28:06 n1 sshd[4640]: Accepted publickey for root from 172.21.0.1 port 41241 ssh2
Feb  4 19:28:06 n1 sshd[4640]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb  4 19:28:06 n1 sshd[4640]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb  4 19:28:06 n1 sshd[4640]: pam_unix(sshd:session): session closed for user root
<snip>
Feb  4 13:28:21 evxcat xCAT[20341]: xCAT: Allowing syncfiles from n1
Feb  4 19:28:22 n1 sshd[4748]: Accepted publickey for root from 172.21.0.1 port 41249 ssh2
Feb  4 19:28:22 n1 sshd[4748]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb  4 19:28:22 n1 sshd[4748]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb  4 19:28:22 n1 sshd[4748]: pam_unix(sshd:session): session closed for user root
<snip>
Feb  4 13:28:41 evxcat xCAT[20420]: xCAT: Allowing syncfiles from n1
Feb  4 19:28:43 n1 sshd[4863]: Accepted publickey for root from 172.21.0.1 port 41257 ssh2
Feb  4 19:28:43 n1 sshd[4863]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb  4 19:28:43 n1 sshd[4863]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb  4 19:28:43 n1 sshd[4863]: pam_unix(sshd:session): session closed for user root
<snip>
Feb  4 13:29:02 evxcat xCAT[20507]: xCAT: Allowing syncfiles from n1
Feb  4 19:29:04 n1 sshd[4972]: Accepted publickey for root from 172.21.0.1 port 41271 ssh2
Feb  4 19:29:04 n1 sshd[4972]: pam_unix(sshd:session): session opened for user root by (uid=0)
Feb  4 19:29:04 n1 sshd[4972]: Received disconnect from 172.21.0.1: 11: disconnected by user
Feb  4 19:29:04 n1 sshd[4972]: pam_unix(sshd:session): session closed for user root
<snip>
Feb  4 19:29:05 n1 xCAT: ./syncfiles: Perform Syncing File action encountered error
Feb  4 19:29:05 n1 xcat: Install: Setup NTP
Feb  4 19:29:06 n1 xcat: ntpdate -t5 172.21.0.1
Feb  4 13:29:06 n1 ntpd[5141]: ntpd 4.2.6p5@1.2349-o Sat Nov 23 18:21:48 UTC 2013 (1)
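
To put numbers on the retries in the log above, here is a rough Python sketch that counts the "Allowing syncfiles" lines per node and measures the gaps between them. The function names and the regular expression are my own and not part of xCAT, and the year is passed in as an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Management-node lines copied from the log excerpt above.
LOG = """\
Feb  4 13:27:25 evxcat xCAT[20079]: xCAT: Allowing syncfiles from n1
Feb  4 13:27:43 evxcat xCAT[20181]: xCAT: Allowing syncfiles from n1
Feb  4 13:28:04 evxcat xCAT[20262]: xCAT: Allowing syncfiles from n1
Feb  4 13:28:21 evxcat xCAT[20341]: xCAT: Allowing syncfiles from n1
Feb  4 13:28:41 evxcat xCAT[20420]: xCAT: Allowing syncfiles from n1
Feb  4 13:29:02 evxcat xCAT[20507]: xCAT: Allowing syncfiles from n1
"""

# Hypothetical pattern written for this sketch: timestamp, host, then the message.
PATTERN = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ xCAT\[\d+\]: "
    r"xCAT: Allowing syncfiles from (\S+)"
)


def syncfiles_attempts(text, year=2015):
    """Return {node: [datetime, ...]} for every 'Allowing syncfiles' line."""
    attempts = {}
    for line in text.splitlines():
        match = PATTERN.match(line)
        if not match:
            continue
        stamp, node = match.groups()
        # Assumption: the caller supplies the year, since syslog omits it.
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        attempts.setdefault(node, []).append(when)
    return attempts


def gaps_in_seconds(times):
    """Seconds between consecutive attempts."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


for node, times in syncfiles_attempts(LOG).items():
    print(node, len(times), gaps_in_seconds(times))
```

For the six lines above this reports six attempts from n1, spaced 17 to 21 seconds apart, which fits the idea of a retry loop rather than six independent requests.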


> “Error from pping” can only happen when calling xdcp with -v option. But
> I checked the code for updatenode -F and did not see xdcp was called with
> a -v option, so I am confused.

Strange, but that's exactly what happened! Here's a direct copy/paste from 
my terminal:
[root@evxcat ~]# updatenode n1 -F
Error from pping
File synchronization has completed for nodes.

[root@evxcat ~]# updatenode n1 -F
File synchronization has completed for nodes.

[root@evxcat postscripts]# pping n1
n1: ping
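
Since the error only shows up now and then, one way to catch it is to repeat the same command and keep the output of any run that contains the message. The sketch below is only an illustration: `run_updatenode` and `failing_runs` are names made up here, and the only xCAT command used is the `updatenode n1 -F` call from the transcript above.

```python
import subprocess


def run_updatenode(node):
    """Run the same command as in the transcript; needs an xCAT management node."""
    result = subprocess.run(
        ["updatenode", node, "-F"], capture_output=True, text=True
    )
    return result.stdout + result.stderr


def failing_runs(node, attempts, run=run_updatenode, marker="Error from pping"):
    """Call run(node) `attempts` times and return (attempt_number, output)
    for every run whose output contains `marker`."""
    hits = []
    for attempt in range(1, attempts + 1):
        output = run(node)
        if marker in output:
            hits.append((attempt, output))
    return hits
```

Calling `failing_runs("n1", 20)` on the management node would show which of 20 runs, if any, printed the message. The `run` argument can be swapped for a stub to try the logic without touching a real node.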

 
> BTW, the server side error is usually logged in /var/log/messages. Could
> you check the time stamps of the messages and see what errors it had
> during the deployment time.
There are no other errors shown during deployment, just this one from 
syncfiles. Everything also runs fine, including syncfiles, even when it 
shows this error. All of my files are synced, including those that are 
being appended. The node comes up and is 100% healthy.
 
 
Thanks for the help!
 
 
On 05.02.2015 08:42, Ling Gao wrote:
BTW, the server side error is usually logged in /var/log/messages.  Could 
you check the time stamps of the messages and see what errors it had 
during the deployment time. 

Thanks, 

Ling 





From:        Ling Gao/Poughkeepsie/IBM 
To:        xCAT Users Mailing list <xcat-user@lists.sourceforge.net> 
Date:        02/05/2015 09:34 AM 
Subject:        Re: [xcat-user] xCAT 2.9 Syncfiles error 


Hi Russell, 
    Thanks for the debugging. 
    Is this diskless or diskful deployment? When  you say "reboot", did 
you mean redeploy? 
    Could you grep /var/log/messages for "Allowing xdcp" and show me the 
output.    “Error from pping” can only happen when calling xdcp with -v 
option. But I checked the code for updatenode -F and did not see xdcp was 
called with a -v option, so I am confused. 

Thanks, 

Ling 






From:        Russell Jones <russell-l...@jonesmail.me> 
To:        xcat-user@lists.sourceforge.net 
Date:        02/04/2015 02:55 PM 
Subject:        Re: [xcat-user] xCAT 2.9 Syncfiles error 



Some further debugging: I ran "updatenode n1 -F" to try a manual syncfiles 
on the node that showed this error during boot. It completed, but also 
stated "Error from pping". I ran pping and updatenode -F again, and I was 
unable to reproduce that error from pping. Further reboots of n1 do not 
result in any syncfiles errors.

Really difficult to track this down. Any insight would be appreciated!


On 2/4/2015 1:30 PM, Russell Jones wrote: 
Hi Ling,

Unfortunately I am running into this issue again, seemingly at random. Can 
we revisit this? Are there areas within the syncfiles scripts where I can 
get it to tell me *what* the error is that it is encountering as opposed 
to just saying "Perform Syncing File action encountered error" ?

Thanks!

On 1/28/2015 3:34 PM, Ling Gao wrote: 
Hi Russell, 
     Sure, that is good news. I tried on a RH6.4 node and it worked 
fine. 

Thanks, 

Ling




From:        Russell Jones <russell-l...@jonesmail.me> 
To:        xcat-user@lists.sourceforge.net 
Date:        01/28/2015 04:25 PM 
Subject:        Re: [xcat-user] xCAT 2.9 Syncfiles error 



Hi Ling,

Thanks for looking at this! I was troubleshooting this issue some more 
this morning and something, unbeknownst to me, fixed the issue. I have no 
idea what changed but it seems to be resolved now....

Sure wish I knew what was breaking it! :-) 


On 1/28/2015 2:49 PM, Ling Gao wrote: 
Russell, 
     Can you comment out line 106  "return 1"  from 
/opt/xcat/lib/perl/xCAT_plugin/syncfiles.pm and try again?  Please restart 
xcatd. 

Thanks, 

Ling 





From:        Russell Jones <russell-l...@jonesmail.me> 
To:        <xcat-user@lists.sourceforge.net> 
Date:        01/27/2015 05:31 PM 
Subject:        [xcat-user] xCAT 2.9 Syncfiles error 



xCAT 2.9 
Hi all, 
I have a CentOS 6.6 image with the following contents in the synclist file 
for the osimage: 
/install/custom/v6.6.0-dl-compute-apu/sync/etc/modprobe.d/lustre.conf -> /etc/modprobe.d/lustre.conf 
APPEND: 
/install/custom/v6.6.0-dl-compute-apu/sync/etc/fstab -> /etc/fstab 
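
For anyone reading along, here is a small Python sketch of how I read that synclist: one plain sync entry, then one entry under the APPEND: clause. The parser is only an illustration and not xCAT code, and the "SYNC" label for entries that appear before any clause is a name invented for this example.

```python
# The two entries from the synclist above, one mapping per line.
SYNCLIST = """\
/install/custom/v6.6.0-dl-compute-apu/sync/etc/modprobe.d/lustre.conf -> /etc/modprobe.d/lustre.conf
APPEND:
/install/custom/v6.6.0-dl-compute-apu/sync/etc/fstab -> /etc/fstab
"""


def parse_synclist(text):
    """Return (section, source, destination) tuples.

    Entries before any clause header are labelled "SYNC" (a label made up
    for this sketch); entries after a header such as "APPEND:" take the
    header name without the colon.
    """
    section = "SYNC"
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "->" not in line:
            # A clause header switches the section for the lines that follow.
            section = line[:-1]
            continue
        source, _, destination = line.partition("->")
        entries.append((section, source.strip(), destination.strip()))
    return entries


for entry in parse_synclist(SYNCLIST):
    print(entry)
```

So there are only two files involved: lustre.conf is copied as-is and the fstab fragment is appended to /etc/fstab on the node.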
 
When my compute nodes boot up and execute the script it takes a couple of 
minutes, which seems way too long for syncing 2 text files. I get a 
"syncfiles exited with code 0" in the xcat.log file on the node, while in 
/var/log/messages on the management node I get: 
 
Jan 27 16:07:57 evxcat xCAT[27685]: xCAT: Allowing syncfiles from n0 
Jan 27 16:08:09 evxcat xCAT[27753]: xCAT: Allowing syncfiles from n0 
Jan 27 16:08:24 evxcat xCAT[27805]: xCAT: Allowing syncfiles from n0 
Jan 27 16:08:45 evxcat xCAT[27858]: xCAT: Allowing syncfiles from n0 
Jan 27 16:09:00 evxcat xCAT[27909]: xCAT: Allowing syncfiles from n0 
Jan 27 16:09:16 evxcat xCAT[27960]: xCAT: Allowing syncfiles from n0 
Jan 27 22:09:18 n0 xCAT: ./syncfiles: Perform Syncing File action encountered error 
 
Looking at the end result the files are synced fine, and my /etc/fstab is 
appended properly. 
So it seems like the Syncfiles script can't seem to make up its mind if it 
exited with an error or not (code 0 usually = good!). 
Any ideas what the issue could be?
------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is 
your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user 
