I've run into this problem before, and it was caused by two different issues:

* DNS resolution problem
* A secondary NIC on the node was getting an IP address, and taking over communication back to the xcat daemon.

Make sure that neither of those is your issue.
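
A quick way to rule out both causes from the node's console is sketched below. It is only a sketch: `cmgmt1` (the management node name used later in this thread) is a placeholder, the helper names are made up for illustration, and it assumes `getent` and the iproute2 `ip` command are available on the node.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; cmgmt1 is a placeholder MN name.

# Succeeds if the given name resolves through the node's normal lookup path
# (/etc/hosts, DNS, ... as configured in nsswitch.conf).
resolves() {
    getent hosts "$1" >/dev/null
}

# Reads `ip -4 -o addr show` output on stdin and prints "interface address"
# for every non-loopback IPv4 address, so an unexpected second NIC stands out.
list_ipv4_nics() {
    awk '$2 != "lo" { print $2, $4 }'
}

# Typical use on the node:
#   resolves cmgmt1 || echo "cmgmt1 does not resolve"
#   ip -4 -o addr show | list_ipv4_nics
#   ip route get <master node IP>   # shows which interface is used to reach the MN
```

If more than one interface shows up with an address, the `ip route get` line is the one to look at, since it tells you which NIC traffic to the master actually leaves from.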



On 5/28/2013 5:55 AM, Qamar Nazir wrote:
Hi List,

I couldn't find the solution for this issue.

I resolved it with the following steps:

- Killed the process 'xcatd: install monitor' manually.
- Restarted xcatd on the master node.

Before that, when I ran the command '/usr/bin/awk -f /xcatpost/updateflag.awk <master node IP> 3002' manually, it never returned to the prompt. After the steps above, it returned within a second.
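
The two steps above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: `run` and `restart_install_monitor` are hypothetical helper names, it assumes `pkill` is installed and that xcatd is managed through `service` (as elsewhere in this thread), and it defaults to a dry run so nothing is killed by accident.

```shell
#!/bin/sh
# Sketch only. DRY_RUN defaults to 1 so nothing is killed or restarted
# unless the caller opts in with DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Kill the (possibly stuck) install monitor, then restart xcatd.
restart_install_monitor() {
    # pkill -f matches against the full process title; it exits non-zero
    # when nothing matched, which is not treated as fatal here.
    run pkill -f 'xcatd: install monitor' || true
    run service xcatd restart
}

# Usage on the master node:
#   restart_install_monitor             # print what would be done
#   DRY_RUN=0 restart_install_monitor   # actually do it
```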


Best Regards,

Qamar Nazir

HPC Software Engineer


 

On 12/09/2011 03:27 AM, Jing CDL Sun wrote:
OK, it seems this is not a name resolution issue. In that case, you may need to follow Xiaopeng's suggestion for more debugging.

Another straightforward debugging method is to run xcatd in the foreground, where it prints messages about the communication between the management node (MN) and the compute node (CN). I have used it for debugging before, for example:

service xcatd stop
/opt/xcat/sbin/xcatd -f






Best Regards,
-----------------------------
Sun Jing
IBM China Software Development Laboratory
Tel: (86-10) 82453625   E-mail: [email protected]
Address: Building 28, ZhongGuanCun Software Park,
         No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC




From: Dave Barry <[email protected]>
Date: 2011-12-09 10:49
Reply-To: xCAT Users Mailing list <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Subject: Re: [xcat-user] Installing node hanging at updateflag.awk

Hi Jing,

Yes, the node can resolve cmgmt1, and cmgmt1 can resolve the node. I checked resolv.conf from the node's console while it was hung, and I was also able to ping cmgmt1.

That's why this is so strange (and has me pulling my hair out!). If I set the node to install a different profile it installs and runs the postscripts fine, no hang, and reboots without an issue. It's just this specific workstation profile that it is having trouble with, but I don't see anything in the postscripts of this profile that should cause this behavior.



2011/12/8 Jing CDL Sun <[email protected]>
Hi Dave,

Another thing I can think of: have you checked whether cmgmt1 can be resolved on your compute node? Basically, you need to set site.nameservers=<mn's ip> and site.domain=<your domain name>. After you run makedhcp, the nameserver and domain values are written into your DHCP server configuration. Once the compute node is installed, the DHCP server hands those values to the compute node, whose /etc/resolv.conf then lists the MN as its name server and your domain as the search path.
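
As a rough sketch of the above: the first helper sets the two site attributes and regenerates the DHCP configuration on the MN, and the second checks the resulting resolv.conf on the compute node. The function names and argument values are hypothetical, and the `clustersite` object name and the `-n` flag to makedhcp are assumptions about a typical xCAT setup rather than something stated in this thread.

```shell
#!/bin/sh
# Sketch only: function names and the example values are hypothetical.

# Run on the management node. chdef and makedhcp are xCAT commands;
# the arguments are placeholders for your MN address and domain.
# Assumption: the site object is named "clustersite" (the usual default).
configure_site_dns() {
    mn_ip=$1
    domain=$2
    chdef -t site clustersite nameservers="$mn_ip" domain="$domain"
    makedhcp -n    # assumption: -n regenerates the DHCP server configuration
}

# Run on the compute node (or against a copy of its resolv.conf).
# Succeeds only if the file names the MN as a name server and lists
# the domain on a search/domain line.
check_resolv_conf() {
    file=$1
    mn_ip=$2
    domain=$3
    awk -v ip="$mn_ip" -v dom="$domain" '
        $1 == "nameserver" && $2 == ip { ns = 1 }
        $1 == "search" || $1 == "domain" {
            for (i = 2; i <= NF; i++) if ($i == dom) sd = 1
        }
        END { exit !(ns && sd) }
    ' "$file"
}

# Usage on the compute node:
#   check_resolv_conf /etc/resolv.conf <mn's ip> <your domain name> \
#       || echo "resolv.conf does not point at the MN"
```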





Best Regards,
-----------------------------
Sun Jing
IBM China Software Development Laboratory
Tel: (86-10) 82453625   E-mail: [email protected]
Address: Building 28, ZhongGuanCun Software Park,
        No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC




From: Dave Barry <[email protected]>
Date: 2011-12-09 10:31
Reply-To: xCAT Users Mailing list <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Subject: Re: [xcat-user] Installing node hanging at updateflag.awk

Hi Xiao,

Yes, this is a diskful installation. I will look at the syslog tomorrow when I am back in the office, in case it shows something helpful that I missed.

Given that the node installs fine with one profile but not with another, what else besides DNS and iptables could prevent the management node from receiving (or replying to) the flag that my node is apparently failing to send?



2011/12/8 Xiao Peng Wang <[email protected]>
updateflag.awk is used to send a request to xcatd to indicate that installation/netboot has been finished.

'updateflag.awk MN 3002' is called for a diskful installation, and 'updateflag.awk $MASTER 3002 "installstatus booted"' is for a diskless boot. So your case was a diskful installation, right?
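
Since updateflag.awk is given the MN and port 3002, one cheap check from the hung node is whether a TCP connection to that port can be opened at all. The sketch below assumes the install monitor listens on TCP port 3002 (as the command line suggests); `probe_port` is a hypothetical helper, not part of xCAT, and it needs bash and the coreutils `timeout` command.

```shell
#!/bin/sh
# Sketch only: probe_port is a hypothetical helper, not part of xCAT.
# It relies on bash's /dev/tcp pseudo-device and the coreutils `timeout`
# command, so both must be present on the node.

# Succeeds if a TCP connection to host:port can be opened within the
# given number of seconds (default 5).
probe_port() {
    host=$1
    port=$2
    secs=${3:-5}
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage on the hung node (cmgmt1 is the MN name from this thread):
#   probe_port cmgmt1 3002 && echo "port reachable" || echo "cannot connect"
```

A successful probe only shows that the port is reachable; it does not prove that the install monitor is answering requests.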

For debugging, check whether the process 'xcatd: install monitor' has been started on the MN; it handles the requests from updateflag.awk.
You can also look for hints in syslog:
1. Was the 'nodeset next' command called?
2. Search for messages from the node with the tag 'xcat'.
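
The process check and the syslog search can be sketched as two small helpers. The helper names are hypothetical, the default log path /var/log/messages is an assumption about the MN's syslog setup, and the search matches the plain string 'xcat' on lines mentioning the node, which is looser than matching the syslog tag exactly.

```shell
#!/bin/sh
# Sketch only: helper names and the default log path are assumptions.

# Succeeds if a process titled 'xcatd: install monitor' exists on the MN.
install_monitor_running() {
    pgrep -f 'xcatd: install monitor' >/dev/null
}

# Prints syslog lines that mention both the node name and the string 'xcat'.
# The log file defaults to /var/log/messages; pass another path if your
# syslog writes elsewhere.
xcat_messages_from() {
    node=$1
    log=${2:-/var/log/messages}
    grep -F -- "$node" "$log" | grep -F 'xcat'
}

# Usage on the MN:
#   install_monitor_running || echo "install monitor is not running"
#   xcat_messages_from <node name>
```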

You could also debug do_installm_service in xcatd. See the code that handles 'ready', 'next' ...


Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: [email protected]
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193


From: Dave Barry <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 2011-12-09 07:35
Subject: [xcat-user] Installing node hanging at updateflag.awk





Hi *,

I can't seem to figure this issue out. I have a node that runs its postscripts properly (as far as I can tell) but then hangs at updateflag.awk. The specific line it seems to hang at, as shown in ps xf, is:

/bin/awk -f updateflag.awk cmgmt1 3002


That's all that is in the process line; there is no actual command after the 3002. Even more puzzling, the following line is missing from the very end of /tmp/mypostscript.post, while it is present on other nodes that installed properly:

updateflag.awk $MASTER 3002 "installstatus booted"


I can resolve both the node and its master forwards and backwards. This node also installs just fine when I give it a different profile, so something in the OS it is installing (CentOS 5.4) or in one of this profile's postscripts must be causing the issue. I don't know how to continue troubleshooting, though, since the issue does not appear to be DNS related, and problems like this are usually caused by DNS.

What would cause mypostscript.post to not have the installstatus line at the bottom? Does this line get written to that file after a certain "something" happens? Any thoughts on logs or something I can look at that would cause this behavior?


Thanks!
_______________________________________________
xCAT-user mailing list

[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user

