Could you try running updatenodestat on the service node that is very slow to respond to updateflag.awk?
Steps:

    cd /opt/xcat/bin
    ln -s ../bin/xcatclient updatenodestat
    time updatenodestat <node> booted

This shows how long the update takes to finish on the service node.

Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: w...@cn.ibm.com
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193

From: Russell Jones <russell-l...@jonesmail.me>
To: xcat-user@lists.sourceforge.net
Date: 2014/03/13 22:40
Subject: Re: [xcat-user] Additional performance issues

xCAT service node specs:

    CPU - 2 x Dual-Core AMD Opteron(tm) Processor 2212
    Memory - 6 gigs (2 x 1gig and 2 x 512mb per CPU)
    Diskful - Shared /install, shared /tftpboot

Useflowcontrol is disabled.
The xCAT master was 2.8.2, then upgraded to 2.8.3.
The xCAT DB contains 7851 nodes in the nodelist table.

I also did a few timing tests, running updateflag.awk from a compute node against each service node and the master. One service node takes about 3 times longer than the other 2 service nodes. Using the xCAT master (cmx04-hc) is more than 3 times faster than using any service node.

    [root@c103n69 xcatpost]# time ./updateflag.awk servicefarm03-hc 3002 "installstatus booted"
    real 0m10.175s
    user 0m0.000s
    sys 0m0.002s

    [root@c103n69 xcatpost]# time ./updateflag.awk servicefarm02-hc 3002 "installstatus booted"
    real 0m3.679s
    user 0m0.001s
    sys 0m0.001s

    [root@c103n69 xcatpost]# time ./updateflag.awk servicefarm01-hc 3002 "installstatus booted"
    real 0m3.653s
    user 0m0.002s
    sys 0m0.000s

    [root@c103n69 xcatpost]# time ./updateflag.awk master 3002 "installstatus booted"
    real 0m0.491s
    user 0m0.000s
    sys 0m0.002s

All of the tests were done while the service and master nodes were idle. For all the service nodes, the CPU usage of the Install Monitor process jumped to 99% for the entire duration. This did not seem to happen on the master node (but then again, it finished in under a second).

There are also instances where the Install Monitor process totally stops responding on a service node. This causes the updateflag.awk command to hang indefinitely on the compute nodes. Issuing a "service xcatd reload" appears to restart the Install Monitor, and it starts to respond again. However, at that point the hung updateflag.awk processes on the compute nodes had to be killed.

I have looked into enabling the useflowcontrol option, since the docs say it is enabled by default for new 2.8.3 installs. (Ours was upgraded from 2.8.2, so it was not enabled.) No errors are present in any logs that point to an issue.

Once useflowcontrol is enabled in the site table, will xcatd have to be restarted/reloaded on the master and service nodes?

On 3/11/2014 7:32 AM, Lissa Valletta wrote:

This is happening with only 50 nodes? How much memory and how many CPUs do you have on the service nodes? Are the service nodes diskful installs? What is the setting for the site attribute useflowcontrol, if it is set at all? We have many systems installing a lot more nodes than 50 without this problem. What are the OS and xCAT levels on the service node and Management Node? Can you monitor /var/log/messages on the Management Node during this to see whether you are seeing errors from xcatd on the service node?

Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Russell Jones <russell-l...@jonesmail.me>
To: xcat-user@lists.sourceforge.net
Date: 03/10/2014 05:25 PM
Subject: Re: [xcat-user] Additional performance issues

Hi Wang,

As a followup to this, are there any additional performance tips or things we can do to help with this performance issue? I don't think the problem we are seeing is caused by the "getpostscripts" request, as it doesn't seem like getpostscript.awk connects to port 3002, the port that the "Install Monitor" uses.

As a reminder, the problem we are seeing is the "Install Monitor" process on the service node being pegged at 100% CPU, and nodes hanging before showing their login console, *after* the postscripts have already finished. When nodes finally start showing their login console, the "Install Monitor" process starts lowering its CPU usage.

Disabling the site.nodestatus feature seems to have resolved the performance issue. However, we would like to use this feature if we can, as long as it doesn't cause such a huge drop in service node performance.

Thanks!

On 3/2/2014 8:59 PM, Xiao Peng Wang wrote:

I cannot see any impact from setting 'site.nodestatus=n' except that the node status won't be updated, so you can try it to solve your issue. But I think your issue may be caused by 'getpostscripts' running when the diskless nodes boot. You can try setting site.precreatemypostscripts=1 to improve the performance of the getmypostscript operation. I suggest trying them one at a time and then both together to see the result.
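The two site settings suggested above could be tried along these lines. This is only a sketch under stated assumptions: the `run` helper is hypothetical and prints each command unless `RUN=1` is set, and `clustersite` is assumed to be the name of the site object on the management node. The attribute names and values come from the thread itself.

```shell
#!/bin/sh
# Sketch only: try the site-table settings suggested above, one at a time.
# Dry-run by default (commands are printed); set RUN=1 to execute them.
# "clustersite" as the site object name is an assumption.

# Hypothetical helper: print the command, or run it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Record the current values so they can be restored afterwards.
run lsdef -t site -o clustersite -i nodestatus,precreatemypostscripts,useflowcontrol

# Option 1: the setting suggested to speed up the getmypostscript operation.
run chdef -t site -o clustersite precreatemypostscripts=1

# Option 2 (try on its own first, then together with option 1):
# stops node status updates, so the status column is no longer maintained.
# run chdef -t site -o clustersite nodestatus=n
```

Running it without `RUN=1` only shows what would be executed, which makes it easy to review the changes before touching the site table.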
Thanks
Best Regards
Wang Xiaopeng (王晓朋)

From: Russell Jones <russell-l...@jonesmail.me>
To: xcat-user@lists.sourceforge.net
Date: 2014/03/02 13:18
Subject: [xcat-user] Additional performance issues

Hi all,

We are seeing pretty consistently that when 50+ diskless nodes are booted against the same single service node, they all hang for around 5-10 minutes after the postscripts run and before showing their login console, while the service node is chugging away at a constant 100% of a single core. The process at fault on the service node is the "Install Monitor".

This seems to be tied to the site.nodestatus option. Other than disabling this option (and fixing the diskful reinstall loop bug with it that I reported earlier), is there a way of improving the responsiveness of a service node when it is updating a node's status? Would we lose anything by disabling this option besides the "status" column not being updated?

Thanks!
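To compare how quickly each server answers a status update, the timing runs quoted earlier in this thread could be scripted roughly as follows. This is a sketch, not part of xCAT: `time_servers`, `UPDATEFLAG` and `PORT` are hypothetical names, and the only pieces taken from the thread are the `updateflag.awk <server> 3002 "installstatus booted"` invocation and the xcatpost working directory. It assumes GNU `date` for sub-second timestamps.

```shell
#!/bin/sh
# Sketch only: time one "installstatus booted" update against each server.
# Intended to be run from the xcatpost directory on a compute node.
# UPDATEFLAG and PORT are hypothetical variables; defaults follow the thread.

UPDATEFLAG=${UPDATEFLAG:-./updateflag.awk}
PORT=${PORT:-3002}

# Print "<server> <elapsed seconds>" for every server name given.
time_servers() {
    for server in "$@"; do
        start=$(date +%s.%N)    # GNU date: %N is nanoseconds
        "$UPDATEFLAG" "$server" "$PORT" "installstatus booted" >/dev/null 2>&1
        end=$(date +%s.%N)
        # awk does the floating-point subtraction that plain sh lacks.
        awk -v name="$server" -v s="$start" -v e="$end" \
            'BEGIN { printf "%s %.3f\n", name, e - s }'
    done
}

# Run only when server names are passed on the command line.
if [ "$#" -gt 0 ]; then
    time_servers "$@"
fi
```

Saved under a name of your choosing, it would be called with the servers to compare, for example `servicefarm01-hc servicefarm02-hc servicefarm03-hc master`. Because updateflag.awk has been seen to hang when the Install Monitor stops responding, wrapping the call in the coreutils `timeout` command may be worthwhile.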
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user