xCAT service node specs:

CPU - 2 x Dual-Core AMD Opteron(tm) Processor 2212
Memory - 6 GB (2 x 1 GB and 2 x 512 MB per CPU)
Diskful - Shared /install, shared /tftpboot
useflowcontrol is disabled. The xCAT master was 2.8.2, then upgraded to 2.8.3.
The xCAT DB contains 7851 nodes in the nodelist table.
 

I also ran a few timing tests, running updateflag.awk from a compute node against each service node and the master.

One service node takes about 3 times longer than the other two service nodes.

Using the xCAT master (cmx04-hc) is faster than using any service node: about 7 times faster than the two quicker service nodes in the run below.

[root@c103n69 xcatpost]# time ./updateflag.awk servicefarm03-hc 3002 "installstatus booted"

real    0m10.175s
user    0m0.000s
sys     0m0.002s

[root@c103n69 xcatpost]# time ./updateflag.awk servicefarm02-hc 3002 "installstatus booted"
 
real    0m3.679s
user    0m0.001s
sys     0m0.001s

[root@c103n69 xcatpost]# time ./updateflag.awk servicefarm01-hc 3002 "installstatus booted"
 
real    0m3.653s
user    0m0.002s
sys     0m0.000s

[root@c103n69 xcatpost]# time ./updateflag.awk master 3002 "installstatus booted"
 
real    0m0.491s
user    0m0.000s
sys     0m0.002s
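
For reference, the ratios implied by this single run can be worked out with a short awk sketch. The times are copied from the "real" lines above; nothing else is measured here, the function name `ratios` is only for illustration, and one run per target is not a benchmark.

```shell
#!/bin/sh
# Ratio of each target's "real" time to the master's, from the run above.
ratios() {
    awk 'BEGIN {
        master = 0.491
        n = split("servicefarm03-hc servicefarm02-hc servicefarm01-hc master", host, " ")
        split("10.175 3.679 3.653 0.491", secs, " ")
        for (i = 1; i <= n; i++)
            printf "%-17s %7.3fs  %5.1fx\n", host[i], secs[i], secs[i] / master
    }'
}

ratios
```

This gives roughly 20.7x for servicefarm03-hc and about 7.5x and 7.4x for the other two service nodes, relative to the master.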

 

All of the tests were done while the service and master nodes were idle. On every service node, CPU usage of the Install Monitor process jumped to 99% for the entire duration of the test.

This did not seem to happen on the master node (but then again, it finished in under a second).

There are also instances where the Install Monitor process stops responding entirely on a service node. This causes the updateflag.awk command to hang indefinitely on the compute nodes.

Issuing "service xcatd reload" appears to restart the Install Monitor, and it starts responding again. However, at that point the hung updateflag.awk processes still had to be killed on the compute nodes.
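
One way to keep a compute node from hanging while this is investigated is to bound each call with a time limit. The following is a minimal sketch, not part of xCAT: it assumes GNU coreutils `timeout` and GNU `date` are available on the node, and `probe`, `UPDATE_CMD` and `LIMIT` are names made up for illustration. The port and message are the ones used in the tests above.

```shell
#!/bin/bash
# Sketch only: time one status update per target and give up after LIMIT
# seconds, so a stalled Install Monitor cannot block the caller forever.
# UPDATE_CMD and LIMIT are illustrative settings, not xCAT options.
UPDATE_CMD=${UPDATE_CMD:-./updateflag.awk}
LIMIT=${LIMIT:-60}

probe() {
    local host=$1 start end rc
    start=$(date +%s.%N)          # GNU date: seconds with nanoseconds
    timeout "$LIMIT" "$UPDATE_CMD" "$host" 3002 "installstatus booted" >/dev/null 2>&1
    rc=$?
    end=$(date +%s.%N)
    if [ "$rc" -eq 124 ]; then    # 124 is timeout's "time limit reached" status
        echo "$host: timed out after ${LIMIT}s"
    else
        awk -v h="$host" -v s="$start" -v e="$end" -v rc="$rc" \
            'BEGIN { printf "%s: %.3fs (exit %d)\n", h, e - s, rc }'
    fi
    return "$rc"
}

# Probe every host named on the command line.
for host in "$@"; do
    probe "$host"
done
```

Saved under any name (say `probe.sh`), it could be run as `./probe.sh servicefarm01-hc servicefarm02-hc servicefarm03-hc master`. A stalled target is then reported instead of blocking the node, though this does nothing about the underlying Install Monitor stall.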
 
I have looked into enabling the useflowcontrol option, since the docs say it is enabled by default for new 2.8.3 installs. (Ours was upgraded from 2.8.2, so it was not enabled.)

None of the logs show any errors pointing to an issue.

Once useflowcontrol is enabled in the site table, will xcatd have to be restarted/reloaded on the master and service nodes?



On 3/11/2014 7:32 AM, Lissa Valletta wrote:

Is this happening with only 50 nodes? How much memory and how many CPUs do you have on the service nodes? Are the service nodes diskful installs? What is the setting of the site attribute useflowcontrol, if it is set at all? We have many systems installing a lot more than 50 nodes without this problem.
What are the OS and xCAT levels on the service node and the management node?
Can you monitor /var/log/messages on the management node during this to see whether you are getting errors from xcatd on the service node?


Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102




From: Russell Jones <russell-l...@jonesmail.me>
To: xcat-user@lists.sourceforge.net,
Date: 03/10/2014 05:25 PM
Subject: Re: [xcat-user] Additional performance issues





Hi Wang,

As a followup to this, are there any additional performance tips or things we can do to help with this performance issue? I don't think the problem we are seeing is caused by the "getpostscripts" request, as getpostscript.awk doesn't seem to connect to port 3002, the port that the "Install Monitor" uses. As a reminder, the problem we are seeing is the "Install Monitor" process on the service node being pegged at 100% CPU, and nodes hanging before showing their login console, *after* the postscripts have already finished. When nodes finally start showing their login console, the "Install Monitor" process starts lowering its CPU usage.

Disabling the site.nodestatus feature seems to have resolved the performance issue. However, we would like to use this feature if we can, as long as it doesn't cause such a huge drop in service node performance.

Thanks!


On 3/2/2014 8:59 PM, Xiao Peng Wang wrote:

    I cannot see any impact from setting 'site.nodestatus=n' other than the node status no longer being updated, so you can try it to solve your issue.

    However, I think your issue may be caused by 'getpostscripts' running when the diskless nodes boot. You can try setting site.precreatemypostscripts=1 to improve the performance of the getmypostscript operation.


    I suggest you try them one at a time, and then both together, to see the result.



    Thanks
    Best Regards
    ----------------------------------------------------------------------
    Wang Xiaopeng (王晓朋)
    IBM China System Technology Laboratory
    Tel: 86-10-82453455
    Email:
    w...@cn.ibm.com
    Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193



    From: Russell Jones <russell-l...@jonesmail.me>
    To: xcat-user@lists.sourceforge.net,
    Date: 2014/03/02 13:18
    Subject: [xcat-user] Additional performance issues



    Hi all,

    We are seeing pretty consistently that when 50+ diskless nodes are
    booted against the same single service node, before showing their login
    console they all hang for around 5-10 minutes after postscripts run
    while the service node is chugging away at using a constant 100% of a
    single core. The process on the service node that is at fault is the
    "install monitor".

    This seems to be tied to the site.nodestatus option. Other than
    disabling this option (and fixing the diskful reinstall loop bug with
    it that I reported earlier), is there a way of improving the
    responsiveness of a service node when it's updating the node's status?
    Would we lose anything from disabling this option besides the "status"
    column not being updated?


    Thanks!

    _______________________________________________
    xCAT-user mailing list

    xCAT-user@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/xcat-user



