Chad, 
 
A small update on the following issue:
 
 - When we ran 40 instances of your stress application in a 2-controller
setup, we observed immediate component healthcheck failures
(resulting in a node reboot).  The components that failed the (intranode)
healthcheck were those that had a 1-second timeout.
 
 - When we increased the minimum (intranode) healthcheck timeout of
components from 1 second to 5 seconds, we were able to go up to 80
instances.
 
However, we haven't yet found time to root-cause the healthcheck
failure.
 
Phani

________________________________

From: Chad Tindel [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 24, 2007 6:10 PM
To: Marella P-G19460
Cc: [email protected]
Subject: Re: [Users] Instable cluster with CPU load


I think if you just ran 2 controllers and ran this on one of the
controllers, you should see it.

Thanks,

Chad


On Dec 24, 2007 12:47 AM, Marella P-G19460 <[EMAIL PROTECTED]> wrote:


        Chad, 
         
        What should the topology be: just a controller node, or more?
Also, where should I run this, on a controller or on a payload node?
         
        Thanks,
        Phani

________________________________

        
        From: Chad Tindel [mailto:[EMAIL PROTECTED] 
        Sent: Monday, December 24, 2007 12:34 PM
        To: Marella P-G19460
        Cc: [email protected]
        Subject: Re: [Users] Instable cluster with CPU load
        


                I'll see if I can get the report from the HP team sent
to the users list.
                
                [PHANI] OK, let's check that out. None of my earlier
excuses seem to fly ;-)
                


        Hi Phani-
        
        Here's the small stress program that the HP team said caused
OpenSAF to reboot a node.  They said it doesn't cause a reboot every
single time, so you can try running it multiple times if you don't see
it the first time.  You can run it in a few ways, for example:
        
        
        ./stress_cpu -c 10 -t 600s    or 
        ./stress_cpu -c 12 -t 600s  

        The -c option says how many of them to run, so make sure you run
many more of them than you have CPUs in the system.
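
As a rough illustration of choosing the -c value, the sketch below builds
a command line with more workers than CPUs. It assumes Python is available
on the node. The helper name stress_command and the 2x oversubscription
factor are hypothetical choices for illustration, not something taken from
the HP report; only the -c and -t options shown above are used.

```python
import os


def stress_command(oversubscribe=2, duration="600s", binary="./stress_cpu"):
    """Build a stress_cpu command line with more workers than CPUs.

    oversubscribe is a hypothetical multiplier: workers = CPUs * oversubscribe.
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    workers = cpus * oversubscribe
    return [binary, "-c", str(workers), "-t", duration]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(" ".join(stress_command()))
```

The command is only printed, so it can be checked and then pasted into a
shell on the controller where the test should run.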
        
        Can you see if the problem reproduces on your systems? 
        
        Thanks,
        
        Chad
        



