Last night one of our nodes went AWOL and got stuck at a permanent 50% CPU
wait time. Its load climbed steadily to 400 before we noticed and had to
hard reboot it. Apart from that, all other Ganglia metrics flat-lined. Is
this some sort of bizarre kernel problem? We are using XFS with standard
settings. I have seen a few postings describing bizarre problems like this.
Can XFS be blamed, or is it more kernel-related? Is there a posting somewhere
suggesting the best file system settings? Are there recommended settings for
using CentOS 5.5? We have a 10-node cluster that we have been pounding for
weeks, and we can't seem to keep all ten nodes up for a 24-hour period. I am
hoping there is a lower-level problem causing much of it.

Thanks.
