Which JDK are you using?  We've had similar problems with jdk1.6u22 on
Ubuntu 10.04 in Amazon EC2.  Nodes would lock up for 20-40+ minutes.

We haven't done any conclusive tests yet, but we haven't seen the same
problems since downgrading to jdk1.6u16.

 -brent

On Mon, Jan 10, 2011 at 12:59 PM, Wayne <wav...@gmail.com> wrote:
> Last night one of our nodes went AWOL and got stuck at a permanent 50% CPU
> wait time. Its load also climbed steadily to 400 before we noticed, and we
> had to hard reboot it. Apart from that, all the other Ganglia metrics
> flat-lined. Is this some sort of bizarre kernel problem? We are using XFS
> with standard settings. I have seen a few postings describing bizarre
> problems like this. Can XFS be blamed, or is it more kernel-related? Is
> there a posting somewhere suggesting the best file system settings? Are
> there recommended settings for CentOS 5.5? We have a 10-node cluster that
> we have been pounding for weeks, and we can't seem to keep all ten nodes up
> for a 24-hour period. I am hoping a lower-level problem is causing much of it.
>
> Thanks.
>
