Peter - my apologies for the slow response - we had
 to divert to a 'Plan B' approach last week involving
 MySQL, memcache, redis and various other uglies.

On 20 September 2010 23:11, Peter Schuller <peter.schul...@infidyne.com> wrote:
> Are you running an old JVM by any chance? (Just grasping for straws.)

 JVM is Sun's 1.6 - I've been caught out once before by
 OpenJDK's performance problems, so I'm particularly
 careful about this now.

> Hmm. I can see useless spinning decreasing efficiency, but the numbers
> from your log are really extreme. Do you have a URL / bug id or
> anything that one can read up on about this?

 We've rebuilt the Cassandra cluster this week, avoiding
 Hadoop entirely - partly to reduce the variables in play,
 and partly because it looks like we'll only need two 'feeder'
 nodes for our jobs with the size of Cassandra cluster that
 we're likely going to end up with (10-12 ish).  Any higher
 ratio of feeders to Cassandra nodes seems, on EC2 at least,
 to cause too many failures on the Cassandra side.

 Actually, there's a question - do you think it's 'acceptable'
 for GC to take out a small number of your nodes at a time,
 so long as the bulk of the nodes are okay (or at least so
 long as RF is greater than the number of nodes lost to a
 stop-the-world GC)?  I suspect this is a question peculiar
 to Amazon EC2, as I've never seen a box rendered
 non-communicative by a single core flat-lining.
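
 To make the RF-versus-paused-nodes condition in the question
 above concrete, here is a rough sketch.  The function names are
 made up for illustration and are not part of any Cassandra API.
 It assumes the worst case where every paused node is a replica
 for the key being requested, and uses the usual quorum size of
 floor(RF / 2) + 1.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def request_can_succeed(rf: int, replicas_paused: int, required: int) -> bool:
    """True if enough replicas stay responsive to give `required` acks.

    Hypothetical helper: assumes every paused node holds a replica
    of the key in question (the worst case).
    """
    return rf - replicas_paused >= required


# RF=3 with one replica stuck in a stop-the-world pause:
# two replicas remain, and QUORUM needs two.
print(request_can_succeed(3, 1, quorum(3)))

# RF=3 with two replicas paused: QUORUM can no longer be met,
# but a request needing a single ack (consistency level ONE) can.
print(request_can_succeed(3, 2, quorum(3)))
print(request_can_succeed(3, 2, 1))
```

 By this arithmetic, losing a node or two to GC at a time is
 survivable only while the pauses do not overlap on more replicas
 of the same key than the chosen consistency level can spare.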

 By the end of this week we hope to have a better idea (mind,
 I've thought that for the past 5 weeks of experimenting).  If I'm
 back to square one at that point I'll start pastebining some logs
 and configs.  Increasingly, I'm convinced that many of these
 problems would be solved if we hosted our own servers.

 cheers,
 Jedd.
