Re: TimeOutExceptions and Cluster Performance

2010-02-12 Thread Joel Meyer
It might be an EC2 issue. We've been battling them all day and we noticed Cassandra performance started to tank sometime earlier this week. I'm not convinced the problem is with Cassandra, as we were seeing sub-100ms response times prior to this and we haven't made any changes. We're also running o

Re: TimeOutExceptions and Cluster Performance

2010-02-12 Thread Jonathan Ellis
There are a lot more details that would be useful, but if you are on the verge of OOMing and something is actually running out, then that's probably the culprit; when the JVM gets low on RAM it will consume all your CPU trying to GC enough to continue. (You mentioned seeing high CPU on one core which t

Re: OOM on restart

2010-02-12 Thread Jonathan Ellis
On Fri, Feb 12, 2010 at 9:26 PM, Jonathan Ellis wrote: >> And what is the reasoning behind lowering object count?  Isn't it >> whichever hits first which causes flushing?  Or is there other >> memory used when object count is higher? > > http://wiki.apache.org/cassandra/MemtableThresholds These a

Re: OOM on restart

2010-02-12 Thread Jonathan Ellis
On Fri, Feb 12, 2010 at 6:08 PM, Anthony Molinaro wrote: > Also, is it okay to set MemtableSizeInMB in lower and restart? Yes. > And what is the reasoning behind lowering object count?  Isn't it > whichever hits first which causes flushing?  Or is there other > memory used when object count is h

Re: OOM on restart

2010-02-12 Thread Jonathan Ellis
right, and remembering that MemtableSizeInMB is just the size of the values, you can estimate around 50% overhead on top of that. On Fri, Feb 12, 2010 at 6:04 PM, Anthony Molinaro wrote: > Okay, is it better written as > > NumMemtables = 1 + 2 * AvailableProcessors + NumDataFileDirectory > > Thu
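The estimate discussed in this thread can be worked through in a few lines of Java. This is a minimal sketch, not Cassandra code: the class and method names are hypothetical, the memtable count follows the formula quoted above (1 + 2 * AvailableProcessors + one per data file directory), and the 50% overhead is the rough figure given in this reply.

```java
// Hypothetical helper for the back-of-the-envelope estimate from this thread.
// None of these names exist in Cassandra; they are for illustration only.
final class MemtableEstimate {
    private MemtableEstimate() {}

    // Memtables that may be waiting for flush before writes block:
    // 1 + 2 * processors + one per data file directory (formula from the thread).
    static int maxPendingMemtables(int availableProcessors, int dataFileDirectories) {
        return 1 + 2 * availableProcessors + dataFileDirectories;
    }

    // Raw upper bound in MB, counting only the configured MemtableSizeInMB.
    static long rawMemtableMb(int memtableSizeInMb, int availableProcessors, int dataFileDirectories) {
        return (long) memtableSizeInMb * maxPendingMemtables(availableProcessors, dataFileDirectories);
    }

    // Adds a fractional overhead on top of the raw figure (0.5 = the ~50% suggested above).
    static long withOverheadMb(long rawMb, double overheadFraction) {
        return Math.round(rawMb * (1.0 + overheadFraction));
    }

    public static void main(String[] args) {
        // Values from the thread: 2 cores, 3 data file directories, 512 MB memtables.
        long raw = rawMemtableMb(512, 2, 3);
        System.out.println("pending memtables: " + maxPendingMemtables(2, 3));
        System.out.println("raw MB: " + raw);
        System.out.println("with 50% overhead MB: " + withOverheadMb(raw, 0.5));
    }
}
```

With the numbers from the thread (2 cores, 3 data file directories, MemtableSizeInMB = 512) this gives 8 memtables, 4096 MB raw, and roughly 6144 MB once the 50% overhead is included, which suggests why lowering the memtable thresholds helps on a small heap.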

TimeOutExceptions and Cluster Performance

2010-02-12 Thread Stephen Hamer
Hi, I'm running a 5 node Cassandra cluster and am having a very tough time getting reasonable performance from it. Many of the requests are failing with TimeOutException. This is making it difficult to use Cassandra in a production setting. The cluster was running fine for a week or two (it was cr

Re: OOM on restart

2010-02-12 Thread Anthony Molinaro
Also, is it okay to set MemtableSizeInMB lower and restart? And what is the reasoning behind lowering object count? Isn't it whichever hits first which causes flushing? Or is there other memory used when object count is higher? One other thing, is there other hidden memory (in cache, index, t

Re: OOM on restart

2010-02-12 Thread Anthony Molinaro
Okay, is it better written as NumMemtables = 1 + 2 * AvailableProcessors + NumDataFileDirectory Thus estimated maximum memory is MemtableMemoryUsage = MemtableSizeInMB * NumMemtables So in my case where I have 2 core machines with 3 datafile directories MemtableMemoryUsage = 512 * (1 + 2 * 2 +

Re: OOM on restart

2010-02-12 Thread Jonathan Ellis
0.5 allows 1 + 2 * Runtime.getRuntime().availableProcessors() Memtables + 1 per DataFileLocation to be waiting for flush before it will block writes (or log replay) to give those time to flush out. So, it sounds like you just need to lower your Memtable max size/object count. On Fri, Feb 12, 2010

Re: OOM on restart

2010-02-12 Thread Anthony Molinaro
0.5.0 final. I was able to get things going again by upping the memory then lowering it after a successful restart, but I would like to know how to minimize the chances of OOM via tuning. -Anthony On Thu, Feb 11, 2010 at 01:29:16PM -0600, Jonathan Ellis wrote: > What version are you on these day

Re: Bootstrap hung

2010-02-12 Thread ruslan usifov
I also have a problem with StreamInitiateVerbHandler. The problem is in PendingFile.getTargetFile, namely the difference in slashes on Windows and Unix, so I changed PendingFile.java like this: public PendingFile(String targetFile, long expectedBytes, String table) { targetFile_ = targetFile.rep
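The patch above is cut off in the archive, so the following is not the poster's change. It is only a minimal sketch of one way to normalize path separators when a file name crosses between Windows and Unix nodes; the class and method names, and the example path in the usage note, are hypothetical.

```java
import java.io.File;

// Hypothetical helper; not part of Cassandra and not the truncated patch above.
final class PathSeparators {
    private PathSeparators() {}

    // Converts Windows-style backslashes to forward slashes.
    static String toUnixSeparators(String path) {
        return path.replace('\\', '/');
    }

    // Converts either separator style to the one used by the local platform.
    static String toLocalSeparators(String path) {
        return toUnixSeparators(path).replace('/', File.separatorChar);
    }
}
```

For example, `PathSeparators.toUnixSeparators("data\\Keyspace1\\Standard1-1-Data.db")` yields `data/Keyspace1/Standard1-1-Data.db`. Normalizing on the receiving side, as in `toLocalSeparators`, avoids assuming which platform the sender runs on.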

Re: Bootstrap hung

2010-02-12 Thread Jonathan Ellis
Care to include a stack trace? Those are useful when reporting problems. On Fri, Feb 12, 2010 at 2:31 PM, ruslan usifov wrote: > Yes > > >

Re: Bootstrap hung

2010-02-12 Thread ruslan usifov
Yes

Re: Bootstrap hung

2010-02-12 Thread Jonathan Ellis
Does 64MB log an exception? On Fri, Feb 12, 2010 at 10:45 AM, ruslan usifov wrote: > My first node is ander win32, and there is problem in FileStreamTask.java > here: > > public static final int CHUNK_SIZE = 64*1024*1024; > > > On windows this buffer is very big so stream method of class FileStre

Re: Bootstrap hung

2010-02-12 Thread ruslan usifov
My first node is under win32, and there is a problem in FileStreamTask.java here: public static final int CHUNK_SIZE = 64*1024*1024; On Windows this buffer is very big, so the stream method of class FileStreamTask always fails. I reduced the buffer to 32MB, and this works in my case. 2010/2/12 ruslan usifo
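To illustrate the workaround described above, here is a minimal sketch of streaming a file in bounded chunks with `FileChannel.transferTo`. It is not the FileStreamTask code; the class and method names are hypothetical, and the 32 MB default simply mirrors the value reported to work in this message.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical sketch of chunked streaming; not Cassandra's FileStreamTask.
final class ChunkedTransfer {
    // 32 MB, the reduced size reported to work on win32 in this thread.
    static final int DEFAULT_CHUNK_SIZE = 32 * 1024 * 1024;

    private ChunkedTransfer() {}

    // Sends the whole source file to the target, at most chunkSize bytes per call.
    // Returns the number of bytes actually transferred.
    static long transferAll(FileChannel source, WritableByteChannel target, int chunkSize)
            throws IOException {
        long position = 0;
        long size = source.size();
        while (position < size) {
            long toSend = Math.min(chunkSize, size - position);
            long sent = source.transferTo(position, toSend, target);
            if (sent <= 0) {
                // No progress (for example, the target cannot accept data); stop rather than spin.
                break;
            }
            position += sent;
        }
        return position;
    }
}
```

Because `transferTo` may move fewer bytes than requested, the loop advances by the returned count rather than by the chunk size. Making the chunk size a parameter also lets it be tuned per platform instead of being hard-coded.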

Re: Bootstrap hung

2010-02-12 Thread Gary Dusbabek
If the processes are still active, could you please post a thread dump of them? You can do this by sending the java process a kill -3 (assuming freebsd is similar enough to linux). Gary On Fri, Feb 12, 2010 at 04:47, ruslan usifov wrote: > Hello > > I have two nodes. First i on only one node an

Re: Bootstrap hung

2010-02-12 Thread Jonathan Ellis
it is probably waiting for 192.168.0.37 to split off the right data for it. send that command to 192.168.0.37. -Jonathan On Fri, Feb 12, 2010 at 4:47 AM, ruslan usifov wrote: > Hello > > I have two nodes. First i on only one node and populate it with records. > Then i start second node in boots

Bootstrap hung

2010-02-12 Thread ruslan usifov
Hello, I have two nodes. First I turned on only one node and populated it with records. Then I started the second node in bootstrap mode, and it hung. I ran ./nodetool many times, but it always prints identical results (67108864/240394603). Nothing grows, and CPU on the bootstrap node is at 100%. freebsd# ./nodetool -