Re: Occasional 10s Timeouts on Read

2010-06-24 Thread Jonathan Ellis
Glad you tracked that down! On Wed, Jun 23, 2010 at 6:14 PM, AJ Slater a...@zuno.com wrote: This issue is caused by my network. Cassandra maintains multiple gossip connections per node pair. One of these connections is used for heartbeat and load broadcasting traffic. It's quite talky.

Re: Occasional 10s Timeouts on Read

2010-06-19 Thread AJ Slater
I shall do just that. I did a bunch of tests this morning and the situation appears to be this: I have three nodes A, B and C, with RF=2. I understand now why this issue wasn't apparent with RF=3. If there are regular internode column requests going on (e.g. I set up a pinger to get remote

Re: Occasional 10s Timeouts on Read

2010-06-19 Thread Peter Schuller
TRACE 14:42:06,248 unable to connect to /10.33.3.20
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
So that's interesting, since it is a clear failure that comes from the operating system and indicates something which can be observed
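A "Connection refused" comes back from the peer's operating system, so it can be told apart from a silent timeout without going through Cassandra at all. The following is only a rough sketch of such a check; the function name `probe` and its timeout default are made up for illustration, and the port to test is whatever the nodes are actually configured to use.

```python
import socket


def probe(host, port, timeout_s=2.0):
    """Classify one TCP connection attempt as 'open', 'refused', 'timeout', or an error string."""
    try:
        # create_connection resolves the host and performs a full TCP connect.
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except ConnectionRefusedError:
        # The remote OS actively rejected the connection (nothing listening).
        return "refused"
    except socket.timeout:
        # No answer at all within timeout_s, e.g. packets silently dropped.
        return "timeout"
    except OSError as exc:
        # Anything else (unreachable network, name resolution failure, ...).
        return "error: %s" % exc
```

Running `probe("10.33.3.20", port)` from the node that logged the TRACE line, with `port` set to the peer's configured storage port, would show whether the refusal can be reproduced outside the JVM.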

Re: Occasional 10s Timeouts on Read

2010-06-19 Thread AJ Slater
The only indication I have that Cassandra realized something was wrong during this period was this INFO message:
10.33.2.70:/var/log/cassandra/output.log
DEBUG 20:00:35,841 get_slice
DEBUG 20:00:35,841 weakreadremote reading SliceFromReadCommand(table='jolitics.com',

Re: Occasional 10s Timeouts on Read

2010-06-18 Thread AJ Slater
To summarize: If a request for a column comes in *after a period of several hours with no requests*, then the node servicing the request hangs while looking for its peer rather than servicing the request like it should. It then throws either a TimedOutException or a (wrong) NotFoundException. And

Occasional 10s Timeouts on Read

2010-06-17 Thread AJ Slater
I'm seeing 10s timeouts on reads a few times a day. It's hard to reproduce consistently but seems to happen most often after it's been a long time between reads. After presenting itself for a couple of minutes the problem then goes away. I've got a three node cluster with replication factor 2, reading at

Re: Occasional 10s Timeouts on Read

2010-06-17 Thread AJ Slater
Cassandra 0.6.2 from the Apache Debian source. Ubuntu Jaunty. Sun Java 6 JVM. All nodes in separate racks at 365 Main. On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater a...@zuno.com wrote: I'm seeing 10s timeouts on reads a few times a day. It's hard to reproduce consistently but seems to happen most

Re: Occasional 10s Timeouts on Read

2010-06-17 Thread AJ Slater
Total data size in the entire cluster is about twenty 12k images, with no other load on the system. I just ask for one column and I get these timeouts. Performing multiple gets on the columns leads to multiple timeouts for a period of a few seconds or minutes and then the situation magically

Re: Occasional 10s Timeouts on Read

2010-06-17 Thread aaron morton
Do you have row caching enabled? You can check in the JMX console to see if you're hitting the cache. Try turning on DEBUG level logging and look at the log on the machine you connect to for the read. Aaron On 18 Jun 2010, at 05:31, AJ Slater wrote: Total data size in the entire cluster

Re: Occasional 10s Timeouts on Read

2010-06-17 Thread Jonathan Ellis
The explanation that best fits the symptoms you describe is that you are swapping. On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater a...@zuno.com wrote: I'm seeing 10s timeouts on reads a few times a day. It's hard to reproduce consistently but seems to happen most often after it's been a long time

Re: Occasional 10s Timeouts on Read

2010-06-17 Thread Benjamin Black
Are these physical machines or virtuals? Did you post your cassandra.in.sh and storage-conf.xml someplace? On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater a...@zuno.com wrote: Total data size in the entire cluster is about twenty 12k images, with no other load on the system. I just ask for one

Re: Occasional 10s Timeouts on Read

2010-06-17 Thread AJ Slater
The behavior was seen with row caching off. I now have row caching on. The key cache hit rate is 0.75-0.8; the row cache hit rate is 0 (row cache capacity=1, RowsCached=100%). Looks like I should try another format for RowsCached, like 0.8 or 90% or something. On Thu, Jun 17, 2010 at 1:47 PM, aaron

Re: Occasional 10s Timeouts on Read

2010-06-17 Thread AJ Slater
The machines in question have 8GB of RAM each and generally never touch swap. I shall try to monitor memory/swap overnight and see if something strange happens. Would swapping really take 10s? AJ On Thu, Jun 17, 2010 at 1:54 PM, Jonathan Ellis jbel...@gmail.com wrote: The explanation that best
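One way to watch swap overnight on a Linux box like these is to sample /proc/meminfo at intervals and log how much swap is in use. This is just a sketch of the idea: the function names, the 60-second interval, and the output format are all made up for illustration, and tools such as vmstat would do the same job.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> [kB]" lines; the unit, when present, is kB.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB, from a parsed meminfo dict."""
    return info["SwapTotal"] - info["SwapFree"]


def watch_swap(samples, interval_s=60, path="/proc/meminfo"):
    """Print one timestamped swap reading per interval (Linux only)."""
    for _ in range(samples):
        with open(path) as handle:
            info = parse_meminfo(handle.read())
        print(time.strftime("%Y-%m-%d %H:%M:%S"), "swap used kB:", swap_used_kb(info))
        time.sleep(interval_s)
```

Lining the timestamps up against the times of the 10s read timeouts would show whether swap activity coincides with them.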

Re: Occasional 10s Timeouts on Read

2010-06-17 Thread AJ Slater
These are physical machines. storage-conf.xml.fs03 is here: http://pastebin.com/weL41NB1
Diffs from that for the other two storage-confs are inline here:
a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03 storage-conf.xml.fs01
185c185