Hi Marcelo,

No pending write tasks. I am writing a lot: about 100-200 writes, each up to 100 KB, every 15 s. It is running on a decent cluster of 5 identical nodes: quad-core i7, 32 GB RAM, and a 480 GB SSD each.
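[For reference, the pending-write check Marcelo suggested can be sketched like this. The tpstats column layout varies slightly across Cassandra versions, and the sample output below is fabricated for illustration; against a live node you would pipe real `nodetool tpstats` output instead.]

```shell
# Fabricated tpstats excerpt, used only to illustrate the filter;
# real output comes from `nodetool tpstats` on each node.
sample='Pool Name         Active  Pending  Completed
MutationStage          2      137    9932184
FlushWriter            0        0       1234'

# Keep the header plus any thread pool with a non-zero Pending column
# (column 3 in this layout). Prints the header and the MutationStage row.
printf '%s\n' "$sample" | awk 'NR == 1 || $3 > 0'
```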
Regards,
Pavel

On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle <marc...@s1mbi0se.com.br> wrote:

> Pavel,
>
> In my case, the heap was filling up faster than it was draining. I am
> still looking for the cause of it, as I could drain really fast with an SSD.
>
> However, in your case you could check (AFAIK) nodetool tpstats and see if
> there are too many pending write tasks, for instance. Maybe you really are
> writing more than the nodes are able to flush to disk.
>
> How many writes per second are you achieving?
>
> Also, I would look for GCInspector in the log:
>
> cat system.log* | grep GCInspector | wc -l
> tail -1000 system.log | grep GCInspector
>
> Do you see it running a lot? Is it taking much more time to run each
> time it runs?
>
> I am no Cassandra expert, but I would try these things first and post
> the results here. Maybe other people on the list have more ideas.
>
> Best regards,
> Marcelo.
>
>
> 2014-06-20 8:50 GMT-03:00 Pavel Kogan <pavel.ko...@cortica.com>:
>
>> The cluster is new, so no updates were done. Version 2.0.8.
>> It happened when I did many writes (no reads). Writes are done in small
>> batches of 2 inserts (writing to 2 column families). The values are big
>> blobs (up to 100 KB).
>>
>> Any clues?
>>
>> Pavel
>>
>>
>> On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle <marc...@s1mbi0se.com.br> wrote:
>>
>>> Pavel,
>>>
>>> Out of curiosity, did it start to happen after some update? Which
>>> version of Cassandra are you using?
>>>
>>> []s
>>>
>>>
>>> 2014-06-19 16:10 GMT-03:00 Pavel Kogan <pavel.ko...@cortica.com>:
>>>
>>>> What a coincidence! Today it happened in my cluster of 7 nodes as well.
>>>>
>>>> Regards,
>>>> Pavel
>>>>
>>>>
>>>> On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle <marc...@s1mbi0se.com.br> wrote:
>>>>
>>>>> I have a 10-node cluster with Cassandra 2.0.8.
>>>>>
>>>>> I am getting these exceptions in the log when I run my code. What my
>>>>> code does is just read data from a CF, and in some cases it writes
>>>>> new data.
>>>>>
>>>>> WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391
>>>>> BatchStatement.java (line 228) Batch of prepared statements for
>>>>> [identification1.entity, identification1.entity_lookup] is of size 6165,
>>>>> exceeding specified threshold of 5120 by 1045.
>>>>> WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152
>>>>> BatchStatement.java (line 228) Batch of prepared statements for
>>>>> [identification1.entity, identification1.entity_lookup] is of size 21266,
>>>>> exceeding specified threshold of 5120 by 16146.
>>>>> WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229
>>>>> BatchStatement.java (line 228) Batch of prepared statements for
>>>>> [identification1.entity, identification1.entity_lookup] is of size 22978,
>>>>> exceeding specified threshold of 5120 by 17858.
>>>>> INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481)
>>>>> CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is
>>>>> 14.249755859375 (just-counted was 9.85302734375). calculation took 3ms
>>>>> for 1024 cells
>>>>>
>>>>> After some time, one node of the cluster goes down. Then it comes
>>>>> back after some seconds and another node goes down. It keeps
>>>>> happening, and there is always a node down in the cluster; when one
>>>>> comes back, another falls.
>>>>>
>>>>> The only exception I see in the log is "connection reset by peer",
>>>>> which seems to be related to the gossip protocol, when a node goes down.
>>>>>
>>>>> Any hint of what I could do to investigate this problem further?
>>>>>
>>>>> Best regards,
>>>>> Marcelo Valle.
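[To see whether the GCInspector lines that the grep commands above turn up correspond to growing pauses, the pause duration can be pulled out of each line. This is only a sketch: the sample log line below is fabricated, and the exact GCInspector message format differs between Cassandra versions, so the sed pattern may need adjusting against real system.log output.]

```shell
# Fabricated GCInspector line in the style of Cassandra 2.0.x system.log.
sample='INFO [ScheduledTasks:1] 2014-06-18 11:05:02,123 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 2801 ms for 1 collections, 6245132672 used; max is 8375238656'

# Extract "<collector> <pause ms>" per line; against a real log, replace
# the printf with `cat system.log*`. Rising numbers over time suggest the
# heap is filling faster than it drains.
printf '%s\n' "$sample" |
  grep GCInspector |
  sed -n 's/.*GC for \([A-Za-z]*\): \([0-9]*\) ms.*/\1 \2/p'
# -> ConcurrentMarkSweep 2801
```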