Logged batch.
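
For reference, such a batch looks roughly like this in CQL (a sketch only; the real statements are prepared, and the table/column names are placeholders):

BEGIN BATCH                      -- LOGGED is the default
  INSERT INTO entity (id, data) VALUES (?, ?);
  INSERT INTO entity_lookup (name, id) VALUES (?, ?);
APPLY BATCH;

An unlogged batch would start with BEGIN UNLOGGED BATCH instead, skipping the batchlog write.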

On Fri, Jun 20, 2014 at 2:13 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> I think some figures from "nodetool tpstats" and "nodetool
> compactionstats" might make things clearer.
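>
> For example, on one of the affected nodes:
>
>   nodetool tpstats          # watch the Pending/Blocked columns (MutationStage, FlushWriter, ...)
>   nodetool compactionstats  # pending compaction tasks and bytes remaining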
>
> And Pavel, when you said batch, did you mean a LOGGED batch or an UNLOGGED
> batch?
>
> On Fri, Jun 20, 2014 at 8:02 PM, Marcelo Elias Del Valle <
> marc...@s1mbi0se.com.br> wrote:
>
>> If you have 32 GB RAM, the heap is probably 8 GB.
>> 200 writes of 100 KB per second would be 20 MB/s in the worst case,
>> supposing all the writes for a replica go to a single node.
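>>
>> (For reference, the stock cassandra-env.sh heuristic is, if I remember it
>> correctly:
>>
>>   max_heap = max(min(RAM / 2, 1024 MB), min(RAM / 4, 8192 MB))
>>
>> which for 32 GB of RAM gives max(1024 MB, 8192 MB) = 8 GB.)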
>> I really don't see any reason why it should be filling up the heap.
>> Anyone else?
>>
>> But did you check the logs for the GCInspector?
>> In my case, nodes are going down because of the heap; in your case, maybe
>> it's something else.
>> Do you see increasing GC times when you look for GCInspector in the logs?
>>
>> Regards,
>>
>>
>>
>> 2014-06-20 14:51 GMT-03:00 Pavel Kogan <pavel.ko...@cortica.com>:
>>
>>> Hi Marcelo,
>>>
>>> No pending write tasks. I am writing a lot: about 100-200 writes, each up
>>> to 100 KB, every 15 s.
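>>> (Back of the envelope: 200 x 100 KB every 15 s is roughly 1.3 MB/s of raw
>>> ingest, before replication.)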
>>> It is running on a decent cluster of 5 identical nodes: quad-core i7s with
>>> 32 GB RAM and 480 GB SSDs.
>>>
>>> Regards,
>>>   Pavel
>>>
>>>
>>> On Fri, Jun 20, 2014 at 12:31 PM, Marcelo Elias Del Valle <
>>> marc...@s1mbi0se.com.br> wrote:
>>>
>>>> Pavel,
>>>>
>>>> In my case, the heap was filling up faster than it was draining. I am
>>>> still looking for the cause, as draining should be really fast with SSDs.
>>>>
>>>> However, in your case you could check nodetool tpstats (AFAIK) and see
>>>> if there are too many pending write tasks, for instance. Maybe you really
>>>> are writing more than the nodes are able to flush to disk.
>>>>
>>>> How many writes per second are you achieving?
>>>>
>>>> Also, I would look for GCInspector in the log:
>>>>
>>>> cat system.log* | grep GCInspector | wc -l    # how many long-GC events were logged
>>>> tail -1000 system.log | grep GCInspector      # recent events, with durations
>>>>
>>>> Do you see it logged a lot? Is it taking much more time each time it
>>>> runs?
>>>>
>>>> I am no Cassandra expert, but I would try these things first and post
>>>> the results here. Maybe other people in the list have more ideas.
>>>>
>>>> Best regards,
>>>> Marcelo.
>>>>
>>>>
>>>> 2014-06-20 8:50 GMT-03:00 Pavel Kogan <pavel.ko...@cortica.com>:
>>>>
>>>>> The cluster is new, so no updates were done. Version 2.0.8.
>>>>> It happened when I did many writes (no reads). Writes are done in
>>>>> small batches of 2 inserts (writing to 2 column families). The values are
>>>>> big blobs (up to 100 KB).
>>>>>
>>>>> Any clues?
>>>>>
>>>>> Pavel
>>>>>
>>>>>
>>>>> On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle <
>>>>> marc...@s1mbi0se.com.br> wrote:
>>>>>
>>>>>> Pavel,
>>>>>>
>>>>>> Out of curiosity, did it start to happen after some update? Which
>>>>>> version of Cassandra are you using?
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>>
>>>>>> 2014-06-19 16:10 GMT-03:00 Pavel Kogan <pavel.ko...@cortica.com>:
>>>>>>
>>>>>>> What a coincidence! It happened today in my cluster of 7 nodes as well.
>>>>>>>
>>>>>>> Regards,
>>>>>>>   Pavel
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle <
>>>>>>> marc...@s1mbi0se.com.br> wrote:
>>>>>>>
>>>>>>>> I have a 10-node cluster with Cassandra 2.0.8.
>>>>>>>>
>>>>>>>> I am getting these warnings in the log when I run my code. All my
>>>>>>>> code does is read data from a CF and, in some cases, write new data.
>>>>>>>>
>>>>>>>>  WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 6165, exceeding specified threshold of 5120 by 1045.
>>>>>>>>  WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 21266, exceeding specified threshold of 5120 by 16146.
>>>>>>>>  WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229 BatchStatement.java (line 228) Batch of prepared statements for [identification1.entity, identification1.entity_lookup] is of size 22978, exceeding specified threshold of 5120 by 17858.
>>>>>>>>  INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481) CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is 14.249755859375 (just-counted was 9.85302734375). calculation took 3ms for 1024 cells
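>>>>>>>>
>>>>>>>> If I remember right, that 5120-byte threshold comes from
>>>>>>>> batch_size_warn_threshold_in_kb in cassandra.yaml (added in 2.0.8), e.g.:
>>>>>>>>
>>>>>>>>   # warn when a batch's serialized size exceeds this many KB
>>>>>>>>   batch_size_warn_threshold_in_kb: 5
>>>>>>>>
>>>>>>>> Raising it only silences the warning; the batches stay just as big.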
>>>>>>>>
>>>>>>>> After some time, one node of the cluster goes down. It comes back
>>>>>>>> after a few seconds and another node goes down. This keeps happening:
>>>>>>>> there is always one node down in the cluster, and when it comes back,
>>>>>>>> another one falls.
>>>>>>>>
>>>>>>>> The only exception I see in the log is "connection reset by peer",
>>>>>>>> which seems to be related to the gossip protocol, when a node goes down.
>>>>>>>>
>>>>>>>> Any hints on what I could do to investigate this problem further?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Marcelo Valle.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
