You're right, it's probably hard. I should have provided more data.

I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the
log indicates that JNA is working, please correct me if I'm wrong:
CLibrary.java (line 111) JNA mlockall successful

Total amount of RAM is 4GB.

My description of data size was very bad. Sorry about that. Data set size
is 12.3 GB per node, compressed.
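
As a rough sanity check, the 40GB figure in the quoted reply below can be reproduced with a few lines of arithmetic. This is only a sketch: the names are made up for illustration, decimal units (1 kB = 1000 bytes) are assumed, and the "3kB per column" reading is the one used in the quoted calculation, while the original mail can also be read as about 3kB per row. Neither figure includes compression or sstable overhead.

```python
# Back-of-the-envelope raw data size, following the calculation quoted below.
# All names here are illustrative; decimal units are an assumption.
ROWS = 2_400_000        # "About 2.4 million of the rows are skinny"
COLS_PER_ROW = 6        # "(6 columns)"
KB = 1000               # assumed: 1 kB = 1000 bytes


def raw_size_gb(rows, bytes_per_row):
    """Uncompressed payload size in GB, ignoring all storage overhead."""
    return rows * bytes_per_row / 1e9


# Reading used in the quoted reply: 3 kB per column, 6 columns per row.
size_if_3kb_per_column = raw_size_gb(ROWS, COLS_PER_ROW * 3 * KB)

# Alternative reading of the original mail: 3 kB per whole row.
size_if_3kb_per_row = raw_size_gb(ROWS, 3 * KB)

# With 2 nodes and RF=2, each node holds the full data range,
# so the per-node size equals the total computed here.
print(size_if_3kb_per_column, size_if_3kb_per_row)
```

The measured 12.3 GB per node is a compressed on-disk figure, so it is not directly comparable to either raw estimate.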

Heap size is 998.44MB according to nodetool info.
Key cache is 49MB according to nodetool info.
Row cache size is 0 bytes according to nodetool info.
Max new heap is 205MB according to Memory Pool "Par Eden Space" max
in jconsole.
Memtable is left at default which should give it 333MB according to
documentation (uncertain where I can verify this).
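
The heap figure above can be cross-checked against the GCInspector line from the first mail (quoted below). The following is a minimal sketch that assumes the log line has exactly the shape shown in that mail; the regular expression and the function name `parse_gc_line` are illustrative and not part of Cassandra.

```python
import re

# The GCInspector line from the original mail, joined onto one line.
LINE = (
    "INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) "
    "GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600"
)

# Assumed format, derived only from the single line above.
PATTERN = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_line(line):
    """Return pause length and heap usage from one GCInspector line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    used = int(m.group("used"))
    max_heap = int(m.group("max"))
    return {
        "collector": m.group("collector"),
        "pause_minutes": int(m.group("ms")) / 60_000,
        "heap_used_fraction": used / max_heap,
        "max_heap_mib": max_heap / 2**20,  # bytes to MiB
    }


info = parse_gc_line(LINE)
print(info)
```

Under these assumptions the "max is" value works out to about 998.44 MiB, matching nodetool info, and the heap was a little over half full when the ParNew pause of roughly five and a half minutes was logged.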

Our production cluster seems similar to your dev cluster, so increasing
the heap to 2GB might help with our issues.

I am still interested in getting rough estimates of how much heap will be
needed as data grows. Other than empirical studies how would I go about
getting such estimates?


2013/4/16 Viktor Jevdokimov <viktor.jevdoki...@adform.com>

> How could one provide any help without any knowledge about your cluster,
> node and environment settings?
>
> 40GB was calculated from 2 nodes with RF=2 (each has 100% data range),
> 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any
> overhead (sstable, bloom filters and indexes).
>
> With ParNew GC time such as yours, even if it is a swapping issue, I could
> say only that heap size is too small.
>
> Check Heap, New Heap sizes, memtable and cache sizes. Are you on Linux? Is
> JNA installed and used? What is total amount of RAM?
>
> Just for a DEV environment we use 3 virtual machines with 4GB RAM and use
> 2GB heap without any GC issue with amount of data from 0 to 16GB compressed
> on each node. Memtable space sized to 100MB, New Heap 400MB.
>
>    Best regards / Pagarbiai
> *Viktor Jevdokimov*
> Senior Developer
>
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> Follow us on Twitter: @adforminsider <http://twitter.com/#!/adforminsider>
> Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>
>  [image: Adform News] <http://www.adform.com>
> [image: Adform awarded the Best Employer 2012]
> <http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>   *From:* Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
> *Sent:* Tuesday, April 16, 2013 12:52
> *To:* user@cassandra.apache.org
> *Subject:* Re: Reduce Cassandra GC
>
> How do you calculate the heap / data size ratio? Is this a linear ratio?
>
> Each node has slightly more than 12 GB right now though.
>
> 2013/4/16 Viktor Jevdokimov <viktor.jevdoki...@adform.com>
>
> For a >40GB of data 1GB of heap is too low.
>
> *From:* Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
> *Sent:* Tuesday, April 16, 2013 10:47
> *To:* user@cassandra.apache.org
> *Subject:* Reduce Cassandra GC
>
> Hi,
>
> We have a small production cluster with two nodes. The load on the nodes
> is very small, around 20 reads / sec and about the same for writes. There
> are around 2.5 million keys in the cluster and a RF of 2.
>
> About 2.4 million of the rows are skinny (6 columns) and around 3kb in
> size (each). Currently, scripts are running, accessing all of the keys in
> time order to do some calculations.
>
> While running the scripts, the nodes go down and then come back up 6-7
> minutes later. This seems to be due to GC. I get lines like this in the log:
>
> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
> 1046937600
>
> However, the heap is not full. The heap usage has a jagged pattern going
> from 60% up to 70% during 5 minutes and then back down to 60% the next 5
> minutes and so on. I get no "Heap is X full..." messages. Every once in a
> while at one of these peaks, I get these stop-the-world GCs for 6-7
> minutes. Why does GC take up so much time even though the heap isn't full?
>
> I am aware that my access patterns make a high key cache hit ratio very
> unlikely. And indeed, my average key cache hit ratio during the run of the
> scripts is around 0.5%. I tried disabling key caching on the accessed
> column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the
> cassandra-cli but I get the same behaviour. Does turning the key cache off
> take effect immediately?
>
> Stop-the-world GC is fine if it happens for a few seconds but having them
> for several minutes doesn't work. Any other suggestions to remove them?
>
> Best regards,
>
> Joel Samuelsson
>
