Thank you, Aaron. 8GB of memory is about the spec we use now for testing.

I observed a couple of other things when I checked the output.log file, but I think those should go in another post.

Thank you very much for your advice.

Bill


On 13/04/12 02:49, aaron morton wrote:
It depends on a lot of things: schema size, caches, workload, etc.

If you are just starting out I would recommend using a machine with 8GB or 16GB of total RAM. By default Cassandra will take about 4GB or 8GB (respectively) for the JVM.
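
A quick way to confirm what the JVM was actually given (a minimal sketch; the grep pattern assumes the standard CassandraDaemon process name):

    # Print the -Xms/-Xmx/-Xmn flags of the running Cassandra JVM
    ps -ef | grep '[C]assandraDaemon' | tr ' ' '\n' | grep -E '^-Xm[sxn]'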

Once you have a feel for how things work you should be able to estimate the resources your application will need.

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/04/2012, at 2:19 AM, Vasileios Vlachos wrote:

Hello Aaron,

Thank you for getting back to me.

I will change to m1.large first to see how long it takes the Cassandra node to die (if at all). If I am still not happy, I will try more memory. I just want to test it step by step and see what the differences are. I will also change the cassandra-env file back to the defaults.

Is there an absolute minimum memory requirement for Cassandra? I might be wrong, but from my understanding we shouldn't have any problems given the amount of data we store per day (currently approximately 2-2.5GB/day).

Thank you in advance,

Bill


On Wed, Apr 11, 2012 at 7:33 PM, aaron morton <aa...@thelastpickle.com> wrote:

    'system_memory_in_mb' (3760) and 'system_cpu_cores' (1) according
    to our nodes' specification. We also changed 'MAX_HEAP_SIZE' to 2G
    and 'HEAP_NEWSIZE' to 200M (we believe the latter relates to
    garbage collection).
    It's best to leave the default settings unless you know what you
    are doing here.
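
    If you want to go back to the defaults, leaving MAX_HEAP_SIZE and
    HEAP_NEWSIZE unset in conf/cassandra-env.sh lets the script work
    them out from the detected memory and core count. A minimal sketch
    (the comment wording varies by version):

        # conf/cassandra-env.sh -- comment out the manual overrides so
        # the script calculates heap sizes automatically
        #MAX_HEAP_SIZE="2G"
        #HEAP_NEWSIZE="200M"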

    In case you find this useful: swap is off and unevictable memory
    seems to be very high on all 3 servers (2.3GB; on other Linux
    servers we usually observe around 0-16KB of unevictable memory).
    Cassandra locks the java memory so it cannot be swapped out.
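
    One way to see this from the OS side (a minimal sketch; the log
    path is an assumption for a typical package install):

        # Locked/unevictable pages roughly track the mlocked JVM heap
        grep -E 'Unevictable|Mlocked' /proc/meminfo

        # Confirm swap really is off
        swapon -s

        # Check whether the memory lock succeeded at startup (path assumed)
        grep -i mlockall /var/log/cassandra/system.log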

    The problem is that the node we hit from our Thrift interface
    dies regularly (approximately after we store 2-2.5GB of data). The
    error message is 'OutOfMemoryError: Java Heap Space', and
    according to the log it did in fact use all of the allocated
    memory.
    The easiest solution would be to use a larger EC2 instance.

    People normally use an m1.xlarge with 16GB of RAM (you could also
    try an m1.large).

    If you are still experimenting I would suggest using the larger
    instances so you can make some progress. Once you have a feel for
    how things work you can then try to match the instances to your
    budget.

    Hope that helps.

    -----------------
    Aaron Morton
    Freelance Developer
    @aaronmorton
    http://www.thelastpickle.com

    On 11/04/2012, at 1:54 AM, Vasileios Vlachos wrote:

    Hello,

    We have been experimenting with Cassandra lately (version 1.0.7)
    and we seem to have some problems with memory. We use EC2 as our
    test environment and we have three nodes with 3.7GB of memory and
    1 core @ 2.4GHz, all running Ubuntu Server 11.10.

    The problem is that the node we hit from our Thrift interface
    dies regularly (approximately after we store 2-2.5GB of data). The
    error message is 'OutOfMemoryError: Java Heap Space', and
    according to the log it did in fact use all of the allocated
    memory.

    The nodes are under relatively constant load and store about
    2000-4000 row keys a minute, which are batched through the Thrift
    interface in groups of 10-30 row keys at a time (with about 50
    columns each). The number of reads is very low, at around
    1000-2000 a day, each requesting the data of a single row key.
    There is currently only one column family in use.

    The initial thought was that something was wrong in the
    cassandra-env.sh file. So, we specified the variables
    'system_memory_in_mb' (3760) and 'system_cpu_cores' (1) according
    to our nodes' specification. We also changed 'MAX_HEAP_SIZE' to 2G
    and 'HEAP_NEWSIZE' to 200M (we believe the latter relates to
    garbage collection). Unfortunately, that did not solve the issue
    and the node we hit via Thrift keeps dying regularly.
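
    For reference, the overrides look roughly like this in our
    conf/cassandra-env.sh (a sketch of our edit, not the stock file):

        # Values pinned by hand; the stock script would otherwise
        # derive them from the detected memory and core count
        system_memory_in_mb="3760"
        system_cpu_cores="1"
        MAX_HEAP_SIZE="2G"
        HEAP_NEWSIZE="200M"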

    In case you find this useful: swap is off and unevictable memory
    seems to be very high on all 3 servers (2.3GB; on other Linux
    servers we usually observe around 0-16KB of unevictable memory).
    We are not quite sure how the unevictable memory ties into
    Cassandra, it's just something we observed while looking into the
    problem. The CPU is pretty much idle the entire time. The heap
    memory is clearly being reduced once in a while according to
    nodetool, but it grows over the limit as time goes by.
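
    The way we watch the heap is roughly the following (the polling
    interval and host are arbitrary):

        # Poll the node's reported heap usage every 30 seconds
        watch -n 30 'nodetool -h localhost info | grep -i heap'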

    Any ideas? Thanks in advance.

    Bill





--

Kind regards,

Vasileios Vlachos
