That depends on your scenario. In the worst case of one big CF, there's not much you can do to reduce the disk usage of compaction and cleanup (which is essentially a compaction).

If, instead, you have several column families and no single CF makes up the majority of your data, you can push your disk usage a bit higher.
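Roughly speaking, compaction writes its new SSTables before removing the old ones and works on one CF at a time, so the free space you need to reserve tracks the size of the largest single CF. Here is a back-of-the-envelope sketch in Python (purely illustrative, not an official formula, and it ignores repair and snapshot overhead):

    # Illustrative only: assumes compacting a CF can temporarily need about
    # as much free space as that CF already occupies, and that only one CF
    # is compacted at a time.
    def worst_case_compaction_headroom_gb(cf_sizes_gb):
        """Free space to reserve: roughly the size of the largest CF."""
        return max(cf_sizes_gb)

    print(worst_case_compaction_headroom_gb([500]))      # one 500 GB CF -> 500
    print(worst_case_compaction_headroom_gb([50] * 10))  # ten 50 GB CFs ->  50

So splitting one 500GB CF into ten 50GB CFs can, in principle, bring the compaction headroom down to around 50GB, subject to the repair and 50% caveats in the quoted messages below.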


Is there any formula to calculate this? Let's say I have 500GB in a single CF, so I need at least 500GB of free space for compaction. If I partition this CF and split it into 10 proportional CFs of 50GB each, does that mean I will need only 50GB of free space?

Also, is there a recommended maximum data size per node?

Thanks.

A fundamental idea behind Cassandra's architecture is that disk space is cheap (which, indeed, it is). If you are particularly sensitive to this, Cassandra might not be the best solution to your problem. Also keep in mind that Cassandra performs well with average disks, so you don't need to spend a lot there. Additionally, most people find that the replication protects their data enough to allow them to use RAID 0 instead of 1, 10, 5, or 6.

- Tyler

On Thu, Dec 9, 2010 at 12:20 PM, Rustam Aliyev <rus...@code.az> wrote:

    Are there any plans to improve this in the future?

    For big data clusters this could be very expensive. Based on your
    comment, I will need 200TB of storage for 100TB of data to keep
    Cassandra running.
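    As a quick sanity check of that arithmetic (illustrative only, and
    assuming the "stay under 50% disk usage" guideline from the quoted
    message below, with the 100TB figure being the already-replicated
    on-disk size):

        # Illustrative only: provision roughly 2x the live on-disk data so
        # any node can temporarily double its footprint (compaction,
        # cleanup, repair, bootstrap) without running out of space.
        def raw_storage_needed_tb(on_disk_data_tb, headroom_factor=2.0):
            return on_disk_data_tb * headroom_factor

        print(raw_storage_needed_tb(100))  # -> 200.0 TB for 100 TB of data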

    --
    Rustam.

    On 09/12/2010 17:56, Tyler Hobbs wrote:
    If you are on 0.6, repair is particularly dangerous with respect
    to disk space usage.  If your replica is sufficiently out of
    sync, you can triple your disk usage pretty easily.  This has
    been improved in 0.7, so repairs should use about half as much
    disk space, on average.

    In general, yes, keep your nodes under 50% disk usage at all
    times.  Any of: compaction, cleanup, snapshotting, repair, or
    bootstrapping (the latter two are improved in 0.7) can double
    your disk usage temporarily.

    You should plan to add more disk space or add nodes when you get
    close to this limit.  Once you go over 50%, it's more difficult
    to add nodes, at least in 0.6.

    - Tyler

    On Thu, Dec 9, 2010 at 11:19 AM, Mark <static.void....@gmail.com> wrote:

        I recently ran into a problem during a repair operation where
        my nodes completely ran out of space and my whole cluster
        was... well, clusterfucked.

        I want to make sure I know how to prevent this problem in the future.

        Should I make sure that every node stays under 50% of its disk
        space at all times? Are there any normal day-to-day operations
        that would cause any one node to double in size that I should
        be aware of? If one or more nodes surpass the 50% mark, what
        should I plan to do?

        Thanks for any advice


