data size difference between supercolumn and regular column

2012-03-28 Thread Yiming Sun
Hi, We are trying to estimate the amount of storage we need for a production cassandra cluster. While I was doing the calculation, I noticed a very dramatic difference in terms of storage space used by cassandra data files. Our previous setup consists of a single-node cassandra 0.8.x with no rep

Re: data size difference between supercolumn and regular column

2012-03-28 Thread Yiming Sun
Actually, after I read an article on cassandra 1.0 compression just now (http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression), I am more puzzled. In our schema, we didn't specify any compression options -- does cassandra 1.0 perform some default compression? Or is the data red

Re: data size difference between supercolumn and regular column

2012-03-31 Thread aaron morton
> does cassandra 1.0 perform some default compression? No. The on-disk size depends to some degree on the workload. If there are a lot of overwrites or deletes, you may have rows/columns that need to be compacted. You may have some big old SSTables that have not been compacted for a while.

Re: data size difference between supercolumn and regular column

2012-04-01 Thread Yiming Sun
Thanks Aaron. Well, I guess it is possible the data files from supercolumns could've been reduced in size after compaction. This brings up yet another question. Say I am on a shoestring budget and can only put together a cluster with very limited storage space. The first iteration of pushing data in

Re: data size difference between supercolumn and regular column

2012-04-01 Thread Jeremiah Jordan
Is that 80% with compression? If not, the first thing to do is turn on compression. Cassandra doesn't behave well when it runs out of disk space. You really want to try and stay around 50%; 60-70% works, but only if it is spread across multiple column families, and even then you can run into
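
As a rough illustration of the headroom advice above, here is a minimal Python sketch that compares a filesystem's used fraction against a target. The function names, the 50% default, and the path "." are assumptions made for this example; in practice you would point it at the node's data directory. It only reads filesystem totals and knows nothing about Cassandra itself.

```python
import shutil


def used_fraction(used_bytes, total_bytes):
    """Fraction of the filesystem currently in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def within_headroom(path, target_fraction=0.5):
    """Return (fraction_used, ok) for the filesystem that holds `path`.

    `target_fraction` defaults to the ~50% figure suggested in the thread;
    it is a rule of thumb, not a hard limit.
    """
    usage = shutil.disk_usage(path)
    fraction = used_fraction(usage.used, usage.total)
    return fraction, fraction <= target_fraction


# "." is a placeholder; substitute the actual data directory.
fraction, ok = within_headroom(".")
print(f"disk used: {fraction:.0%}, within target: {ok}")
```

A check like this could run from cron and alert before compaction starts failing for lack of temporary space.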

Re: data size difference between supercolumn and regular column

2012-04-02 Thread Yiming Sun
Yup Jeremiah, I learned a hard lesson on how cassandra behaves when it runs out of disk space :-S. I didn't try the compression, but when it ran out of disk space, or came near to running out, compaction would fail because it needs space to create some tmp data files. I shall get a tattoo that says keep

Re: data size difference between supercolumn and regular column

2012-04-02 Thread aaron morton
If you have a workload with overwrites you will end up with some data needing compaction. Running a nightly manual compaction would remove this, but it will also soak up some IO so it may not be the best solution. I do not know if Leveled compaction would result in a smaller disk load for the

Re: data size difference between supercolumn and regular column

2012-04-03 Thread Tamar Fraenkel
Do you have a good reference for maintenance scripts for a Cassandra ring? Thanks, Tamar Fraenkel, Senior Software Engineer, TOK Media. On Tue, Apr 3, 2012 at 4:37 AM, aaron morton wr

Re: data size difference between supercolumn and regular column

2012-04-04 Thread Yiming Sun
Cool, I will look into this new leveled compaction strategy and give it a try. BTW, Aaron, I think the last word of your message meant to say "compression", correct? -- Y. On Mon, Apr 2, 2012 at 9:37 PM, aaron morton wrote: > If you have a workload with overwrites you will end up with some data

Re: data size difference between supercolumn and regular column

2012-04-04 Thread Watanabe Maki
LeveledCompaction will use less disk space (load), but needs more IO. If your traffic is too high for your disk, you will have many pending compaction tasks and a large number of sstables waiting to be compacted. Also, the default sstable_size_in_mb (5MB) will be too small for a large data set. You
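
To see why a 5MB sstable size might be too small for a large data set, the following Python sketch estimates how many sstables a node would hold at different sizes. The 200 GB per-node figure and the function name are hypothetical, chosen only for illustration. The estimate is plain division and ignores compression, overlap between levels, and per-file overhead, so treat it as an order-of-magnitude guide.

```python
import math


def estimate_sstable_count(data_size_gb, sstable_size_mb=5):
    """Rough number of sstables needed to hold `data_size_gb` of data
    when each sstable is about `sstable_size_mb` in size."""
    data_size_mb = data_size_gb * 1024
    return math.ceil(data_size_mb / sstable_size_mb)


# Hypothetical node holding 200 GB; compare a few candidate sizes.
for size_mb in (5, 10, 50, 100):
    count = estimate_sstable_count(200, size_mb)
    print(f"sstable_size_in_mb={size_mb:>3}: ~{count} sstables")
```

Because each sstable is stored as several files on disk, a count in the tens of thousands can also push against the open-file limit (ulimit) of the Cassandra process.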

Re: data size difference between supercolumn and regular column

2012-04-06 Thread Yiming Sun
Thanks for the advice, Maki, especially on the ulimit! Yes, we will play with the configuration and figure out some optimal sstable size. -- Y. On Wed, Apr 4, 2012 at 9:49 AM, Watanabe Maki wrote: > LeveledCompaction will use less disk space(load), but need more IO. > If your traffic is too hig