Hello Mark, and thanks for your reply.

1) I store it as a UTF8String. All characters are digits, from 0x30 to 0x39, and each digit should take 1 byte. Since all characters are digits, the name should take 15 bytes.
2) I will change the data I am storing to decrease the usage; for the value I will find some small value to store. Previously I used the same value because this table is an index used only for search purposes and does not really have a value.
3) You are right: I read and write at quorum, and it was my mistake (I thought that if I write at quorum then the data will be written to 2 nodes only). If I check the keyspace:

create keyspace USER_DATA with placement_strategy = 'NetworkTopologyStrategy' and strategy_options = [{19 : 3}] and durable_writes = true;

it has a replication factor of 3.

Therefore I have several questions:
1) What are the recommended read and write consistency levels and replication factor for 3 nodes, with the option of adding more servers in the future?
2) The data still takes 1.5x the expected overall size. What is the reason for that, and how can it be resolved?
3) I also see that the data size differs between nodes. Does that mean the servers are out of sync?

Thanks and best regards
Yulian Oifa

On Sun, Apr 13, 2014 at 7:03 PM, Mark Reddy <mark.re...@boxever.com> wrote:

> What are you storing these 15 chars as; string, int, double, etc.? 15
> chars does not translate to 15 bytes.
>
> You may be mixing up replication factor and quorum when you say "Cassandra
> cluster has 3 servers, and data is stored in quorum ( 2 servers )." You
> read and write at quorum (N/2)+1 where N=total_number_of_nodes and your
> data is replicated to the number of nodes you specify in your replication
> factor. Could you clarify?
>
> Also if you are concerned about disk usage, why are you storing the same
> 15 char value in both the column name and value? You could just store it as
> the name and halve your data usage :)
>
> On Sun, Apr 13, 2014 at 4:26 PM, Yulian Oifa <oifa.yul...@gmail.com> wrote:
>
>> I have a column family with 2 rows.
>> The 2 rows have 100 million columns overall.
>> Each column has a name of 15 chars ( digits ) and the same 15 chars in the value
>> ( also digits ).
>> Each column should take 30 bytes.
>> Therefore all data should amount to approximately 3GB.
>> Cassandra cluster has 3 servers, and data is stored in quorum ( 2
>> servers ).
>> Therefore each server should have 3GB*2/3=2GB of data for this column
>> family.
>> The table is almost never changed; data is only removed from this table,
>> which possibly created tombstones, but that should not increase the usage.
>> However, when I check the data I see that each server has more than 4GB of
>> data ( more than twice what it should be ).
>>
>> server 1:
>> -rw-r--r-- 1 root root 3506446057 Dec 26 12:02 freeNumbers-g-264-Data.db
>> -rw-r--r-- 1 root root 814699666 Dec 26 12:24 freeNumbers-g-281-Data.db
>> -rw-r--r-- 1 root root 198432466 Dec 26 12:27 freeNumbers-g-284-Data.db
>> -rw-r--r-- 1 root root 35883918 Apr 12 20:07 freeNumbers-g-336-Data.db
>>
>> server 2:
>> -rw-r--r-- 1 root root 3448432307 Dec 26 11:57 freeNumbers-g-285-Data.db
>> -rw-r--r-- 1 root root 762399716 Dec 26 12:22 freeNumbers-g-301-Data.db
>> -rw-r--r-- 1 root root 220887062 Dec 26 12:23 freeNumbers-g-304-Data.db
>> -rw-r--r-- 1 root root 54914466 Dec 26 12:26 freeNumbers-g-306-Data.db
>> -rw-r--r-- 1 root root 53639516 Dec 26 12:26 freeNumbers-g-305-Data.db
>> -rw-r--r-- 1 root root 53007967 Jan 8 15:45 freeNumbers-g-314-Data.db
>> -rw-r--r-- 1 root root 413717 Apr 12 18:33 freeNumbers-g-359-Data.db
>>
>> server 3:
>> -rw-r--r-- 1 root root 4490657264 Apr 11 18:20 freeNumbers-g-358-Data.db
>> -rw-r--r-- 1 root root 389171 Apr 12 20:58 freeNumbers-g-360-Data.db
>> -rw-r--r-- 1 root root 4276 Apr 11 18:20 freeNumbers-g-358-Statistics.db
>> -rw-r--r-- 1 root root 4276 Apr 11 18:24 freeNumbers-g-359-Statistics.db
>> -rw-r--r-- 1 root root 4276 Apr 12 20:58 freeNumbers-g-360-Statistics.db
>> -rw-r--r-- 1 root root 976 Apr 11 18:20 freeNumbers-g-358-Filter.db
>> -rw-r--r-- 1 root root 208 Apr 11 18:24 freeNumbers-g-359-Data.db
>> -rw-r--r-- 1 root root 78 Apr 11 18:20 freeNumbers-g-358-Index.db
>> -rw-r--r-- 1 root root 52 Apr 11 18:24 freeNumbers-g-359-Index.db
>> -rw-r--r-- 1 root root 52 Apr 12 20:58 freeNumbers-g-360-Index.db
>> -rw-r--r-- 1 root root 16 Apr 11 18:24 freeNumbers-g-359-Filter.db
>> -rw-r--r-- 1 root root 16 Apr 12 20:58 freeNumbers-g-360-Filter.db
>>
>> When I try to compact, I get the following notification from the first server:
>> INFO [CompactionExecutor:1604] 2014-04-13 18:23:07,260 CompactionController.java (line 146) Compacting large row USER_DATA/freeNumbers:8bdf9678-6d70-11e3-85ab-80e385abf85d (4555076689 bytes) incrementally
>>
>> Which confirms that there is around 4.5GB of data on that server alone.
>> Why does Cassandra take so much space???
>>
>> Best regards
>> Yulian Oifa
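
The sizing arithmetic in this thread can be sketched roughly as follows. This is a back-of-the-envelope estimate, not a measurement: it assumes QUORUM is derived from the replication factor (RF // 2 + 1), that data is spread evenly across nodes, and that each column carries about 15 bytes of fixed metadata on top of its name and value in this on-disk format. The overhead figure and all function names here are assumptions for illustration.

```python
# Rough per-node size estimate for the freeNumbers column family.
# ASSUMPTION: ~15 bytes of per-column metadata (timestamp, lengths, flags).
COLUMN_OVERHEAD_BYTES = 15


def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer a QUORUM read or write."""
    return replication_factor // 2 + 1


def estimated_bytes_per_node(columns: int, name_bytes: int, value_bytes: int,
                             replication_factor: int, nodes: int,
                             overhead: int = COLUMN_OVERHEAD_BYTES) -> int:
    """Estimate bytes stored per node, assuming an even spread of replicas."""
    per_column = name_bytes + value_bytes + overhead
    # Each node holds min(RF, nodes) / nodes of the full data set.
    copies = min(replication_factor, nodes)
    return columns * per_column * copies // nodes


if __name__ == "__main__":
    # Original expectation: no overhead, data on 2 of 3 servers.
    print(estimated_bytes_per_node(100_000_000, 15, 15, 2, 3, overhead=0))
    # Actual keyspace: RF = 3 on 3 nodes, with the assumed overhead.
    print(estimated_bytes_per_node(100_000_000, 15, 15, 3, 3))
    # Replicas needed for QUORUM at RF = 3.
    print(quorum(3))
```

Under these assumptions the first estimate is 2000000000 bytes and the second is 4500000000 bytes, which is close to the 4555076689 bytes reported by the compaction log. With RF = 3 on 3 nodes every node holds a full copy of both rows, so the replication factor alone accounts for the jump from 2GB to 3GB, and the assumed per-column overhead would account for most of the rest.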