Hello users,

I'm facing a very challenging exercise: sizing a cluster for a huge dataset.
Use case: IoT
Number of sensors: 30 million
Data frequency: one data point every 10 minutes
Estimated size of a data point: 100 bytes (including clustering columns)
Data retention: 2 years
Replication factor: 3 (pretty standard)

A very quick bit of math gives me:

6 data points/hour * 24 hours * 365 days ~ 50 000 data points/year/sensor
In terms of size, that is 50 000 * 100 bytes = 5 MB of data/year/sensor

Now the big problem is that we have 30 million sensors, so the disk requirements add up pretty fast:

5 MB * 30 000 000 sensors = 150 TB of data/year
We want to store data for 2 years => 300 TB
We have RF = 3 ==> 900 TB !!!!

Now, according to the commonly recommended density (with SSDs), one should not exceed 2 TB of data per node, which gives a rough sizing of a 450-node cluster !!!

Even if we push the limit up to 10 TB per node using TWCS (has anyone tried this?), we would still need 90 beefy nodes to support this.

Any thoughts/ideas to reduce the node count or increase the density and keep the cluster manageable? (A quick script re-doing the math above, plus an illustrative TWCS table sketch, follow below my signature.)

Regards

Duy Hai DOAN
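For reference, here is the back-of-envelope math above as a small Python script. The constants are just my rough estimates, and the 2 TB and 10 TB per-node densities are simply the two scenarios discussed above:

# Rough sizing for the IoT use case above; all figures are estimates, not measurements.
SENSORS = 30_000_000            # number of sensors
POINTS_PER_HOUR = 6             # one data point every 10 minutes
POINT_SIZE_BYTES = 100          # estimated size per data point (incl. clustering columns)
RETENTION_YEARS = 2
REPLICATION_FACTOR = 3
TB = 1000 ** 4                  # decimal terabytes, good enough for rough math

points_per_sensor_per_year = POINTS_PER_HOUR * 24 * 365                    # ~50 000
bytes_per_sensor_per_year = points_per_sensor_per_year * POINT_SIZE_BYTES  # ~5 MB
raw_per_year = bytes_per_sensor_per_year * SENSORS                         # ~150 TB
total_on_disk = raw_per_year * RETENTION_YEARS * REPLICATION_FACTOR        # ~900 TB

print(f"total on disk: ~{total_on_disk / TB:.0f} TB (RF={REPLICATION_FACTOR})")
for density_tb in (2, 10):      # classic SSD guidance vs the TWCS "push the limit" case
    print(f"{density_tb} TB/node -> ~{total_on_disk / (density_tb * TB):.0f} nodes")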
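And for the TWCS part of the question, a sketch of one possible table layout, purely for illustration (the contact point, keyspace and column names are placeholders, not our real schema): per-sensor, per-day partitions to keep partitions small, daily compaction windows, and a 2-year default TTL so expired SSTables can be dropped whole:

# Needs the DataStax Python driver (pip install cassandra-driver) and a reachable node.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("iot")   # placeholder contact point / keyspace

session.execute("""
    CREATE TABLE IF NOT EXISTS sensor_data (
        sensor_id bigint,
        day       date,
        ts        timestamp,
        value     blob,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
      AND compaction = {
          'class': 'TimeWindowCompactionStrategy',
          'compaction_window_unit': 'DAYS',
          'compaction_window_size': '1'
      }
      AND default_time_to_live = 63072000  -- 2 years, in seconds
""")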