Hello users

I'm facing a very challenging exercise: sizing a cluster for a huge dataset.

Use-case = IoT

Number of sensors: 30 million
Frequency of data: every 10 minutes
Estimated size of a data point: 100 bytes (including clustering columns)
Data retention: 2 years
Replication factor: 3 (pretty standard)

Some quick math gives me:

6 data points/hour * 24 hours * 365 days ≈ 50,000 data points/year/sensor

In terms of size, that is 50,000 x 100 bytes = 5 MB worth of data/year/sensor

Now the big problem is that we have 30 million sensors, so the disk
requirement adds up pretty fast: 5 MB * 30,000,000 sensors = 150 TB
worth of data/year

We want to store data for 2 years => 300 TB

We have RF=3 ==> 900 TB !!
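
To sanity-check the figures, the whole back-of-the-envelope calculation fits
in a few lines of Python (the exact results land slightly above the rounded
numbers above):

# Back-of-the-envelope sizing, using the assumptions listed above.
SENSORS = 30_000_000              # number of sensors
POINT_SIZE = 100                  # bytes per data point, clustering columns included
POINTS_PER_YEAR = 6 * 24 * 365    # one point every 10 minutes -> 52,560 points/year/sensor
RETENTION_YEARS = 2
RF = 3                            # replication factor

TB = 10**12
bytes_per_sensor_year = POINTS_PER_YEAR * POINT_SIZE    # ~5.3 MB
raw_per_year = bytes_per_sensor_year * SENSORS          # ~158 TB/year
total_raw = raw_per_year * RETENTION_YEARS              # ~315 TB for 2 years
total_replicated = total_raw * RF                       # ~946 TB on disk

print(f"per sensor/year : {bytes_per_sensor_year / 10**6:.1f} MB")
print(f"raw per year    : {raw_per_year / TB:.0f} TB")
print(f"2-year raw      : {total_raw / TB:.0f} TB")
print(f"with RF={RF}       : {total_replicated / TB:.0f} TB")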

Now, according to the commonly recommended density (with SSDs), one
should not exceed 2 TB of data per node, which gives us a rough sizing
of a 450-node cluster !!
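
The node count is then just a division (taking the rounded 900 TB figure):

import math
TB = 10**12
print(math.ceil(900 * TB / (2 * TB)))   # 2 TB/node -> 450 nodes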

Even if we push the limit up to 10 TB per node using TWCS (has anyone
tried this?), we would still need 90 beefy nodes to support this.
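
For the TWCS idea, this is roughly the table definition I have in mind,
written with the Python driver just as a sketch: the contact point,
keyspace, table and column names are placeholders, and the daily
partitioning/window size would obviously need tuning. The 2-year retention
would be enforced with a table-level TTL so that whole expired SSTables
can be dropped.

from cassandra.cluster import Cluster

# Contact point and keyspace are placeholders.
cluster = Cluster(['10.0.0.1'])
session = cluster.connect('iot')

# One partition per sensor per day, TWCS with 1-day windows,
# and a 2-year default TTL (2 * 365 * 24 * 3600 seconds).
session.execute("""
    CREATE TABLE IF NOT EXISTS sensor_data (
        sensor_id bigint,
        day       date,
        ts        timestamp,
        value     blob,
        PRIMARY KEY ((sensor_id, day), ts)
    ) WITH compaction = {
          'class': 'TimeWindowCompactionStrategy',
          'compaction_window_unit': 'DAYS',
          'compaction_window_size': 1
      }
      AND default_time_to_live = 63072000
""")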

Any thoughts/ideas on how to reduce the node count or increase density
while keeping the cluster manageable?

Regards

Duy Hai DOAN
