Hi, I ran an ML hyperparameter-tuning (training) job on a dataset of *200 columns and 0.2M records* using a cluster of *1 master (18 GB) and 2 slaves (32 GB RAM, 16 cores each)*; the job took around *772 minutes* to complete.
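For context, below is a minimal PySpark sketch of roughly the shape of such a tuning job. The estimator, parameter grid, input path, and executor settings are illustrative placeholders (not my actual pipeline); feature assembly is omitted for brevity.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = (SparkSession.builder
         .appName("ml-tuning-example")
         # Roughly matches the cluster described: 2 workers x 32 GB / 16 cores.
         .config("spark.executor.memory", "28g")
         .config("spark.executor.cores", "16")
         .getOrCreate())

# Placeholder path; assumes a "features" vector column and a "label" column
# have already been prepared (VectorAssembler step omitted here).
df = spark.read.parquet("/path/to/training_data")

rf = RandomForestClassifier(featuresCol="features", labelCol="label")
grid = (ParamGridBuilder()
        .addGrid(rf.numTrees, [100, 300])
        .addGrid(rf.maxDepth, [5, 10])
        .build())

cv = CrossValidator(estimator=rf,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=3,
                    parallelism=4)   # fit several candidate models concurrently

model = cv.fit(df)
```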
Now, my requirement is to run the *same operation on 3M records*. Any idea how we should proceed? Should we go for vertical scaling or horizontal scaling? How should this problem be approached in a stepwise/systematic manner? Thanks in advance. Regards, Aakash.