Hi, I ran an ML hyperparameter-tuning (training) job on a dataset of *200 columns and 0.2M records* using a cluster of *1 master (18 GB) and 2 slaves (32 GB RAM, 16 cores each)*; the job took around *772 minutes* to complete.
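For context, below is a minimal PySpark sketch of roughly the shape of such a tuning job. The estimator, parameter grid, input path, and executor settings are illustrative placeholders (not my actual pipeline); feature assembly is omitted for brevity.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = (SparkSession.builder
         .appName("ml-tuning-example")
         # Roughly matches the cluster described: 2 workers x 32 GB / 16 cores.
         .config("spark.executor.memory", "28g")
         .config("spark.executor.cores", "16")
         .getOrCreate())

# Placeholder path; assumes a "features" vector column and a "label" column
# have already been prepared (VectorAssembler step omitted here).
df = spark.read.parquet("/path/to/training_data")

rf = RandomForestClassifier(featuresCol="features", labelCol="label")
grid = (ParamGridBuilder()
        .addGrid(rf.numTrees, [100, 300])
        .addGrid(rf.maxDepth, [5, 10])
        .build())

cv = CrossValidator(estimator=rf,
                    estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(),
                    numFolds=3,
                    parallelism=4)   # fit several candidate models concurrently

model = cv.fit(df)
```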
Now, my requirement is to run the *same operation on 3M records*. Any idea how we should proceed? Should we go for vertical scaling or horizontal scaling? How should this problem be approached in a stepwise/systematic manner? Thanks in advance. Regards, Aakash.