optimising cluster performance

2019-12-19 Thread Sriram Bhamidipati
Hi all,
Sorry, earlier I forgot to set the subject line correctly.

> Hello Experts
> I am trying to maximise the resource utilisation on my 3-node Spark
> cluster (2 data nodes and 1 driver) so that the job finishes as quickly as possible. I am
> trying to create a benchmark so I can recommend an optimal POD for

Re: How to estimate the rdd size before the rdd result is written to disk

2019-12-19 Thread Sriram Bhamidipati
Hello Experts,
I am trying to maximise the resource utilisation on my 3-node Spark cluster (2 data nodes and 1 driver) so that the job finishes as quickly as possible. I am trying to create a benchmark so I can recommend an optimal POD for the job.
128 GB x 16 cores
I am running standalone Spark 2.4.0.
HTOP shows
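The message is cut off at this point, so the following is only a rough sketch of one common rule of thumb for sizing executors on a worker like the one described (16 cores, 128 GB, two workers). The reserved core and memory, the 5-cores-per-executor target and the 10% headroom are assumptions, not settings from the thread, and `size_executors` is a hypothetical helper; only `spark.executor.cores` and `spark.executor.memory` are real Spark configuration keys.

```python
# Rough executor-sizing sketch for a Spark standalone worker.
# All defaults below are assumptions (a common rule of thumb), not values
# taken from the thread: keep 1 core and 1 GB for the OS and worker daemon,
# give each executor 5 cores, and leave ~10% of each executor's memory
# share unallocated as headroom for off-heap usage.

def size_executors(cores_per_node, mem_gb_per_node, cores_per_executor=5,
                   reserved_cores=1, reserved_mem_gb=1, headroom=0.10):
    """Return (executors per node, cores per executor, heap GB per executor)."""
    usable_cores = cores_per_node - reserved_cores
    usable_mem_gb = mem_gb_per_node - reserved_mem_gb
    executors = usable_cores // cores_per_executor
    if executors < 1:
        raise ValueError("not enough cores for a single executor")
    # Split the usable memory evenly, then hold back the headroom fraction.
    heap_gb = int(usable_mem_gb / executors * (1 - headroom))
    return executors, cores_per_executor, heap_gb


if __name__ == "__main__":
    workers = 2  # the two data nodes described in the thread
    per_node, cores, heap = size_executors(16, 128)
    print(f"{per_node} executors per worker, {per_node * workers} in total")
    print(f"--conf spark.executor.cores={cores} "
          f"--conf spark.executor.memory={heap}g")
```

Under these assumptions each worker runs three 5-core executors with a 38 GB heap (six executors across both workers). Whether fewer, larger executors or more, smaller ones finish the job faster is exactly what a benchmark like the one described would need to measure.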