Re: How to estimate the rdd size before the rdd result is written to disk

2019-12-19 Thread Sriram Bhamidipati
Hello Experts, I am trying to maximise resource utilisation on my 3-node Spark cluster (2 data nodes and 1 driver) so that the job finishes as quickly as possible. I am trying to create a benchmark so I can recommend an optimal POD for the job (128GB x 16 cores). I have standalone Spark 2.4.0 running. HTOP shows

How to estimate the rdd size before the rdd result is written to disk

2019-12-19 Thread zhangliyun
Hi all, I want to ask how to estimate the size of an RDD (in bytes) before it is saved to disk, because the job takes a long time if the output is very large and the number of output partitions is small. The following steps are what I have come up with for this problem: 1.sample 0.01 's or
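
The sampling idea mentioned in the question can be sketched in plain Python, without a Spark dependency. This is only an illustration under stated assumptions, not the poster's actual code: the function names `estimate_total_bytes` and `suggest_partitions`, the use of `pickle` as the size measure, and the 128 MB target per partition are all hypothetical choices made for the example. In Spark itself, the sample would typically come from `rdd.sample(False, 0.01)`, and the real on-disk size will differ with the serializer and compression in use.

```python
import math
import pickle
import random


def estimate_total_bytes(records, fraction=0.01, seed=42):
    """Estimate the serialized size of all records from a random sample.

    Each record is kept with probability `fraction`; the sampled bytes
    are then scaled up by 1 / fraction. pickle is an assumed stand-in
    for whatever serializer the real job uses.
    """
    rng = random.Random(seed)
    sample_bytes = 0
    for record in records:
        if rng.random() < fraction:
            sample_bytes += len(pickle.dumps(record))
    return int(sample_bytes / fraction)


def suggest_partitions(estimated_bytes, target_bytes=128 * 1024 * 1024):
    """Suggest an output partition count for a target partition size.

    The 128 MB default is an assumption for illustration only.
    """
    return max(1, math.ceil(estimated_bytes / target_bytes))


if __name__ == "__main__":
    data = ["x" * 100 for _ in range(10000)]
    estimate = estimate_total_bytes(data, fraction=0.1)
    print("estimated bytes:", estimate)
    print("suggested partitions:", suggest_partitions(estimate))
```

The estimate is only as good as the sample: a small fraction on skewed data can be far off, so it may be worth repeating the estimate with a different seed or a larger fraction before choosing the partition count.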