Hello Experts
I am trying to maximise resource utilisation on my 3-node Spark cluster
(2 data nodes and 1 driver) so that the job finishes as quickly as possible.
I want to build a benchmark so I can recommend an optimal POD size for the job.
Each node is 128 GB x 16 cores.
I am running standalone Spark 2.4.0.
HTOP shows
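Not a definitive recommendation, just a sketch of how one might size executors for 128 GB x 16-core nodes in standalone mode, assuming roughly one core and a few GB per node are left for the OS and the Spark worker daemon. The master URL and all of the numbers here are illustrative assumptions, not values taken from this cluster:

```scala
import org.apache.spark.sql.SparkSession

object UtilisationBenchmark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("utilisation-benchmark")
      .master("spark://driver-host:7077")      // hypothetical standalone master URL
      // 3 executors per worker x 5 cores each = 15 of the 16 cores per node
      .config("spark.executor.cores", "5")
      // ~(128 GB minus OS/daemon headroom) / 3 executors per node
      .config("spark.executor.memory", "36g")
      // cap total cores so both data nodes are used: 2 nodes x 15 cores
      .config("spark.cores.max", "30")
      .getOrCreate()

    // ... run the benchmark workload here ...

    spark.stop()
  }
}
```

The idea is simply to keep every worker core busy with a few mid-sized executors rather than one huge JVM per node; the exact split (3x5 cores vs. 5x3, 36g vs. 40g) is something the benchmark itself should vary and measure.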
Hi all:
I want to ask how to estimate the size of an RDD (in bytes) when it is not
saved to disk, because the job takes a long time when the output is very
large and the number of output partitions is small.
The following steps are what I have come up with to solve this problem:
1. sample 0.01 of the data, or
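A rough sketch of that sampling idea, in case it helps: sample a small fraction of the RDD, measure the bytes of the sampled records, and scale up. This is not a standard Spark API; the helper name, the 0.01 default, and the use of UTF-8 string length as a stand-in for record size are all my assumptions:

```scala
import org.apache.spark.rdd.RDD

object RddSizeEstimate {
  // Approximate total size in bytes of an RDD of strings by measuring a
  // small sample and extrapolating. Adjust the per-record measurement to
  // match your real record type and output format.
  def approxBytes(rdd: RDD[String], fraction: Double = 0.01): Long = {
    val sampledBytes: Double = rdd
      .sample(withReplacement = false, fraction = fraction)
      .map(_.getBytes("UTF-8").length.toLong)
      .sum()                              // total bytes in the sample
    (sampledBytes / fraction).toLong      // scale up to the full RDD
  }
}
```

From the estimate one could then choose the number of output partitions, e.g. estimated bytes divided by a target file size such as 128 MB, and repartition before writing. Spark also ships org.apache.spark.util.SizeEstimator.estimate, but that reports the in-memory (deserialized) size of an object, which is usually larger than the serialized size that ends up on disk, so sampling the encoded bytes is often closer to the actual output size.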