Hi All,

Sorry, I forgot to set the subject line correctly earlier.

> Hello Experts,
>
> I am trying to maximise resource utilisation on my 3-node Spark
> cluster (2 data nodes and 1 driver) so that the job finishes as quickly
> as possible. I am trying to create a benchmark so I can recommend an
> optimal POD for the job:
>
> 128GB x 16 cores
>
> I have standalone Spark 2.4.0 running.
>
> htop shows that only half of the memory is in use, so what alternatives
> can I try? CPU is always at 100% for the allocated resources.
>
> Can I reduce per-executor memory to 32 GB and increase the number of
> executors?
>
> I have the following properties:
>
> spark.driver.maxResultSize 64g
> spark.driver.memory 100g
> spark.driver.port 33631
> spark.dynamicAllocation.enabled true
> spark.dynamicAllocation.executorIdleTimeout 60s
> spark.executor.cores 8
> spark.executor.id driver
> spark.executor.instances 4
> spark.executor.memory 64g
> spark.files file://dist/xxxx-0.0.1-py3.7.egg
> spark.locality.wait 10s
>
> 100
> spark.shuffle.service.enabled true
>
> On Fri, Dec 20, 2019 at 10:56 AM zhangliyun <kelly...@126.com> wrote:
>
>> Hi all,
>>
>> I want to ask how to estimate the size of an RDD (in bytes) when it
>> is not saved to disk, because the job takes a long time if the output
>> is very large and the number of output partitions is small.
>>
>> These are the steps I came up with to solve this problem:
>>
>> 1. Sample 0.01 of the original data.
>>
>> 2. Compute the sample data count.
>>
>> 3. If the sample data count > 0, cache the sample data and compute the
>> sample data size.
>>
>> 4. Compute the original RDD's total count.
>>
>> 5. Estimate the RDD size as ${total count} * ${sample data size} /
>> ${sample rdd count}.
>>
>> The code is here
>> <https://github.com/kellyzly/sparkcode/blob/master/EstimateDataSetSize.scala#L24>
>> .
>>
>> My questions:
>> 1. Can I use the approach above to solve the problem? If not, what is
>> wrong with it?
>> 2. Is there an existing solution (an existing API in Spark) to solve
>> the problem?
>>
>> Best Regards
>> Kelly Zhang
>>
>
> --
> -Sriram
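
The question about dropping executor memory to 32 GB comes down to simple arithmetic, which can be sketched in plain Python. This is a rough sketch only: the 32g x 4 cores layout is a hypothetical alternative (only the 32 GB figure comes from the question), and the calculation assumes the whole 128GB x 16 cores of each of the 2 data nodes is available to executors, ignoring anything reserved for the OS or for memory overhead.

```python
def executors_per_node(node_mem_gb, node_cores, exec_mem_gb, exec_cores):
    """How many executors fit on one worker, limited by memory and by cores."""
    return min(node_mem_gb // exec_mem_gb, node_cores // exec_cores)


WORKERS = 2                      # the 2 data nodes from the question
NODE_MEM_GB, NODE_CORES = 128, 16

# (label, executor memory in GB, cores per executor)
layouts = [
    ("current: 64g x 8 cores", 64, 8),            # from the quoted properties
    ("hypothetical: 32g x 4 cores", 32, 4),       # assumed alternative
]

summary = {}
for label, mem_gb, cores in layouts:
    total = executors_per_node(NODE_MEM_GB, NODE_CORES, mem_gb, cores) * WORKERS
    # (executor count, total cores claimed, total memory claimed in GB)
    summary[label] = (total, total * cores, total * mem_gb)
    print(f"{label}: {total} executors, {total * cores} cores, {total * mem_gb} GB")
```

Under these assumptions both layouts claim the same 32 cores and 256 GB in total, so splitting the executors changes how memory is carved up but does not by itself add CPU capacity.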
--
-Sriram
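
The five estimation steps in the quoted message can be sketched without a cluster. The snippet below is a minimal illustration in plain Python, not the linked Scala code: a list stands in for the RDD, the function name `estimate_total_bytes` is hypothetical, and the pickled length of each record is used as an assumed proxy for its size (it is not the in-memory JVM size). On a real RDD the sampling and counting steps would correspond to `rdd.sample(False, 0.01)` and `rdd.count()`.

```python
import pickle
import random


def estimate_total_bytes(records, fraction=0.01, seed=42):
    """Estimate the serialized size of `records` from a random sample."""
    rng = random.Random(seed)
    # Step 1: keep each record with probability `fraction`.
    sample = [r for r in records if rng.random() < fraction]
    # Step 2: count the sampled records.
    sample_count = len(sample)
    if sample_count == 0:
        return None  # nothing sampled, so no estimate is possible
    # Step 3: measure the sample (pickled bytes as a size proxy).
    sample_bytes = sum(len(pickle.dumps(r)) for r in sample)
    # Step 4: count the full data set.
    total_count = len(records)
    # Step 5: scale the sample size up to the full count.
    return total_count * sample_bytes / sample_count


records = ["row-%06d" % i for i in range(100_000)]
estimate = estimate_total_bytes(records)
print(estimate)
```

The estimate is only as good as the sample: it works well when record sizes are roughly uniform, and it can be far off when a few records are much larger than the rest and the sample happens to miss them.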