Don’t do this all in one job. Create separate jobs for the different types of work and orchestrate them using Oozie or a similar workflow scheduler; see the sketch below.
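For illustration, a minimal Oozie workflow chaining two Spark actions with different resource profiles. This is only a sketch: the app name, classes, jar path, resource numbers, and the ${...} properties are placeholders, not taken from your setup.

    <workflow-app name="my-pipeline" xmlns="uri:oozie:workflow:0.5">
        <start to="etl-step"/>

        <!-- First Spark job: one resource profile for the ETL stage -->
        <action name="etl-step">
            <spark xmlns="uri:oozie:spark-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>yarn</master>
                <mode>cluster</mode>
                <name>etl-step</name>
                <class>com.example.EtlJob</class>
                <jar>${nameNode}/apps/pipeline/pipeline.jar</jar>
                <spark-opts>--num-executors 4 --executor-cores 5 --executor-memory 10G</spark-opts>
            </spark>
            <ok to="aggregate-step"/>
            <error to="fail"/>
        </action>

        <!-- Second Spark job: a different profile for the aggregation stage -->
        <action name="aggregate-step">
            <spark xmlns="uri:oozie:spark-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>yarn</master>
                <mode>cluster</mode>
                <name>aggregate-step</name>
                <class>com.example.AggregateJob</class>
                <jar>${nameNode}/apps/pipeline/pipeline.jar</jar>
                <spark-opts>--num-executors 2 --executor-cores 4 --executor-memory 12G</spark-opts>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>

        <kill name="fail">
            <message>Pipeline failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>

The point is that each stage gets its own sizing and can be retried or rescheduled independently, instead of one job carrying the worst-case configuration for everything.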
> On 3. Jul 2018, at 09:34, Aakash Basu <aakash.spark....@gmail.com> wrote:
>
> Hi,
>
> Cluster - 5 nodes (1 driver and 4 workers)
> Driver config: 16 cores, 32 GB RAM
> Worker config: 8 cores, 16 GB RAM
>
> I'm using the parameters below, of which I know the first chunk is cluster
> dependent and the second chunk is data/code dependent.
>
> --num-executors 4
> --executor-cores 5
> --executor-memory 10G
> --driver-cores 5
> --driver-memory 25G
>
> --conf spark.sql.shuffle.partitions=100
> --conf spark.driver.maxResultSize=2G
> --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC"
> --conf spark.scheduler.listenerbus.eventqueue.capacity=20000
>
> I've come up to these values through my own R&D on these properties and
> the issues I faced along the way.
>
> My questions are:
>
> 1) How can I infer, using some formula or code, how to set the second
> (data/code dependent) chunk of settings?
> 2) What other properties/configurations can I use to shorten my job
> runtime?
>
> Thanks,
> Aakash.
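On question 1: there is no exact formula, but the widely circulated YARN sizing heuristic gives a starting point. A sketch in Python; the one-core/1-GB-per-node reservation, the 5-cores-per-executor sweet spot, and the ~10% memory overhead factor are rule-of-thumb assumptions, not derived from your job.

    import math

    # Cluster specs from the mail (per worker node)
    workers = 4
    cores_per_node = 8
    mem_per_node_gb = 16

    # Reserve 1 core and 1 GB per node for the OS and Hadoop/YARN
    # daemons (a common rule of thumb, not a hard requirement).
    usable_cores = cores_per_node - 1        # 7
    usable_mem_gb = mem_per_node_gb - 1      # 15

    # ~5 cores per executor is a frequently cited sweet spot for
    # HDFS client throughput; treat it as an assumption.
    executor_cores = 5
    executors_per_node = usable_cores // executor_cores  # 1
    num_executors = workers * executors_per_node         # 4

    # YARN adds memory overhead on top of --executor-memory
    # (spark.yarn.executor.memoryOverhead in Spark 2.x, by default
    # max(384 MB, 10% of executor memory)), so back it out.
    mem_per_executor_gb = usable_mem_gb / executors_per_node   # 15
    executor_memory_gb = math.floor(mem_per_executor_gb / 1.10)  # ~13

    # A starting point for the data-dependent chunk: 2-3 tasks per
    # core, then adjust so each shuffle partition processes roughly
    # 100-200 MB of shuffle data.
    total_cores = num_executors * executor_cores
    shuffle_partitions = total_cores * 3                 # 60

    print(num_executors, executor_cores, executor_memory_gb,
          shuffle_partitions)

On your cluster this reproduces --num-executors 4 and --executor-cores 5, and suggests --executor-memory could go up to roughly 13G. The data-dependent settings (shuffle partitions, maxResultSize) still have to be tuned from the Spark UI per job; no formula sees your data sizes or skew, which is another reason to split the work into separate, individually tuned jobs.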