Don’t do this all in one job. Create separate jobs for the different types of work and orchestrate them using Oozie or a similar workflow scheduler; see the sketch below.
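For illustration, a minimal Oozie workflow chaining two Spark actions with different resource profiles. This is only a sketch: the app name, classes, jar path, resource numbers, and the ${...} properties are placeholders, not taken from your setup.

    <workflow-app name="my-pipeline" xmlns="uri:oozie:workflow:0.5">
        <start to="etl-step"/>

        <!-- First Spark job: one resource profile for the ETL stage -->
        <action name="etl-step">
            <spark xmlns="uri:oozie:spark-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>yarn</master>
                <mode>cluster</mode>
                <name>etl-step</name>
                <class>com.example.EtlJob</class>
                <jar>${nameNode}/apps/pipeline/pipeline.jar</jar>
                <spark-opts>--num-executors 4 --executor-cores 5 --executor-memory 10G</spark-opts>
            </spark>
            <ok to="aggregate-step"/>
            <error to="fail"/>
        </action>

        <!-- Second Spark job: a different profile for the aggregation stage -->
        <action name="aggregate-step">
            <spark xmlns="uri:oozie:spark-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <master>yarn</master>
                <mode>cluster</mode>
                <name>aggregate-step</name>
                <class>com.example.AggregateJob</class>
                <jar>${nameNode}/apps/pipeline/pipeline.jar</jar>
                <spark-opts>--num-executors 2 --executor-cores 4 --executor-memory 12G</spark-opts>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>

        <kill name="fail">
            <message>Pipeline failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>

The point is that each stage gets its own sizing and can be retried or rescheduled independently, instead of one job carrying the worst-case configuration for everything.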
> On 3. Jul 2018, at 09:34, Aakash Basu <aakash.spark....@gmail.com> wrote:
>
> Hi,
>
> Cluster - 5 nodes (1 driver and 4 workers)
> Driver config: 16 cores, 32 GB RAM
> Worker config: 8 cores, 16 GB RAM
>
> I'm using the parameters below, of which I know the first chunk is cluster
> dependent and the second chunk is data/code dependent.
>
> --num-executors 4
> --executor-cores 5
> --executor-memory 10G
> --driver-cores 5
> --driver-memory 25G
>
> --conf spark.sql.shuffle.partitions=100
> --conf spark.driver.maxResultSize=2G
> --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC"
> --conf spark.scheduler.listenerbus.eventqueue.capacity=20000
>
> I've come up to these values through my own R&D on these properties and
> the issues I faced along the way.
>
> My questions are:
>
> 1) How can I infer, using some formula or code, how to set the second
> (data/code dependent) chunk of settings?
> 2) What other properties/configurations can I use to shorten my job
> runtime?
>
> Thanks,
> Aakash.
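On question 1: there is no exact formula, but the widely circulated YARN sizing heuristic gives a starting point. A sketch in Python; the one-core/1-GB-per-node reservation, the 5-cores-per-executor sweet spot, and the ~10% memory overhead factor are rule-of-thumb assumptions, not derived from your job.

    import math

    # Cluster specs from the mail (per worker node)
    workers = 4
    cores_per_node = 8
    mem_per_node_gb = 16

    # Reserve 1 core and 1 GB per node for the OS and Hadoop/YARN
    # daemons (a common rule of thumb, not a hard requirement).
    usable_cores = cores_per_node - 1        # 7
    usable_mem_gb = mem_per_node_gb - 1      # 15

    # ~5 cores per executor is a frequently cited sweet spot for
    # HDFS client throughput; treat it as an assumption.
    executor_cores = 5
    executors_per_node = usable_cores // executor_cores  # 1
    num_executors = workers * executors_per_node         # 4

    # YARN adds memory overhead on top of --executor-memory
    # (spark.yarn.executor.memoryOverhead in Spark 2.x, by default
    # max(384 MB, 10% of executor memory)), so back it out.
    mem_per_executor_gb = usable_mem_gb / executors_per_node   # 15
    executor_memory_gb = math.floor(mem_per_executor_gb / 1.10)  # ~13

    # A starting point for the data-dependent chunk: 2-3 tasks per
    # core, then adjust so each shuffle partition processes roughly
    # 100-200 MB of shuffle data.
    total_cores = num_executors * executor_cores
    shuffle_partitions = total_cores * 3                 # 60

    print(num_executors, executor_cores, executor_memory_gb,
          shuffle_partitions)

On your cluster this reproduces --num-executors 4 and --executor-cores 5, and suggests --executor-memory could go up to roughly 13G. The data-dependent settings (shuffle partitions, maxResultSize) still have to be tuned from the Spark UI per job; no formula sees your data sizes or skew, which is another reason to split the work into separate, individually tuned jobs.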