That is precisely my question: what kind of leads can I look at to get a hint of where the inefficiencies lie?
On Thu, Nov 15, 2018 at 4:56 PM David Markovitz <dudu.markov...@microsoft.com> wrote:

> It seems it is almost fully utilized – when it is active.
>
> What happens in the gaps, where there is no spark activity?
>
> Best regards,
>
> David (דודו) Markovitz
> Technology Solutions Professional, Data Platform
> Microsoft Israel
>
> Mobile: +972-525-834-304
> Office: +972-747-119-274
>
> *From:* Vitaliy Pisarev <vitaliy.pisa...@biocatch.com>
> *Sent:* Thursday, November 15, 2018 4:51 PM
> *To:* user <user@spark.apache.org>
> *Cc:* David Markovitz <dudu.markov...@microsoft.com>
> *Subject:* How to address seemingly low core utilization on a spark workload?
>
> I have a workload that runs on a cluster of 300 cores.
>
> Below is a plot of the amount of active tasks over time during the execution of this workload:
>
> [plot: active tasks over time]
>
> What I deduce is that there are substantial intervals where the cores are heavily under-utilised.
>
> What actions can I take to:
>
> - Increase the efficiency (== core utilisation) of the cluster?
> - Understand the root causes behind the drops in core utilisation?
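One concrete first lead: turn the plot into numbers. If you sample the active-task count at regular intervals (e.g. by scraping the Spark UI's REST monitoring API or replaying the event log), you can compute the average core utilisation over the run and flag the low-utilisation gaps worth investigating. A minimal sketch, with made-up sample values (the `samples` list and the 10% threshold are illustrative assumptions, not real data from this workload):

```python
# Hypothetical active-task counts sampled at uniform intervals,
# e.g. collected from the Spark UI / REST monitoring API.
TOTAL_CORES = 300
samples = [300, 290, 40, 0, 0, 250, 300, 10, 0, 280]  # illustrative values

def utilization(samples, total_cores):
    """Mean fraction of cores busy across the sampled window."""
    return sum(samples) / (len(samples) * total_cores)

def idle_gaps(samples, total_cores, threshold=0.1):
    """Indices of samples where utilisation drops below `threshold` --
    these point at the time windows to inspect in the Spark UI."""
    return [i for i, s in enumerate(samples) if s / total_cores < threshold]

print(f"avg utilisation: {utilization(samples, TOTAL_CORES):.0%}")
print(f"low-utilisation samples: {idle_gaps(samples, TOTAL_CORES)}")
```

Matching the flagged sample indices back to timestamps tells you which stages (or the gaps between stages, e.g. driver-side work or skewed straggler tasks) coincide with the drops in the plot.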