Done: https://issues.apache.org/jira/browse/SPARK-25837
On Thu, Oct 25, 2018 at 10:21 AM Marcelo Vanzin <van...@cloudera.com> wrote:
> Ah, that makes more sense. Could you file a bug with that information
> so we don't lose track of this?
>
> Thanks
>
> On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown
> <patrick.barry.br...@gmail.com> wrote:
> >
> > On my production application I am running ~200 jobs at once, but I
> > continue to submit jobs in this manner for sometimes ~1 hour.
> >
> > The reproduction code above generally has only about 4 jobs running
> > at once, and as you can see it runs through 50k jobs in this manner.
> >
> > To clarify my statement above: the issue seems to appear when running
> > multiple jobs at once, as well as in sequence for a while, and may
> > well have something to do with high master CPU usage (hence the
> > collect in the code). My rough guess is that whatever manages clearing
> > out completed jobs gets overwhelmed (my master was a 4-core machine
> > while running this, and htop reported almost full CPU usage across all
> > 4 cores).
> >
> > The attached screenshot shows the state of the web UI after running
> > the repro code; you can see the UI is displaying some 43k completed
> > jobs (and takes a long time to load). After a few minutes of
> > inactivity this will clear out; however, as my production application
> > continues to submit jobs every once in a while, the issue persists.
> >
> > On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin <van...@cloudera.com>
> > wrote:
> >>
> >> When you say many jobs at once, what ballpark are you talking about?
> >>
> >> The code in 2.3+ does try to keep data about all running jobs and
> >> stages regardless of the limit. If you're running into issues because
> >> of that, we may have to look again at whether that's the right thing
> >> to do.
> >> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
> >> <patrick.barry.br...@gmail.com> wrote:
> >> >
> >> > I believe I may be able to reproduce this now; it seems like it
> >> > may be something to do with many jobs at once:
> >> >
> >> > Spark 2.3.1
> >> >
> >> > spark-shell --conf spark.ui.retainedJobs=1
> >> >
> >> > scala> import scala.concurrent._
> >> > scala> import scala.concurrent.ExecutionContext.Implicits.global
> >> > scala> for (i <- 0 until 50000) { Future { println(sc.parallelize(0 until i).collect.length) } }
> >> >
> >> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin
> >> > <van...@cloudera.com> wrote:
> >> >>
> >> >> Just tried on 2.3.2 and it worked fine for me. The UI had a single
> >> >> job and a single stage (+ the tasks related to that single stage),
> >> >> same thing in memory (checked with jvisualvm).
> >> >>
> >> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin
> >> >> <van...@cloudera.com> wrote:
> >> >> >
> >> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
> >> >> > <patrick.barry.br...@gmail.com> wrote:
> >> >> > > I recently upgraded to Spark 2.3.1. I have had these same
> >> >> > > settings in my spark-submit script, which worked on 2.0.2,
> >> >> > > and which according to the documentation have not changed:
> >> >> > >
> >> >> > > spark.ui.retainedTasks=1
> >> >> > > spark.ui.retainedStages=1
> >> >> > > spark.ui.retainedJobs=1
> >> >> >
> >> >> > I tried that locally on the current master and it seems to be
> >> >> > working. I don't have 2.3 easily in front of me right now, but
> >> >> > will take a look Monday.
> >> >> >
> >> >> > --
> >> >> > Marcelo
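For reference, the reproduction from the thread, consolidated into a single spark-shell session. The loop, iteration count, and spark.ui.retainedJobs=1 setting are taken verbatim from Patrick's Oct 23 message; the two extra --conf flags are the other retained* settings from his original spark-submit script, included here on the assumption that the full repro should match that configuration:

```
$ spark-shell --conf spark.ui.retainedJobs=1 \
              --conf spark.ui.retainedStages=1 \
              --conf spark.ui.retainedTasks=1

scala> import scala.concurrent._
scala> import scala.concurrent.ExecutionContext.Implicits.global
scala> // Submit 50k small jobs from concurrent Futures; collect keeps the driver busy
scala> for (i <- 0 until 50000) { Future { println(sc.parallelize(0 until i).collect.length) } }
```

With retainedJobs=1, the Jobs page should show at most one completed job once the loop finishes; on the affected 2.3.x versions the thread reports it instead listing tens of thousands of completed jobs until a few minutes of inactivity pass.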