Ah that makes more sense. Could you file a bug with that information so we don't lose track of this?
Thanks

On Wed, Oct 24, 2018 at 6:13 PM Patrick Brown <patrick.barry.br...@gmail.com> wrote:
>
> On my production application I am running ~200 jobs at once, and I continue
> to submit jobs in this manner for sometimes ~1 hour.
>
> The reproduction code above generally has only ~4 jobs running at once, and
> as you can see it runs through 50k jobs in this manner.
>
> I should clarify my statement above: the issue seems to appear when running
> multiple jobs at once, as well as in sequence for a while, and may also have
> something to do with high master CPU usage (hence the collect in the code).
> My rough guess is that whatever is responsible for clearing out completed
> jobs gets overwhelmed (my master was a 4-core machine while running this,
> and htop reported almost full CPU usage across all 4 cores).
>
> The attached screenshot shows the state of the web UI after running the
> repro code: the UI is displaying some 43k completed jobs (and takes a long
> time to load). After a few minutes of inactivity this will clear out, but
> because my production application continues to submit jobs every once in a
> while, the issue persists.
>
> On Wed, Oct 24, 2018 at 5:05 PM Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> When you say many jobs at once, what ballpark are you talking about?
>>
>> The code in 2.3+ does try to keep data about all running jobs and stages
>> regardless of the limit. If you're running into issues because of that, we
>> may have to look again at whether that's the right thing to do.
>>
>> On Tue, Oct 23, 2018 at 10:02 AM Patrick Brown
>> <patrick.barry.br...@gmail.com> wrote:
>> >
>> > I believe I may be able to reproduce this now; it seems like it may have
>> > something to do with many jobs at once:
>> >
>> > Spark 2.3.1
>> >
>> > > spark-shell --conf spark.ui.retainedJobs=1
>> >
>> > scala> import scala.concurrent._
>> > scala> import scala.concurrent.ExecutionContext.Implicits.global
>> > scala> for (i <- 0 until 50000) { Future { println(sc.parallelize(0 until i).collect.length) } }
>> >
>> > On Mon, Oct 22, 2018 at 11:25 AM Marcelo Vanzin <van...@cloudera.com>
>> > wrote:
>> >>
>> >> Just tried on 2.3.2 and it worked fine for me. The UI had a single job
>> >> and a single stage (plus the tasks related to that single stage), and
>> >> the same was true in memory (checked with jvisualvm).
>> >>
>> >> On Sat, Oct 20, 2018 at 6:45 PM Marcelo Vanzin <van...@cloudera.com>
>> >> wrote:
>> >> >
>> >> > On Tue, Oct 16, 2018 at 9:34 AM Patrick Brown
>> >> > <patrick.barry.br...@gmail.com> wrote:
>> >> > > I recently upgraded to Spark 2.3.1. I have had these same settings
>> >> > > in my spark-submit script, which worked on 2.0.2 and which,
>> >> > > according to the documentation, appear not to have changed:
>> >> > >
>> >> > > spark.ui.retainedTasks=1
>> >> > > spark.ui.retainedStages=1
>> >> > > spark.ui.retainedJobs=1
>> >> >
>> >> > I tried that locally on the current master and it seems to be
>> >> > working. I don't have 2.3 easily in front of me right now, but will
>> >> > take a look Monday.
>> >> >
>> >> > --
>> >> > Marcelo
>> >>
>> >> --
>> >> Marcelo
>>
>> --
>> Marcelo

--
Marcelo
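For anyone trying to reproduce this outside of spark-shell, the repro quoted
above can be packaged as a minimal standalone application. This is a sketch
under assumptions, not code from the thread: the object name and appName are
placeholders, local[4] is chosen only to mirror the 4-core machine described
above, and the final Await is added so the driver stays up until all jobs
finish.

    import scala.concurrent._
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration
    import org.apache.spark.sql.SparkSession

    // Hypothetical standalone version of the spark-shell repro above.
    object RetainedJobsRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("retained-jobs-repro")        // placeholder name
          .master("local[4]")                    // assumption: mirrors the 4-core master in the report
          .config("spark.ui.retainedJobs", "1")  // the settings under discussion
          .config("spark.ui.retainedStages", "1")
          .config("spark.ui.retainedTasks", "1")
          .getOrCreate()
        val sc = spark.sparkContext

        // The global ExecutionContext is sized to the core count, so only a
        // handful of jobs run concurrently, matching the "~4 jobs at once"
        // behavior described above.
        val futures = (0 until 50000).map { i =>
          Future { sc.parallelize(0 until i).collect().length }
        }
        futures.foreach(Await.ready(_, Duration.Inf))

        // While this runs, the web UI at http://localhost:4040 should retain
        // at most one completed job; the report above observes ~43k instead.
        spark.stop()
      }
    }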
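Independently of how slowly the UI page renders, the driver's monitoring REST
API can be polled to count what the status store is actually retaining. A
sketch to paste into the running spark-shell, assuming the default UI port
4040 and a single application; the regex extraction of the application id is
a crude stand-in for a real JSON parser:

    import scala.io.Source

    // List applications known to the driver's UI (default port 4040).
    val apps = Source.fromURL("http://localhost:4040/api/v1/applications").mkString

    // Crudely extract the first application id; a JSON library would be cleaner.
    val appId = "\"id\"\\s*:\\s*\"([^\"]+)\"".r.findFirstMatchIn(apps).map(_.group(1)).get

    // Fetch the job list and count entries by occurrences of "jobId". With
    // spark.ui.retainedJobs=1 this should stay near 1 once jobs complete.
    val jobs = Source.fromURL(s"http://localhost:4040/api/v1/applications/$appId/jobs").mkString
    println("retained jobs: " + "\"jobId\"".r.findAllIn(jobs).length)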