Re: Overnight CI failures

2022-09-30 Thread Bryan Richter via ghc-devs
(Adding ghc-devs) Are these fragile tests? 1. T14346 got a "bad file descriptor" on Darwin 2. linker_unload got some gold errors on Linux Neither of these has been reported to me before, so I don't know much about them. Nor have I looked deeply (or at all) at the tests themselves, yet. On

Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-29 Thread Cheng Shao
When hadrian builds the binary-dist job, invoking tar and xz is already the last step and there'll be no other ongoing jobs. But I do agree with reverting; this minor optimization I proposed has caused more trouble than it's worth :/ On Thu, Sep 29, 2022 at 9:25 AM Bryan Richter wrote: > >

Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-29 Thread Bryan Richter via ghc-devs
Matthew pointed out that the build system already parallelizes jobs, so it's risky to force parallelization of any individual job. That means I should just revert. On Wed, Sep 28, 2022 at 2:38 PM Cheng Shao wrote: > I believe we can either modify ci.sh to disable parallel compression > for

Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
I believe we can either modify ci.sh to disable parallel compression for i386, or modify .gitlab/gen_ci.hs and .gitlab/jobs.yaml to disable XZ_OPT=-9 for i386. On Wed, Sep 28, 2022 at 1:21 PM Bryan Richter wrote: > > Aha: while i386-linux-deb9-validate sets no extra XZ options, >
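
A minimal sketch of the second option described above, assuming a simplified per-job variable helper; the names and types here are hypothetical and do not reproduce the real .gitlab/gen_ci.hs API:

    module XzOpt (xzOptFor) where

    -- Hypothetical helper: choose per-architecture xz settings.  xz -9 needs
    -- roughly 700 MB of memory to compress, which does not fit comfortably in
    -- an i386 address space next to the rest of the job, so the 32-bit job
    -- falls back to xz's default preset while 64-bit jobs keep the aggressive
    -- one.
    xzOptFor :: String -> [(String, String)]
    xzOptFor "i386" = []                      -- default preset on 32-bit
    xzOptFor _      = [("XZ_OPT", "-9")]      -- keep -9 everywhere else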

Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Aha: while i386-linux-deb9-validate sets no extra XZ options, *nightly*-i386-linux-deb9-validate (the failing job) sets "XZ_OPT = -9". A revert would fix the problem, but presumably so would tweaking that option. Does anyone have information that would lead to a better decision here? On Wed, Sep

Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
Sure, in which case please revert it. Apologies for the impact, though I'm still a bit curious: the i386 job did pass in the original MR. On Wed, Sep 28, 2022 at 1:00 PM Bryan Richter wrote: > > Yep, it seems to mostly be xz that is running out of memory. (All recent > builds that I sampled, but

Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Bryan Richter via ghc-devs
Yep, it seems to mostly be xz that is running out of memory. (All recent builds that I sampled, but not all builds through all time.) Thanks for pointing it out! I can revert the change. On Wed, Sep 28, 2022 at 11:46 AM Cheng Shao wrote: > Hi Bryan, > > This may be an unintended fallout of

Re: Consistent CI failure in job nightly-i386-linux-deb9-validate

2022-09-28 Thread Cheng Shao
Hi Bryan, This may be an unintended fallout of !8940. Would you try starting an i386 pipeline with it reverted to see if it solves the issue, in which case we should revert or fix it in master? On Wed, Sep 28, 2022 at 9:58 AM Bryan Richter via ghc-devs wrote: > > Hi all, > > For the past week

Re: Darwin CI Status

2021-05-20 Thread Matthew Pickering
Thanks Moritz for that update. The latest is that darwin CI is currently disabled and the merge train is unblocked (*choo choo*). I am testing Moritz's patches to speed up CI and will merge them shortly to get darwin coverage back. Cheers, Matt On Wed, May 19, 2021 at 9:46 AM Moritz

Re: Darwin CI Status

2021-05-19 Thread Moritz Angermann
Matt has access to the M1 builder in my closet now. The darwin performance issue has mainly been there since Big Sur, and is (afaik) primarily due to the number of DYLD_LIBRARY_PATH entries we pass to GHC invocations. The system linker spends the majority of the time in the kernel stat'ing and getelements (or

Re: On CI

2021-03-24 Thread Andreas Klebinger
> What about the case where the rebase *lessens* the improvement? That is, you're expecting these 10 cases to improve, but after a rebase, only 1 improves. That's news! But a blanket "accept improvements" won't tell you. I don't think that scenario currently triggers a CI failure. So this

Re: On CI

2021-03-24 Thread Moritz Angermann
Yes, this is exactly one of the issues that marge might run into as well: the aggregate ends up performing differently from the individual ones. Now we have marge to ensure that at least the aggregate builds together, which is the whole point of these merge trains. Not to end up in a situation

Re: On CI

2021-03-24 Thread Richard Eisenberg
What about the case where the rebase *lessens* the improvement? That is, you're expecting these 10 cases to improve, but after a rebase, only 1 improves. That's news! But a blanket "accept improvements" won't tell you. I'm not hard against this proposal, because I know precise tracking has its

Re: On CI

2021-03-24 Thread Andreas Klebinger
After the idea of letting marge accept unexpected perf improvements, and looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759, which failed because a single test, for a single build flavour, crossed the improvement threshold at which CI fails after rebasing, I wondered: when would

RE: On CI

2021-03-18 Thread Ben Gamari
Simon Peyton Jones via ghc-devs writes: > > We need to do something about this, and I'd advocate for just not making > > stats fail with marge. > > Generally I agree. One point you don’t mention is that our perf tests > (which CI forces us to look at assiduously) are often pretty weird > cases.

Re: On CI

2021-03-18 Thread Ben Gamari
Karel Gardas writes: > On 3/17/21 4:16 PM, Andreas Klebinger wrote: >> Now that isn't really an issue anyway I think. The question is rather is >> 2% a large enough regression to worry about? 5%? 10%? > > 5-10% is still around system noise even on lightly loaded workstation. > Not sure if CI is

Re: On CI

2021-03-18 Thread John Ericson
My guess is most of the "noise" is not run time, but the compiled code changing in hard to predict ways. https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1776/diffs for example was a very small PR that took *months* of on-off work to get passing metrics tests. In the end, binding `is_boot`

Re: On CI

2021-03-18 Thread davean
I left the wiggle room for things like longer wall time causing more time events in the IO Manager/RTS which can be a thermal/HW issue. They're small and indirect though -davean On Thu, Mar 18, 2021 at 1:37 PM Sebastian Graf wrote: > To be clear: All performance tests that run as part of CI

Re: On CI

2021-03-18 Thread Sebastian Graf
To be clear: All performance tests that run as part of CI measure allocations only. No wall clock time. Those measurements are (mostly) deterministic and reproducible between compiles of the same worktree and not impacted by thermal issues/hardware at all. On Thu, 18 Mar 2021 at 18:09
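
A small illustration of the kind of check this describes, as a standalone sketch rather than the actual testsuite driver (which lives in testsuite/driver and keeps its baselines elsewhere); the workload, baseline, and tolerance below are made-up numbers:

    import Control.Monad (when)
    import GHC.Stats (getRTSStats, allocated_bytes)
    import System.Exit (exitFailure)

    -- The metric compared against the baseline is bytes allocated, which is
    -- (mostly) deterministic, rather than wall-clock time, which is noisy.
    main :: IO ()
    main = do
      print (sum [1 .. 1000000 :: Int])   -- the workload being measured
      stats <- getRTSStats                -- requires running with +RTS -T
      let actual    = fromIntegral (allocated_bytes stats) :: Double
          baseline  = 120000000 :: Double -- made-up baseline value
          tolerance = 0.02                -- accept a +/-2% window
      when (abs (actual - baseline) / baseline > tolerance) $ do
        putStrLn ("bytes allocated " ++ show actual ++ " outside the accepted window")
        exitFailure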

Re: On CI

2021-03-18 Thread davean
That really shouldn't be near system noise for a well constructed performance test. You might be seeing things like thermal issues, etc though - good benchmarking is a serious subject. Also we're not talking wall clock tests, we're talking specific metrics. The machines do tend to be bare metal,

Re: On CI

2021-03-17 Thread Karel Gardas
On 3/17/21 4:16 PM, Andreas Klebinger wrote: > Now that isn't really an issue anyway I think. The question is rather is > 2% a large enough regression to worry about? 5%? 10%? 5-10% is still around system noise even on lightly loaded workstation. Not sure if CI is not run on some shared cloud

Re: On CI

2021-03-17 Thread Merijn Verstraaten
On 17 Mar 2021, at 16:16, Andreas Klebinger wrote: > > While I fully agree with this. We should *always* want to know if a small > synthetic benchmark regresses by a lot. > Or in other words we don't want CI to accept such a regression for us ever, > but the developer of a patch should need to

Re: On CI

2021-03-17 Thread Andreas Klebinger
> I'd be quite happy to accept a 25% regression on T9872c if it yielded a 1% improvement on compiling Cabal. T9872 is very very very strange! (Maybe if *all* the T9872 tests regressed, I'd be more worried.) While I fully agree with this. We should *always* want to know if a small synthetic

Re: On CI

2021-03-17 Thread John Ericson
Yes, I think the counter point of "automating what Ben does" so people besides Ben can do it is very important. In this case, I think a good thing we could do is asynchronously build more of master post-merge, such as use the perf stats to automatically bisect anything that is fishy, including

Re: On CI

2021-03-17 Thread Sebastian Graf
Re: Performance drift: I opened https://gitlab.haskell.org/ghc/ghc/-/issues/17658 a while ago with an idea of how to measure drift a bit better. It's basically an automatically checked version of "Ben stares at performance reports every two weeks and sees that T9872 has regressed by 10% since 9.0"
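
A sketch of the underlying check, under the assumption that the idea in #17658 amounts to comparing the current measurement against a pinned historical baseline in addition to the per-commit comparison (the ticket may propose something more refined):

    module Drift (driftExceeded) where

    -- Many small regressions can each pass the per-commit tolerance yet add
    -- up; comparing against a baseline pinned at a release catches the
    -- accumulated drift.
    driftExceeded
      :: Double  -- ^ metric at the pinned baseline (e.g. the 9.0 release)
      -> Double  -- ^ metric measured now
      -> Double  -- ^ allowed relative drift, e.g. 0.10 for 10%
      -> Bool
    driftExceeded pinned current allowed =
      (current - pinned) / pinned > allowed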

Re: On CI

2021-03-17 Thread Richard Eisenberg
> On Mar 17, 2021, at 6:18 AM, Moritz Angermann > wrote: > > But what do we expect of patch authors? Right now if five people write > patches to GHC, and each of them eventually manage to get their MRs green, > after a long review, they finally see it assigned to marge, and then it >

Re: On CI

2021-03-17 Thread Moritz Angermann
I am not advocating to drop perf tests during merge requests, I just want them to not be fatal for marge batches. Yes, this means that a bunch of unrelated merge requests could all be fine wrt the perf checks per merge request, but the aggregate might fail perf. And then subsequently the next

Re: On CI

2021-03-17 Thread Spiwack, Arnaud
Ah, so it was really two identical pipelines (one for the branch where Margebot batches commits, and one for the MR that Margebot creates before merging). That's indeed a non-trivial amount of purely wasted computer-hours. Taking a step back, I am inclined to agree with the proposal of not

RE: On CI

2021-03-17 Thread Simon Peyton Jones via ghc-devs
We need to do something about this, and I'd advocate for just not making stats fail with marge. Generally I agree. One point you don’t mention is that our perf tests (which CI forces us to look at assiduously) are often pretty weird cases. So there is at least a danger that these more

Re: On CI

2021-03-17 Thread Moritz Angermann
*why* is a very good question. The MR fixing it is here: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5275 On Wed, Mar 17, 2021 at 4:26 PM Spiwack, Arnaud wrote: > Then I have a question: why are there two pipelines running on each merge > batch? > > On Wed, Mar 17, 2021 at 9:22 AM

Re: On CI

2021-03-17 Thread Spiwack, Arnaud
Then I have a question: why are there two pipelines running on each merge batch? On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann wrote: > No it wasn't. It was about the stat failures described in the next > paragraph. I could have been more clear about that. My apologies! > > On Wed, Mar 17,

Re: On CI

2021-03-17 Thread Moritz Angermann
No it wasn't. It was about the stat failures described in the next paragraph. I could have been more clear about that. My apologies! On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud wrote: > > and if either of both (see below) failed, marge's merge would fail as well. >> > > Re: “see below” is

Re: On CI

2021-03-17 Thread Spiwack, Arnaud
> and if either of both (see below) failed, marge's merge would fail as well. > Re: “see below” is this referring to a missing part of your email?

Re: On CI

2021-02-22 Thread John Ericson
I'm not opposed to some effort going into this, but I would strongly oppose putting all our effort there. Incremental CI can cut multiple hours to < mere minutes, especially with the test suite being embarrassingly paral

Re: On CI

2021-02-22 Thread Spiwack, Arnaud
> I'm not opposed to some effort going into this, but I would strongly oppose putting all our effort there. Incremental C

RE: On CI

2021-02-22 Thread Simon Peyton Jones via ghc-devs
think we need to do less compiling - hence incremental CI. Simon > I'm not opposed to some effort going into this, but I would strongly oppose putting all our effort there. Incremental CI can

Re: On CI

2021-02-21 Thread John Ericson
(Excerpt contains only the quoted headers of Simon Peyton Jones's message via ghc-devs, sent Friday, February 19, 2021 8:57 AM, to John Ericson.)

Re: On CI

2021-02-19 Thread Richard Eisenberg
way we can > validate that the build failure was a true build failure and not just due to > the aggressive caching scheme. > > Just my 2p > > Josef

Re: On CI

2021-02-19 Thread Sebastian Graf
> 1. Building and testing happen together. When tests fail spuriously, we al

Re: On CI

2021-02-19 Thread Josef Svenningsson via ghc-devs
1. Building and testing happen together. When tests fail spuriously, we also have to rebuild GHC in addition to re-running the tests. That's pure waste. https://gitlab.haskell.org/ghc/ghc/-/issues

RE: On CI

2021-02-19 Thread Ben Gamari
Simon Peyton Jones via ghc-devs writes: >> 1. Building and testing happen together. When tests fail >> spuriously, we also have to rebuild GHC in addition to re-running >> the tests. That's pure waste. >> https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more >> or less.

RE: On CI

2021-02-19 Thread Simon Peyton Jones via ghc-devs
I am also wary of us deferring checking whole platforms and what not. I think that's just kicking the can down the road, and will result in more variance and uncertainty. It might be alright for those authoring PRs, but it will make Ben's job keeping the

Re: On CI

2021-02-18 Thread John Ericson
I am also wary of us deferring checking whole platforms and what not. I think that's just kicking the can down the road, and will result in more variance and uncertainty. It might be alright for those authoring PRs, but it will make Ben's job keeping the system running even more grueling.

Re: On CI

2021-02-18 Thread Ben Gamari
Moritz Angermann writes: > At this point I believe we have ample Linux build capacity. Darwin looks > pretty good as well; the ~4 M1s we have should in principle also be able to > build x86_64-darwin at acceptable speeds. Although on Big Sur only. > > The aarch64-Linux story is a bit constrained

Re: On CI

2021-02-18 Thread Ben Gamari
Apologies for the latency here. This thread has required a fair amount of reflection. Sebastian Graf writes: > Hi Moritz, > > I, too, had my gripes with CI turnaround times in the past. Here's a > somewhat radical proposal: > >- Run "full-build" stage builds only on Marge MRs. Then we can

Re: On CI

2021-02-18 Thread Moritz Angermann
I'm glad to report that my math was off. But it was off only because I assumed that we'd successfully build all windows configurations, which we of course don't. Thus some builds fail faster. Sylvain also provided a windows machine temporarily, until it expired. This led to a slew of new windows

Re: On CI

2021-02-17 Thread Moritz Angermann
At this point I believe we have ample Linux build capacity. Darwin looks pretty good as well; the ~4 M1s we have should in principle also be able to build x86_64-darwin at acceptable speeds. Although on Big Sur only. The aarch64-Linux story is a bit constrained by powerful and fast CI machines but

Re: On CI

2021-02-17 Thread Sebastian Graf
Hi Moritz, I, too, had my gripes with CI turnaround times in the past. Here's a somewhat radical proposal: - Run "full-build" stage builds only on Marge MRs. Then we can assign to Marge much earlier, but probably have to do a bit more of (manual) bisecting of spoiled Marge batches.

Re: Marge: "CI is taking too long"

2019-01-25 Thread Ben Gamari
Richard Eisenberg writes: > Marge has complained that > https://gitlab.haskell.org/rae/ghc/-/jobs/17206 is taking too long. > And indeed it seems stuck. > Indeed currently CI is a bit backed up. There are a few reasons for this: * I am currently in the middle of a (now two-day-long) internet

Re: GitLab CI for patches across submodules

2019-01-06 Thread Simon Jakobi via ghc-devs
On Sat, 5 Jan 2019 at 22:18, Ben Gamari wrote: However, we can certainly use the upstream repo during CI builds. I have opened !78 which should hopefully fix this. Perhaps you could rebase on top of this and check? > Thanks, Ben, that works for me. What I hadn't realized before, is

Re: GitLab CI for patches across submodules

2019-01-05 Thread Ben Gamari
Simon Jakobi via ghc-devs writes: > Hi, > > I just tried to use GitLab CI to validate a GHC patch including changes to > Haddock: https://gitlab.haskell.org/sjakobi/ghc/pipelines/842 > > The problem is that the CI script tries to find my Haddock commit at >