new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread shane knapp
...whenever i get the word. :) FWIW they will all be identical to the current group of master builds/tests. shane -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research / RISELab Staff Technical Lead https://rise.cs.berkeley.edu

Re: Apache Spark Docker image repository

2020-02-05 Thread shane knapp
> > (This can be used in GitHub Action Jobs and Jenkins K8s > Integration Tests to speed up jobs and to have more stable environments) > yep! not only that, if we ever get around (hopefully this year) to containerizing (the majority of) the master and branch builds, i think it'd be nice to

[build system] enabled the ubuntu staging node to help w/build queue

2020-02-11 Thread shane knapp
the build queue has been increasing and to help throughput i enabled the 'ubuntu-testing' node. i spot-checked a bunch of the spark maven builds, and they passed. i'll keep an eye out for any failures caused by the system and either remove it from the worker pool or fix what i need to. shane --

Re: 'spark-master-docs' job missing in Jenkins

2020-02-25 Thread shane knapp
it's been gone for quite a long time. these docs were being built but not published. relevant discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/Re-moving-the-spark-jenkins-job-builder-repo-from-dbricks-spark-tp25325p26222.html shane On Tue, Feb 25, 2020 at 6:18 PM Hyukjin Kw

Re: 'spark-master-docs' job missing in Jenkins

2020-02-26 Thread shane knapp
on documentation ( >>> https://spark.apache.org/docs/latest/api/sql/index.html) are >>> not being tested anymore if I am not mistaken. I believe >>> spark-master-docs >>> <https://amplab.cs.berkeley.edu/jenkins/job/spark-master-docs> was only >>> the job

Re: Auto-linking from PRs to Jira tickets

2020-03-11 Thread shane knapp
oh this is badass... i really like it! On Tue, Mar 10, 2020 at 12:03 PM Alex Ott wrote: > yes - it's https://issues.apache.org/jira/browse/INFRA-19934 > > Nicholas Chammas at "Tue, 10 Mar 2020 13:52:23 -0400" wrote: > NC> Could you point us to the ticket? I'd like to follow along. > > NC> On

[build system] jenkins rebooting now

2020-05-14 Thread shane knapp
that is all.

Re: [build system] jenkins rebooting now

2020-05-14 Thread shane knapp
we're back. doesn't seem to have fixed the issue of the workers connecting to repository.apache.org but i'm still investigating. On Thu, May 14, 2020 at 9:11 AM shane knapp ☠ wrote: > that is all. > > -- > Shane Knapp > Computer Guy / Voice of Reason > UC Berke

Re: ./dev/run-tests failing at master

2020-05-14 Thread shane knapp
this is the flake8 versioning from a jenkins worker: $ flake8 --version 3.6.0 (mccabe: 0.6.1, pycodestyle: 2.4.0, pyflakes: 2.0.0) CPython 3.6.8 on Linux be sure you've got all the right versions of packages in there. On Thu, May 14, 2020 at 12:19 PM suddhu wrote: > Thanks for the response Jeff
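The version check described above can be sketched in Python. This is a minimal sketch, not the project's tooling: the pinned versions are copied from the `flake8 --version` output quoted in the message, while the helper names `installed_versions` and `version_mismatches` are hypothetical and exist only for illustration.

```python
from importlib import metadata

# Versions reported by `flake8 --version` on the jenkins worker (from the message above).
EXPECTED = {
    "flake8": "3.6.0",
    "mccabe": "0.6.1",
    "pycodestyle": "2.4.0",
    "pyflakes": "2.0.0",
}


def installed_versions(names):
    """Return {package: version or None} for the local environment (hypothetical helper)."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # package is not installed locally
    return found


def version_mismatches(expected, installed):
    """Return {package: (expected, installed)} for every package that differs."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    diffs = version_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, have) in diffs.items():
        print(f"{name}: worker has {want}, local environment has {have}")
```

An empty result means the local lint environment matches the worker for these four packages; it says nothing about other dependencies.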

Re: Build time limit in PR builder

2020-05-28 Thread shane knapp
On Thu, May 28, 2020 at 7:16 AM Sean Owen wrote: > What else can we do, I suppose? > > there's not much else we can do. i'll add 30m to the timeout. shane

Re: Build time limit in PR builder

2020-05-28 Thread shane knapp
sts-jenkins.py#L201 > :-). > > On Fri, May 29, 2020 at 12:14 AM, shane knapp ☠ wrote: > >> On Thu, May 28, 2020 at 7:16 AM Sean Owen wrote: >> >>> What else can we do, I suppose? >>> >>> there's not much else we can do. i'll add 30m to the ti

Re: Build time limit in PR builder

2020-05-28 Thread shane knapp
https://github.com/apache/spark/pull/28666 On Thu, May 28, 2020 at 11:20 AM shane knapp ☠ wrote: > i'll get a PR put together now. > > On Thu, May 28, 2020 at 8:26 AM Hyukjin Kwon wrote: > >> I remember we were able to cut down pretty considerably in the past. For >

Re: Build time limit in PR builder

2020-05-28 Thread shane knapp
the timer is set to 500m now in master, 3.0 and 2.4. On Thu, May 28, 2020 at 12:32 PM Kousuke Saruta wrote: > Thanks all. It's very helpful! > > - Kousuke > > On 2020/05/29 3:31, shane knapp ☠ wrote: > > https://github.com/apache/spark/pull/28666 > > On Thu, May

Re: m2 cache issues in Jenkins?

2020-06-24 Thread shane knapp
for those weird failures, it's super helpful to provide which workers are showing these issues. :) i'd rather not wipe all of the m2 caches on all of the workers, as we'll then potentially get blacklisted again if we download too many packages from apache.org. On Tue, Jun 23, 2020 at 5:58 PM Hol

Re: m2 cache issues in Jenkins?

2020-06-24 Thread shane knapp
en Karau wrote: > The most recent one I noticed was > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124437/console > which > was run on amp-jenkins-worker-04. > > On Wed, Jun 24, 2020 at 10:44 AM shane knapp ☠ > wrote: > >> for those weird failures

Re: m2 cache issues in Jenkins?

2020-06-24 Thread shane knapp
done: -bash-4.1$ cd .m2 -bash-4.1$ ls repository -bash-4.1$ time rm -rf * real 17m4.607s user 0m0.950s sys 0m18.816s -bash-4.1$ On Wed, Jun 24, 2020 at 10:50 AM shane knapp ☠ wrote: > ok, i've taken that worker offline and once the job running on it > finishes, i'l

Re: Jenkins is down

2020-07-05 Thread shane knapp
hey all, i was out of town for the weekend and noticed it was down this morning and restarted the service. it's been pretty flaky recently, so i'll take a much closer look at things this coming week. On Sun, Jul 5, 2020 at 1:14 PM Dongjoon Hyun wrote: > Hi, All. > > Now, AmpLab Jenkins farm cam

Re: m2 cache issues in Jenkins?

2020-07-06 Thread shane knapp
t;>> >>>>>> Huh interesting that it’s the same worker. Have you filed a ticket to >>>>>> Shane? >>>>>> >>>>>> On Wed, Jul 1, 2020 at 8:50 PM Hyukjin Kwon >>>>>> wrote: >>>>>> >

Re: m2 cache issues in Jenkins?

2020-07-06 Thread shane knapp
i killed and retriggered the PRB jobs on 04, and wiped that worker's m2 cache. On Mon, Jul 6, 2020 at 9:24 AM shane knapp ☠ wrote: > once the jobs running on that worker are finished, yes. > > On Sun, Jul 5, 2020 at 7:41 PM Hyukjin Kwon wrote: > >> Shane, can we remove

Re: m2 cache issues in Jenkins?

2020-07-06 Thread shane knapp
k Lim > wrote: > >> Could this be a flaky or persistent issue? It failed with Scala gendoc >> but it didn't fail with the part the PR modified. It ran from worker-05. >> >> >> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125121/console

restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-07 Thread shane knapp
i wasn't able to get to it today, so i'm hoping to squeeze in a quick trip to the colo tomorrow morning. if not, then first thing thursday.

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-08 Thread shane knapp
this will be happening tomorrow... today is Meeting Hell Day[tm]. On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠ wrote: > i wasn't able to get to it today, so i'm hoping to squeeze in a quick trip > to the colo tomorrow morning. if not, then first thing thursday. > > -- >

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp
this is happening now. On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠ wrote: > this will be happening tomorrow... today is Meeting Hell Day[tm]. > > On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠ wrote: > >> i wasn't able to get to it today, so i'm hoping to squeeze in

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp
ok, we're back up and building (just waiting for one worker, -06 to finish cleaning itself up). On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠ wrote: > this is happening now. > > On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠ wrote: > >> this will be happening tomorrow...

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp
and -06 is back! i'll keep an eye on things today, but suffice to say on each worker i: 1) rebooted 2) cleaned ~/.ivy2, ~/.m2, and other associated caches we should be g2g! please reply here if you continue to see weirdness. On Thu, Jul 9, 2020 at 10:08 AM shane knapp ☠ wrote:
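The cache cleanup listed above (emptying `~/.ivy2` and `~/.m2` on a worker) could look roughly like the following. This is a minimal sketch under stated assumptions: the function name `clear_cache_dirs` and its dry-run default are hypothetical, and the actual cleanup on the workers was done by hand in a shell, as the earlier `rm -rf *` message in the m2 cache thread shows.

```python
import shutil
from pathlib import Path


def clear_cache_dirs(cache_dirs, dry_run=True):
    """Empty each cache directory but keep the directory itself.

    Returns the entries that were removed (or would be, when dry_run is True).
    """
    removed = []
    for cache_dir in cache_dirs:
        cache_dir = Path(cache_dir).expanduser()
        if not cache_dir.is_dir():
            continue  # nothing to clean on this worker
        for entry in sorted(cache_dir.iterdir()):
            removed.append(entry)
            if dry_run:
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # e.g. ~/.m2/repository
            else:
                entry.unlink()
    return removed


if __name__ == "__main__":
    # Dry run only: report what would be deleted without touching anything.
    for entry in clear_cache_dirs(["~/.m2", "~/.ivy2"], dry_run=True):
        print("would remove", entry)
```

Defaulting to a dry run reflects the concern raised earlier in the m2 thread: wiping every cache forces a full re-download, so it is worth seeing what will be deleted before doing it on more than one worker.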

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp
i'm seeing green PRB builds now, so i feel that we've gotten things building again! :) On Thu, Jul 9, 2020 at 5:33 PM Hyukjin Kwon wrote: > Thank you Shane. > > On Fri, Jul 10, 2020 at 2:35 AM, shane knapp ☠ wrote: > >> and -06 is back! i'll keep an eye on thing

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp
yeah, i can't do much for flaky tests... just flaky infrastructure. On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon wrote: > Couple of flaky tests can happen. It's usual. Seems it got better now at > least. I will keep monitoring the builds. > > On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote: > >> Loo

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp
9, assuming that > infrastructure? > > On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ wrote: > >> yeah, i can't do much for flaky tests... just flaky infrastructure. >> >> >> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon >> wrote: >> >>

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp
arkPullRequestBuilder/125563/console > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console > > On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ wrote: > >>

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp
er load. Thanks. >> >> On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ >> wrote: >> >>> only 125561, 125562 and 125564 were impacted by -9. >>> >>> 125565 exited w/a code of 15 (143 - 128), which means the process was >>> terminated for u
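The exit-code arithmetic in the message above (143 - 128 = 15) follows the common shell convention that a process killed by signal N is reported with exit status 128 + N. A minimal Python sketch of that decoding, assuming a POSIX system; the function name `describe_exit_status` is hypothetical:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status in the range 0-255."""
    if status > 128:
        signum = status - 128  # shells report "killed by signal N" as 128 + N
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by {name} (signal {signum})"
    if status == 0:
        return "exited successfully"
    return f"exited with error code {status}"


if __name__ == "__main__":
    for status in (143, 137, 1, 0):
        print(status, "->", describe_exit_status(status))
```

Under this convention, status 143 corresponds to signal 15 (SIGTERM), and the "-9" builds mentioned in the thread correspond to signal 9 (SIGKILL), which a shell would report as 137.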

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-13 Thread shane knapp
ings but i feel we're out of the woods right now. :) shane On Fri, Jul 10, 2020 at 3:43 PM Frank Yin wrote: > Great. Thanks. > > On Fri, Jul 10, 2020 at 3:39 PM shane knapp ☠ wrote: > >> no, 8 hours is plenty. things will speed up soon once the backlog of >>

Re: Welcoming some new Apache Spark committers

2020-07-14 Thread shane knapp
welcome, all! On Tue, Jul 14, 2020 at 10:37 AM Matei Zaharia wrote: > Hi all, > > The Spark PMC recently voted to add several new committers. Please join me > in welcoming them to their new roles! The new committers are: > > - Huaxin Gao > - Jungtaek Lim > - Dilip Biswal > > All three of them co

Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-14 Thread shane knapp
this is seriously great news! let's all take a moment and welcome apache spark's python support to the present. ;) On Mon, Jul 13, 2020 at 7:26 PM Holden Karau wrote: > Awesome, thank you for driving this forward :) > > On Mon, Jul 13, 2020 at 7:25 PM Hyukjin Kwon wrote: > >> Thank you all.

R installation broken on ubuntu workers, impacts K8s PRB builds

2020-07-15 Thread shane knapp
i'm not entirely sure when the dep for R got bumped to 3.5+, but it's breaking the k8s builds. i'll need to purge these workers of all previous versions of R + packages, then reinstall from scratch. this isn't a horrible task as i have most of it automated but it will still require a ~few hours o

Re: R installation broken on ubuntu workers, impacts K8s PRB builds

2020-07-16 Thread shane knapp
SPARK-32326 > > On Wed, Jul 15, 2020 at 12:09 PM shane knapp ☠ > wrote: > >> i'm not entirely sure when the dep for R got bumped to 3.5+, but it's >> breaking the k8s builds. >> >> i'll need to purge these workers of all previous versions of R + >

Re: R installation broken on ubuntu workers, impacts K8s PRB builds

2020-07-17 Thread shane knapp
starting now... pausing jenkins so no new builds are launched. On Thu, Jul 16, 2020 at 3:09 PM Holden Karau wrote: > Sounds good, thanks. No rush :) > > On Thu, Jul 16, 2020 at 3:03 PM shane knapp ☠ wrote: > >> i'll get to this tomorrow afternoon, and there will be a

Re: R installation broken on ubuntu workers, impacts K8s PRB builds

2020-07-17 Thread shane knapp
this is done, except for amp-jenkins-staging-worker-02 which is refusing to allow me to reinstall R... i marked that worker offline and will beat on it later today. On Fri, Jul 17, 2020 at 11:36 AM shane knapp ☠ wrote: > starting now... pausing jenkins so no new builds are launched. >

[build system] restarting jenkins now

2020-08-14 Thread shane knapp
there isn't much activity right now, and i'd like to restart jenkins quickly as it's consuming a lot of memory on the head node. shouldn't be more than a couple of minutes downtime... if something goes awry i'll send an email here. if you don't hear from me again, please carry on. :) -- Shane

Re: Running K8s integration tests for changes in core?

2020-08-18 Thread shane knapp
yes, i think this is fine. the k8s prb runs concurrently to the regular prb and takes ~20m. On Tue, Aug 18, 2020 at 8:47 PM Holden Karau wrote: > Hi Dev Folks, > > I was wondering how people feel about enabling the K8s PRB automatically > for all core changes? Sometimes I forget that a change m

Re: Running K8s integration tests for changes in core?

2020-08-19 Thread shane knapp
we'll be gated by the number of ubuntu workers w/minikube and docker, but it shouldn't be too bad as the full integration test takes ~45m, vs 4+ hrs for the regular PRB. i can enable this in about 1m of time if the consensus is for us to want this. On Wed, Aug 19, 2020 at 11:37 AM Holden Karau w

Re: Running K8s integration tests for changes in core?

2020-08-20 Thread shane knapp
ust said). > > A presubmit (which includes K8s integration tests) build will be run, once > the PR receives LGTM from "Approved reviewers". This is one criterion that > comes to my mind, others may have better suggestions. > > On Thu, Aug 20, 2020 at 12:25 AM shane knapp ☠

[build system] shane out all next week (aug 22-29), support instructions

2020-08-20 Thread shane knapp
i will be disappearing off in to the wilderness for a few days of backpacking, and am handing off basic support duties to my team. if, and only if, jenkins goes down, please email research-supp...@cs.berkeley.edu and open a ticket. if you open a ticket, please let dev@ know to minimize the number

[build system] downtime due to SSL cert errors

2020-09-23 Thread shane knapp
jenkins is up and building, but not reachable via https at the moment. i'm working on getting this sorted ASAP. shane

Re: [build system] downtime due to SSL cert errors

2020-09-24 Thread shane knapp
certs delivered and installed... we're back! On Wed, Sep 23, 2020 at 6:07 PM shane knapp ☠ wrote: > jenkins is up and building, but not reachable via https at the moment. > i'm working on getting this sorted ASAP. > > shane > -- > Shane Knapp > Computer Guy / Voi

Re: Running K8s integration tests for changes in core?

2020-09-24 Thread shane knapp
> Sounds good, thanks for the heads up. I hope you get some time to relax :) > > On Thu, Aug 20, 2020 at 2:26 PM shane knapp ☠ wrote: > >> fyi, i won't be making this change until the 1st week of september. i'll >> be out, off the grid all next week! :) >>

[build system] jenkins wedged again

2020-10-14 Thread shane knapp
i'm going to reboot the primary and worker nodes, so it'll be a few minutes before everything is back up. shane

Re: [build system] jenkins wedged again

2020-10-14 Thread shane knapp
we're mostly back up, and just waiting for a couple of ubuntu boxes to finish booting... prb seem to be building now! On Wed, Oct 14, 2020 at 11:48 AM shane knapp ☠ wrote: > i'm going to reboot the primary and worker nodes, so it'll be a few > minutes before everything

Re: [build system] jenkins wedged again

2020-10-14 Thread shane knapp
everything's up and jenkins is slowly chewing through the queue! :) On Wed, Oct 14, 2020 at 12:00 PM Xiao Li wrote: > Thank you, Shane! > > Xiao > > On Wed, Oct 14, 2020 at 12:00 PM shane knapp ☠ > wrote: > >> we're mostly back up, and just waiting for

[build system] IMPORTANT: builds will be impacted this month

2020-11-02 Thread shane knapp
TL;DR: our build system is ancient, EOLed and about to get hit hard w/a secops hammer. we need to literally reinstall the entire cluster from scratch and get things working. here are the high level bullet points about what's coming up in the next month: ** all amp-jenkins-worker-* nodes are run

jenkins downtime tomorrow evening/weekend

2020-11-19 Thread shane knapp
i'm going to be upgrading jenkins to something more reasonable, and there will definitely be some downtime as i get things sorted. we should be back up and building by monday. shane -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research / RISELab Staff Technical Lead https://ris

Re: jenkins downtime tomorrow evening/weekend

2020-11-21 Thread shane knapp
this is starting now On Thu, Nov 19, 2020 at 4:34 PM shane knapp ☠ wrote: > i'm going to be upgrading jenkins to something more reasonable, and there > will definitely be some downtime as i get things sorted. > > we should be back up and building by monday. > > s

Re: jenkins downtime tomorrow evening/weekend

2020-11-21 Thread shane knapp
somehow that went pretty smoothly, tho i've got a bunch of plugins to deal with... we're back up and building w/a shiny new UI. :) On Sat, Nov 21, 2020 at 3:52 PM shane knapp ☠ wrote: > this is starting now > > On Thu, Nov 19, 2020 at 4:34 PM shane knapp ☠ wrote: >

Re: jenkins downtime tomorrow evening/weekend

2020-11-23 Thread shane knapp
ubuntu 16, 18 or 20. shane On Sat, Nov 21, 2020 at 4:23 PM shane knapp ☠ wrote: > somehow that went pretty smoothly, tho i've got a bunch of plugins to deal > with... we're back up and building w/a shiny new UI. :) > > On Sat, Nov 21, 2020 at 3:52 PM shane knapp ☠ wrote: >

Re: jenkins downtime tomorrow evening/weekend

2020-11-23 Thread shane knapp
a and ping me here. also, my backlog of things i need to install will be addressed this week. the ansible is coming along nicely! On Mon, Nov 23, 2020 at 2:11 PM shane knapp ☠ wrote: > the third most terrifying event in the world, a massive jenkins plugin > update is happening in a coupl

[build system] IMPORTANT UPDATE

2020-11-24 Thread shane knapp
this is a lengthy, but important read for everyone here. in the next few days, the remaining centos machines (PRB/SBT workers AND primary) will have to be reimaged from centos6.9 to ubuntu 20.04LTS. this means three important things on the very near horizon: 1 -- the PRB and SBT tests WILL BE BROKEN

Re: jenkins downtime tomorrow evening/weekend

2020-11-24 Thread shane knapp
> > Please see https://issues.apache.org/jira/browse/SPARK-27177 for more > details. > > On Tue, Nov 24, 2020 at 8:23 AM shane knapp ☠ wrote: > >> it seems that the plugin upgrade went as smoothly as it could have... i >> still have a bunch of stack traces to filter th

Re: [build system] IMPORTANT UPDATE

2020-11-24 Thread shane knapp
nk you jon! shane On Tue, Nov 24, 2020 at 11:24 AM shane knapp ☠ wrote: > this is a lengthy, but important read for everyone here. > > in the next few days, the remaining centos machines (PRB/SBT workers AND > primary) will have to be reimaged from centos6.9 to ubuntu 20.04LTS. > >

Re: [build system] IMPORTANT UPDATE

2020-11-24 Thread shane knapp
our very first ubuntu-based PRB is running: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131701/ crossing my fingers! :) On Tue, Nov 24, 2020 at 1:30 PM shane knapp ☠ wrote: > due to scheduling, upcoming holiday and in-the-colo work requirements, all > of the

Re: [build system] IMPORTANT UPDATE

2020-11-24 Thread shane knapp
E thanks goes out to jon for the work going on at the colo this moment: rack rearrangement, cleaning up networking, fixing hardware, reimaging and generally kicking ass! have a great holiday! shane On Tue, Nov 24, 2020 at 2:24 PM shane knapp ☠ wrote: > our very first ubuntu-based PRB is runn

Re: [build system] IMPORTANT UPDATE

2020-11-25 Thread shane knapp
On Tue, Nov 24, 2020 at 6:08 PM shane knapp ☠ wrote: > all spark builds have been ported and triggered: > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/ > > not shown are the regular and k8s PRB, which are also running. > > i think i've nai

Re: [build system] IMPORTANT UPDATE

2020-11-25 Thread shane knapp
ly up and running. shane On Wed, Nov 25, 2020 at 1:35 PM shane knapp ☠ wrote: > hey all, work is going quite well and smoothly for this project. > > today's update: > > we will experience significant downtime monday/tuesday as we spin up the > new primary jenkins node.

[build system] jenkins downtime today/tomorrow

2020-11-30 Thread shane knapp
hey all! the Great Jenkins Migration[tm] is well under way, and we will be sunsetting the old amp-jenkins-master server and moving to a new one. i've put jenkins in to quiet mode so that it won't accept new builds and we'll let the ones currently running finish. once that's done, i will be rsync

Re: [build system] jenkins downtime today/tomorrow

2020-11-30 Thread shane knapp
old jenkins is getting shut down Real Soon Now[tm]! crossing my fingers! :) On Mon, Nov 30, 2020 at 10:05 AM shane knapp ☠ wrote: > hey all! > > the Great Jenkins Migration[tm] is well under way, and we will be > sunsetting the old amp-jenkins-master server and moving to a new o

Re: [build system] jenkins downtime today/tomorrow

2020-11-30 Thread shane knapp
amplab jenkins is down. On Mon, Nov 30, 2020 at 3:25 PM shane knapp ☠ wrote: > old jenkins is getting shut down Real Soon Now[tm]! crossing my fingers! > :) > > On Mon, Nov 30, 2020 at 10:05 AM shane knapp ☠ > wrote: > >> hey all! >> >> the Great Jenkins

Re: [build system] jenkins downtime today/tomorrow

2020-12-01 Thread shane knapp
l start building and move on to fixing any lingering environment/system issues that pop up. shane On Mon, Nov 30, 2020 at 4:01 PM shane knapp ☠ wrote: > amplab jenkins is down. > > On Mon, Nov 30, 2020 at 3:25 PM shane knapp ☠ wrote: > >> old jenkins is getting shut down Real

[build system] WE'RE LIVE!

2020-12-01 Thread shane knapp
https://amplab.cs.berkeley.edu/jenkins/ i cleared the build queue, so you'll need to retrigger your PRs. there will be occasional downtime over the next few days and weeks as we uncover system-level errors and more reimaging happens... but for now, we're building. a big thanks goes out to jon f

Re: [build system] WE'RE LIVE!

2020-12-04 Thread shane knapp
the last one Dec 2nd failed: > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/3186/ > > Not sure if this is result of upgrade? > > Thanks, > Tom > On Tuesday, December 1, 2020, 06:55:27 PM CST, shane knapp ☠ < > skn...@berkeley

Re: [build system] WE'RE LIVE!

2020-12-04 Thread shane knapp
ok, it's broken on the new nodes, so i tied the project to ubuntu16. i'll create a jira and investigate further at a later date. On Fri, Dec 4, 2020 at 8:58 AM shane knapp ☠ wrote: > no, it isn't but i'll try and take a look at this later today. > > On Fri, Dec

[build system] jenkins downtime 01/02/2021 - 01/03/2021

2020-12-21 Thread shane knapp
the colo facility where jenkins is hosted is going down for roughly a day for some (more) power upgrades. once the colo is powered back up, we'll make sure that all the jenkins workers and primary nodes are up and happily building. if anyone notices any issues w/jenkins before, during or after th

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread shane knapp
> > 1. Jenkins machines start to fail with the following recently. > (master branch) > > Python versions prior to 3.6 are not supported. > Build step 'Execute shell' marked build as failure > > examples please? -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research /

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread shane knapp
rk%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/1836/console > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/1887/console > > On Fri, Jan 8, 2021 at 2:13 PM shane knapp ☠ wrote: > >> 1. Jenkins mac

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread shane knapp
erkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/1887/console >> >> On Fri, Jan 8, 2021 at 2:13 PM shane knapp ☠ wrote: >> >>> 1. Jenkins machines start to fail with the following recently. >>>> (master bran

Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread shane knapp
the AmplabJenks bot's github creds are out of date, which is causing that non-fatal error. however, if you scroll back you'll see that minikube actually failed to start. that should have definitely failed the build, so i'll look at the job's bash logic and see what we missed. also, that worker (

Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread shane knapp
stupid bash variable assignment. i'm surprised this has lingered for as long as it has (3 years). it's fixed and shouldn't be an issue any more. On Tue, Feb 23, 2021 at 9:28 AM shane knapp ☠ wrote: > the AmplabJenks bot's github creds are out of date, which is causing

[build system] jenkins wedged, going to restart after current builds finish

2021-02-23 Thread shane knapp
EOM

Re: [build system] jenkins wedged, going to restart after current builds finish

2021-02-23 Thread shane knapp
this was done about an hour ago... rebooted several of the workers to clear out lingering builds, and one worker had an SSD fail on boot and is currently offline. shane On Tue, Feb 23, 2021 at 10:13 AM shane knapp ☠ wrote: > EOM > > -- > Shane Knapp > Computer Guy / Voice

Re: minikube and kubernetes cluster versions for integration testing

2021-03-03 Thread shane knapp
please open a jira for this and assign it to me... shouldn't be too big of a deal to get this set up. On Tue, Mar 2, 2021 at 6:06 PM Dongjoon Hyun wrote: > Thank you for sharing and suggestion, Attila. > > Additionally, given the following information, > > - The latest Minikube is v1.18.0 with

Re: minikube and kubernetes cluster versions for integration testing

2021-03-04 Thread shane knapp
fwiw, upgrading minikube and the associated VM drivers is potentially a PITA. your PR will absolutely be tested before merging. :) On Thu, Mar 4, 2021 at 10:13 AM attilapiros wrote: > Thanks Shane! > > I can do the documentation task and the Minikube version check can be > incorporated into my

[build system] github fetches timing out

2021-03-09 Thread shane knapp
it looks like over the past few days the master/branch builds have been timing out... this hasn't happened in a few years, and honestly the last times this happened there was nothing that either i, or github could do about it. it cleared up after a number of weeks, and we were never able to pinpo

Re: [build system] github fetches timing out

2021-03-10 Thread shane knapp
...and just like that, overnight the builds started successfully git fetching! On Tue, Mar 9, 2021 at 12:31 PM shane knapp ☠ wrote: > it looks like over the past few days the master/branch builds have been > timing out... this hasn't happened in a few years, and honestly the last >

Re: [build system] github fetches timing out

2021-03-17 Thread shane knapp
it's been happening a lot again recently... i'm investigating. On Wed, Mar 10, 2021 at 10:23 AM Liang-Chi Hsieh wrote: > Thanks Shane for looking at it! > > > shane knapp ☠ wrote > > ...and just like that, overnight the builds started successfully git > > fe

[build system] short downtime today, new workers coming soon

2021-03-23 Thread shane knapp
jenkins is acting up, and i'm going to take the opportunity to reboot the primary and all the workers. sorry for the short notice, but on the bright side we have a bunch of shiny new workers coming soon! shane -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research / RISELab Staf

Re: [build system] short downtime today, new workers coming soon

2021-03-23 Thread shane knapp
we're back! On Tue, Mar 23, 2021 at 12:31 PM shane knapp ☠ wrote: > jenkins is acting up, and i'm going to take the opportunity to reboot the > primary and all the workers. > > sorry for the short notice, but on the bright side we have a bunch of > shiny new worke

please read: current state and the future of the apache spark build system

2021-04-07 Thread shane knapp
this will be a relatively big update, as there are many many moving pieces with short, medium and long term goals. TLDR1: we're shutting jenkins down at the end of 2021. TLDR2: i know we're way behind on pretty much everything. most of the hardware is at or beyond EOL, and random systemic bui

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-08 Thread shane knapp
On Wed, Apr 7, 2021 at 6:30 AM Hyukjin Kwon wrote: > Thanks Martin for your feedback. > > > What was your reason to migrate from Apache Jenkins to GitHub Actions? > > I am sure there were more reasons for migrating from AmpLab Jenkins > to GitHub Actions

[SPARK-34738] issues w/k8s+minikube and PV tests

2021-04-14 Thread shane knapp
please see: https://issues.apache.org/jira/browse/SPARK-34738 i could really use a hand. all k8s integration tests are currently broken, and i'd rather spend the time fixing the latest version of minikube, k8s and the docker virtualization layer than debug the 'old' way which uses the kvm2/qemu

Re: [SPARK-34738] issues w/k8s+minikube and PV tests

2021-04-14 Thread shane knapp
On Wed, Apr 14, 2021 at 10:32 AM Frank Luo wrote: > Is there any hard dependency on minikube (i.e., GPU setting)? kind ( > https://kind.sigs.k8s.io/) is a more stable and simpler k8s cluster env on a > single machine (only requires docker), and it has been widely used by k8s projects > for testing. > > there are

Re: please read: current state and the future of the apache spark build system

2021-04-14 Thread shane knapp
> > medium term (in 6 months): > * prepare jenkins worker ansible configs and stick in the spark repo > - nothing fancy, but enough to config ubuntu workers > - could be used to create docker containers for testing in > THE CLOUD > > fwiw, i just decided to bang this out today: https://github.c

Re: [SPARK-34738] issues w/k8s+minikube and PV tests

2021-04-15 Thread shane knapp
instead and that one > test fails because it relies on some minikube specific functionality. That > test could be refactored because I think it’s just adding a minimal Ceph > cluster to the K8S cluster which can be done to any K8S cluster in principle > > > > > > > >

Re: [SPARK-34738] issues w/k8s+minikube and PV tests

2021-04-16 Thread shane knapp
aded early next week. On Thu, Apr 15, 2021 at 3:05 PM shane knapp ☠ wrote: > i'm all for that... and once they're turned off, we can finish the > minikube/k8s/move-to-docker project in a couple of hours max. > > On Thu, Apr 15, 2021 at 3:00 PM Holden Karau wrote: > >&

[build system] jenkins down, working on it

2021-05-04 Thread shane knapp
jenkins went down some time in the past few days, and i'm currently investigating. if it's been down a while, i apologize as i've been dealing w/some health issues. shane -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research / RISELab Staff Technical Lead https://rise.cs.berkel

Re: [build system] jenkins down, working on it

2021-05-04 Thread shane knapp
we're back and building! On Tue, May 4, 2021 at 4:03 PM shane knapp ☠ wrote: > jenkins went down some time in the past few days, and i'm currently > investigating. > > if it's been down a while, i apologize as i've been dealing w/some health > issues. > >

Re: How to think about SparkPullRequestBuilder-K8s?

2021-06-11 Thread shane knapp
btw i just noticed jenkins was down, and i restarted the primary node. On Fri, Jun 11, 2021 at 12:09 PM Sean Owen wrote: > I find that somewhat often, the K8S PR builders will fail on a PR: > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/ > > ... when the PR seems totall

Re: How to think about SparkPullRequestBuilder-K8s?

2021-06-11 Thread shane knapp
we're back. On Fri, Jun 11, 2021 at 2:30 PM shane knapp ☠ wrote: > btw i just noticed jenkins was down, and i restarted the primary node. > > On Fri, Jun 11, 2021 at 12:09 PM Sean Owen wrote: > >> I find that somewhat often, the K8S PR builders will

quick jenkins restart

2021-07-09 Thread shane knapp
the primary is running out of memory pretty quickly, and i'm going to reboot the server quickly so that it doesn't crash over the weekend. we'll investigate a bit more next week. shane -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research / RISELab Staff Technical Lead https://

Re: quick jenkins restart

2021-07-09 Thread shane knapp
we're back up! On Fri, Jul 9, 2021 at 10:23 AM shane knapp ☠ wrote: > the primary is running out of memory pretty quickly, and i'm going to > reboot the server quickly so that it doesn't crash over the weekend. > > we'll investigate a bit more next week. > >

[build system] jenkins downtime today

2021-07-22 Thread shane knapp
i'll be taking jenkins down for a couple of hours today to reboot/clean up the workers and finish up the python package installs covered in https://github.com/apache/spark/pull/33469/files shane -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research / RISELab Staff Technical Lead

Re: [build system] jenkins downtime today

2021-07-22 Thread shane knapp
that actually went much faster than anticipated, and we're already back up and building! On Thu, Jul 22, 2021 at 10:24 AM shane knapp ☠ wrote: > i'll be taking jenkins down for a couple of hours today to reboot/clean up > the workers and finish up the python package installs co

Re: please read: current state and the future of the apache spark build system

2021-07-28 Thread shane knapp
3 months later, i have some updates! TLDR1: we're shutting jenkins down at the end of 2021. > > this is still the goal, exact shutdown date TBD. > long term (until EOY): > * decide what the future of spark builds and releases will look like > - do we need jenkins? > - if we do, who's respo

[build system] jenkins "freeze" for remainder of 2021

2021-07-28 Thread shane knapp
since we're sunsetting jenkins by the end of 2021, i'd like to institute a general freeze on package/feature requests. this includes, but is not limited to things like python packages, new versions of python, and pretty much anything that requires changes to the bare-metal systems that run jenkins
