[build system] jenkins wedged again, rebooting master node

2019-03-15 Thread shane knapp
as some of you may have noticed, jenkins got itself in a bad state multiple times over the past couple of weeks. usually restarting the service is sufficient, but it appears that i need to hit it w/the reboot hammer. jenkins will be down for the next 20-30 minutes as the node reboots and jenkins

Re: [build system] jenkins wedged again, rebooting master node

2019-03-15 Thread shane knapp
well, that box rebooted in record time! we're back up and building. and as always, i'll keep a close eye on things today... jenkins usually works great, until it doesn't. :\ On Fri, Mar 15, 2019 at 9:52 AM shane knapp wrote: > as some of you may have noticed, jenkins got itself in a bad stat

Re: [build system] jenkins wedged again, rebooting master node

2019-03-15 Thread Wenchen Fan
cool, thanks! On Sat, Mar 16, 2019 at 1:08 AM shane knapp wrote: > well, that box rebooted in record time! we're back up and building. > > and as always, i'll keep a close eye on things today... jenkins usually > works great, until it doesn't. :\ > > On Fri, Mar 15, 2019 at 9:52 AM shane knap

Re: [build system] jenkins wedged again, rebooting master node

2019-03-15 Thread Sean Owen
It's not responding again. Is there any way to kick it harder? I know it's well understood but this means not much can be merged in Spark On Fri, Mar 15, 2019 at 12:08 PM shane knapp wrote: > > well, that box rebooted in record time! we're back up and building. > > and as always, i'll keep a clo

Re: [build system] jenkins wedged again, rebooting master node

2019-03-16 Thread shane knapp
argh. kicking it again. On Fri, Mar 15, 2019 at 8:22 PM Sean Owen wrote: > It's not responding again. Is there any way to kick it harder? I know > it's well understood but this means not much can be merged in Spark > > On Fri, Mar 15, 2019 at 12:08 PM shane knapp wrote: > > > > well, that box

Re: [build system] jenkins wedged again, rebooting master node

2019-03-16 Thread shane knapp
On Fri, Mar 15, 2019 at 8:22 PM Sean Owen wrote: > It's not responding again. Is there any way to kick it harder? I know > it's well understood but this means not much can be merged in Spark > > it's back up and running now. btw, the only way to kick it harder would be to do a complete reinstall

Re: [build system] jenkins wedged again, rebooting master node

2019-03-17 Thread Sean Owen
It's wedged again since this morning. Something's clearly gone wrong-er than usual; any recent changes that could be a culprit? On Fri, Mar 15, 2019 at 12:08 PM shane knapp wrote: > > well, that box rebooted in record time! we're back up and building. > > and as always, i'll keep a close eye o

Re: [build system] jenkins wedged again, rebooting master node

2019-03-17 Thread shane knapp
i kicked the service. again. and jenkins seems happy for now. nothing has changed (system config, packages, etc) on the master node. i'll dive into this tomorrow morning. On Sun, Mar 17, 2019 at 9:46 AM Sean Owen wrote: > It's wedged again since this morning. Something's clearly gone > wro

Re: [build system] jenkins wedged again, rebooting master node

2019-03-18 Thread shane knapp
ok, i dug through the logs and noticed that rsyslogd was dropping messages due to imuxsock being spammed by postfix... which i then tracked down to our installation of fail2ban being incorrectly configured and attempting to send IP ban/unban status emails to 'em...@example.com'. since we're a univ

Re: [build system] jenkins wedged again, rebooting master node

2019-03-19 Thread Imran Rashid
seems wedged again? sorry for the bad news Shane, thanks for all the work on fixing it On Mon, Mar 18, 2019 at 4:02 PM shane knapp wrote: > ok, i dug through the logs and noticed that rsyslogd was dropping messages > due to imuxsock being spammed by postfix... which i then tracked down to > our

Re: [build system] jenkins wedged again, rebooting master node

2019-03-21 Thread shane knapp
i tweaked some apache settings (MaxClients increased to fix an error i found buried in the logs, and added 'retry' and 'acquire' to the reverse proxy settings to hopefully combat the dreaded 502 response), restarted httpd and things actually seem quite snappy right now! i'm not holding my breath,
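
(editor's sketch, not part of the original message) The 502 mentioned above is what a reverse proxy returns when it cannot get a usable answer from the backend behind it. One rough way to tell "proxy is up but the backend is not answering" apart from "nothing is answering at all" is to poll the front end and classify the response. This is a minimal sketch only: the function names, the labels, and the timeout are assumptions, and nothing here is taken from the actual setup on the master node.

```python
import urllib.error
import urllib.request


def classify(status):
    """Map an HTTP status code (or None for no response) to a coarse label."""
    if status is None:
        return "unreachable"
    if status == 502:
        # The proxy answered, but could not get a response from the backend.
        return "bad-gateway"
    if 200 <= status < 400:
        return "ok"
    return "error"


def probe(url, timeout=10.0):
    """Fetch `url` once and return a coarse health label (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as exc:
        # HTTPError carries the status code of a 4xx/5xx response.
        return classify(exc.code)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: no HTTP response at all.
        return "unreachable"
```

Calling `probe()` against the Jenkins front-end URL every few minutes and logging the label would show whether slowdowns line up with 502s or with outright timeouts. Note that an instance requiring login may answer 403, which this sketch reports as "error".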

Re: [build system] jenkins wedged again, rebooting master node

2019-03-22 Thread shane knapp
i was right to not hold my breath... while my apache changes seem to have helped a bit, things are still slowing down after 10-12 hours. i have a few other things i can look at, and will get as much done as possible before the weekend. serious troubleshooting will begin anew monday. apologies a

Re: [build system] jenkins wedged again, rebooting master node

2019-03-22 Thread shane knapp
quick update: since kicking httpd on the jenkins master "fixes" the GUI hanging, i set up a cron job to restart httpd 4 times per day. this is not the final solution, but will definitely help over the weekend as i'm heading out of town. shane On Fri, Mar 22, 2019 at 9:50 AM shane knapp wrote:
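
(editor's sketch, not part of the original message) The message says httpd is restarted four times per day from cron but does not give the schedule or the command. The sketch below shows how an evenly spaced schedule could be written as a user crontab line. The helper name, the restart command, and the choice of minute 0 are all assumptions for illustration.

```python
def cron_line(times_per_day, command):
    """Build a user-crontab line running `command` at evenly spaced hours.

    Hypothetical helper: only supports counts that divide 24 evenly.
    """
    if times_per_day < 1 or 24 % times_per_day != 0:
        raise ValueError("times_per_day must be a positive divisor of 24")
    interval = 24 // times_per_day
    hours = ",".join(str(h) for h in range(0, 24, interval))
    # Crontab fields: minute hour day-of-month month day-of-week command
    return f"0 {hours} * * * {command}"


# Assumed restart command; the real one is not given in the thread.
print(cron_line(4, "systemctl restart httpd"))
```

With four restarts per day this yields runs at 00:00, 06:00, 12:00 and 18:00. A system-wide crontab such as /etc/crontab would additionally need a user field before the command.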