Re: help needed to manage s390x host for ci.debian.net

2023-05-09 Thread Dipak Zope
> > Hello Paul, > > Thank you for the update on the status of the system and the Munin graphs. > I think it's a good idea to wait until after the bookworm release to > investigate further. > > However, I would like to take you up on the offer to schedule and/or > arrange access to an LXC container

Re: help needed to manage s390x host for ci.debian.net

2023-04-20 Thread Paul Gevers
Hi Elizabeth, On 18-04-2023 22:46, Elizabeth K. Joseph wrote: I noticed that the Munin graphs are showing that the queue problems from earlier this year seem to have been reduced now, is that correct, or has the VM just not been restarted lately? It would be helpful to have a starting point.

Re: help needed to manage s390x host for ci.debian.net

2023-04-18 Thread Elizabeth K. Joseph
On Sun, Feb 12, 2023 at 1:39 PM Paul Gevers wrote: > I can provide logging from the host, but I'll need detailed instructions > of what people find useful to look at. Recently Antonio taught me a > trick to provide temporary access to a lxc container on any of our > hosts, so if it helps to be on

Re: help needed to manage s390x host for ci.debian.net

2023-02-28 Thread Paul Gevers
Hi, On 28-02-2023 01:39, James Addison wrote: Attempting to sum together what look, to me, like a pair of 2s: * The s390x Debian CI queue size[1] is growing again. Yes, but this time it's because some test seems to be misbehaving (only on s390x or big endian or... ) and fills the disk

Re: help needed to manage s390x host for ci.debian.net

2023-02-27 Thread James Addison
Attempting to sum together what look, to me, like a pair of 2s: * The s390x Debian CI queue size[1] is growing again. * A recent bug report[2] by Dipak describes userspace processes getting stuck on an s390 Linux kernel version that Debian's CI infra has been using. The bug does seem to have

Re: help needed to manage s390x host for ci.debian.net

2023-02-21 Thread Paul Gevers
Hi, On 21-02-2023 17:46, Dipak Zope1 wrote: I am wondering whether we have downgraded the machines to 5.10.0-20 kernel to get rid of the kernel bug. I think I mentioned it before, we downgraded indeed: root@ci-worker-s390x-01:~# uname -a Linux ci-worker-s390x-01 5.10.0-20-s390x #1 SMP Debian

RE: help needed to manage s390x host for ci.debian.net

2023-02-21 Thread Dipak Zope1
I am wondering whether we have downgraded the machines to the 5.10.0-20 kernel to get rid of the kernel bug which is known to cause issues in user processes at random, as described in the cover letter here: https://lists.debian.org/debian-s390/2023/02/msg00019.html The following patch fixes this

Re: help needed to manage s390x host for ci.debian.net

2023-02-20 Thread Philipp Kern
On 16.02.23 17:49, Paul Gevers wrote: As you can see e.g. here [1,2] it comes and goes (albeit sometimes the queue was empty). I don't think it's very different, I just never got out of the s390x host what I was expecting. For a long time I blamed it on the "stealing" that happens on a shared host,

Re: help needed to manage s390x host for ci.debian.net

2023-02-18 Thread Philipp Kern
Hi, On 17.02.23 17:04, Antonio Terceiro wrote: So there is for sure something wrong with the client-server connection there. Reworking the client for robustness has been on my TODO list for a while. There's a lot of these: Feb 14 08:56:25 ci-worker-s390x-01 debci[1155941]: waiting for header

Re: help needed to manage s390x host for ci.debian.net

2023-02-18 Thread James Addison
> James Addison suggested in [3] to increase a prefetch counter in amqp > (although it's the same on all hosts); I have done so on the s390x host and at > least initially it seems to help keep the host busier. Thanks for applying that - I was hoping that the change might also result in

Re: help needed to manage s390x host for ci.debian.net

2023-02-17 Thread Antonio Terceiro
On Tue, Feb 14, 2023 at 09:42:09PM +0100, Paul Gevers wrote: > Hi Phil, > > On 13-02-2023 08:57, Philipp Kern wrote: > > On 12.02.23 22:38, Paul Gevers wrote: > > > I have munin [1], but as said, I'm not a trained sysadmin. I don't > > > know what I'm looking for if you ask "statistics on the

Re: help needed to manage s390x host for ci.debian.net

2023-02-16 Thread Paul Gevers
Hi, On 13-02-2023 15:59, Dipak Zope1 wrote: There is some issue with the 5.10.0-21 kernel and we are working on it. This can cause performance impact on CI servers. I rebooted to the old kernel yesterday. That helps a bit indeed, although most of the issues I reported predate that kernel

Re: help needed to manage s390x host for ci.debian.net

2023-02-14 Thread Paul Gevers
Hi Phil, On 13-02-2023 08:57, Philipp Kern wrote: On 12.02.23 22:38, Paul Gevers wrote: I have munin [1], but as said, I'm not a trained sysadmin. I don't know what I'm looking for if you ask "statistics on the network". This is more of a software development / devops question than a

RE: help needed to manage s390x host for ci.debian.net

2023-02-13 Thread Dipak Zope1
porting team From: Philipp Kern Date: Monday, 13 February 2023 at 1:28 PM To: Paul Gevers, debian-s390, bar...@velocitysoftware.com, Paul Flint Cc: Debian CI team Subject: [EXTERNAL] Re: help needed to manage s390x host for ci.debian.net Hi, On 12.02.23 22:38, Paul Gevers wrote: > I h

Re: help needed to manage s390x host for ci.debian.net

2023-02-12 Thread Philipp Kern
Hi, On 12.02.23 22:38, Paul Gevers wrote: I have munin [1], but as said, I'm not a trained sysadmin. I don't know what I'm looking for if you ask "statistics on the network". This is more of a software development / devops question than a sysadmin question, but alas. What I am interested in

RE: help needed to manage s390x host for ci.debian.net

2023-02-12 Thread Dipak Zope1
I am not a CI/networking expert, but I will be more than happy to assist. I am in the +0530 timezone and widely available. Thanks, -Dipak Zope Debian s390 porting team On 13/02/23, 3:09 AM, "Paul Gevers" wrote: Hi Phil and all others offering help, On 12-02-2023 20:32, Philipp Kern wrote: > On 11.02.23

Re: help needed to manage s390x host for ci.debian.net

2023-02-12 Thread Paul Gevers
Hi Phil and all others offering help, On 12-02-2023 20:32, Philipp Kern wrote: On 11.02.23 18:18, Paul Gevers wrote:  * [suspect 1] network issues between the s390x and the main ci.d.n server (the results (log files) of the autopkgtests are transferred to the main server). Our ppc64el hosts

Re: help needed to manage s390x host for ci.debian.net

2023-02-12 Thread Philipp Kern
On 11.02.23 18:18, Paul Gevers wrote: * [suspect 1] network issues between the s390x and the main ci.d.n server (the results (log files) of the autopkgtests are transferred to the main server). Our ppc64el hosts are also located at Marist, so I would expect commonality here, but also ppc64el

help needed to manage s390x host for ci.debian.net

2023-02-11 Thread Paul Gevers
Dear all, This is a call for help, mostly for s390x specifically and debugging our s390x host. ci.debian.net is the infrastructure that enables the Debian Release Team to run autopkgtest as part of the quality assurance for unstable-to-testing migration. Historically, that used to be