On Sat, Apr 29, 2023 at 12:41:26PM +0100, andrew.coop...@citrix.com wrote:
> On 29/04/2023 4:05 am, Stefano Stabellini wrote:
> > On Fri, 28 Apr 2023, GitLab wrote:
> >> Pipeline #852233694 triggered by
> >> [568538936b4ac45a343cb3a4ab0c6cda?s=48&d=identicon]
> >> Ganis
> >> had 3 failed jobs
> >> Failed jobs
> >> ✖
> >> test
> >> qemu-smoke-dom0less-arm64-gcc
> > This is a real failure on staging. Unfortunately it is intermittent. It
> > usually happens once every 3-8 tests for me.
> >
> > The test script is:
> > automation/scripts/qemu-smoke-dom0less-arm64.sh
> >
> > and for this test it is invoked without arguments. It is starting 2
> > dom0less VMs in parallel, then dom0 does an xl network-attach and the
> > domU is supposed to set up eth0 and ping.
> >
> > The failure is that nothing happens after "xl network-attach". The domU
> > never hotplugs any interfaces. I have logs that show that eth0 never
> > shows up and the only interface is lo no matter how long we wait.
> >
> >
> > On a hunch, I removed Alejandro's patches. Without them, I ran 20 tests
> > without any failures. I have not investigated further but it looks like
> > one of these 4 commits is the problem:
> >
> > 2023-04-28 11:41 Alejandro Vallejo tools: Make init-xenstore-domain use
> > xc_domain_getinfolist()
> > 2023-04-28 11:41 Alejandro Vallejo tools: Refactor console/io.c to avoid
> > using xc_domain_getinfo()
> > 2023-04-28 11:41 Alejandro Vallejo tools: Create
> > xc_domain_getinfo_single()
> > 2023-04-28 11:41 Alejandro Vallejo tools: Make some callers of
> > xc_domain_getinfo() use xc_domain_getinfol
>
> In commit order (reverse of above), these patches are:
>
> 1) Modify the python bindings and xenbaked
> 2) Introduce a new library function with a better API/ABI
> 3) Modify xenconsoled
> 4) Modify init-xenstore-domain
>
> The test isn't using anything from 4 or 1, and 2 definitely isn't
> breaking anything on its own.
>
> That just leaves 3. This test does activate xenconsoled by virtue
> of invoking xencommons, but that doesn't help explain why a change in
> xenconsoled interferes (and only intermittently on this one single test)
> with `xl network-attach`.
>
> The xenconsoled change does have a correctness fix in it, requiring
> xenconsoled to ask for all domains' info in one go. This does mean it's
> hypercall-buffering (i.e. bouncing) a 4M array now where previously it
> was racy figuring out which VMs had come and gone.

Can it be that xl network-attach fails and that failure is silently
ignored by the test?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
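
To illustrate the question above: a command's non-zero exit status is easy to lose if the caller never inspects it. The following is only a minimal sketch of a wrapper that refuses to ignore such a failure. The helper name run_checked is hypothetical, and nothing here is taken from the actual test script or from xl itself.

```python
import subprocess


def run_checked(argv):
    """Run a command and raise if it exits non-zero.

    Hypothetical helper for illustration only: it makes a failing
    command abort the caller instead of being silently ignored.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface both the exit status and whatever the command wrote to stderr.
        raise RuntimeError(
            f"{argv[0]} exited with status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

A caller would wrap the attach step as run_checked(["xl", "network-attach", ...]), with the real arguments filled in from the script; if the command failed, the test would then stop at that point rather than waiting for an eth0 that never appears.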