Unfortunately, due to interrupts and illness, I didn't manage to get much actual +1 maintenance done in my shift last week. Another big factor was that I chose to focus on obscure autopkgtest failures to a depth I've not done before. This, together with waiting on some very long queues, caused me to spend most of my time trying to figure out how things work, rather than actually making progress in the queues. Hopefully, by documenting my learning and progress here, that will help everyone though. But I don't really have any conclusions here; just mysteries that I'd appreciate someone who understands clearing up, and magic behaviour that I'd like to see documented.
vorlon sent me a ping suggesting I work on getting perl through, so I focused on that. Most of vorlon's perl autopkgtest retries were still queued, especially for arm64. I found one spurious failure on apache2, which I retried. Then I tried to find other things to unblock in the sea of red autopkgtest failures listed against perl. But with the queues appearing so large, I was reluctant to throw more retests at them in case these retests were already queued by others.

I spent some time trying to figure out how I could detect whether a retry was already queued or running before submitting it. https://autopkgtest.ubuntu.com/running is helpful there, but the output was so long that it was not practical to match a given entry against the architecture heading it was under. I thought I recalled a colleague mentioning something about this in ubuntu-helpers, but there seemed to be nothing relevant in its README.md. Some hours later[2] I was directed to lp-test-isrunning, which is exactly what I needed, and I added an entry to README.md about it.

Then it seemed that nearly every failure I looked at matched a known pattern for which vorlon had already submitted retries that were still queued. I thought it would be helpful to find failures that did *not* match the known patterns, but I couldn't find any tool to identify these either. I spent some time studying the existing tooling, and concluded that retry-autopkgtest-regressions probably has some code I could reuse to write this. But by the time I started the next day, only the arm64 queues remained, so rather than still solve this problem, it was easier to just focus on the non-arm64 failures.

I found htslib (an rdep of perl) failing on amd64 for what seemed to be an unrelated reason, so focused on this. This led me down a rabbit hole of missing documentation, finding and inspecting code just to figure out how to reproduce the failure at all, since the test succeeded locally for me.
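For what it's worth, here's a rough sketch of the kind of "is this already queued?" check I was after, in the spirit of what lp-test-isrunning presumably does. The /queues.json URL, the nesting of the JSON, and the entry format are my assumptions here, not verified against the real tool:

```python
"""Rough sketch: check whether a test request is already queued.

Assumes autopkgtest.ubuntu.com serves the current queue contents as
JSON at /queues.json, nested roughly as context -> release -> arch ->
[entries], with each entry a string starting with the source package
name.  All of that is assumed, not verified."""
import json
import urllib.request

QUEUES_URL = "https://autopkgtest.ubuntu.com/queues.json"  # assumed URL


def fetch_queues(url=QUEUES_URL):
    """Download and parse the queue dump (network; not called below)."""
    with urllib.request.urlopen(url) as f:
        return json.load(f)


def find_queued(queues, package):
    """Yield (context, release, arch) for every queue that contains an
    entry for `package`."""
    for context, releases in queues.items():
        for release, arches in releases.items():
            for arch, entries in arches.items():
                for entry in entries or []:
                    # entries look like "pkg {...request parameters...}"
                    if entry.split()[0] == package:
                        yield context, release, arch
                        break


# Usage (needs network):
#   for hit in find_queued(fetch_queues(), "htslib"):
#       print("queued in", "/".join(hit))
```

Something like this, cross-checked against /running, would have saved me the manual scrolling.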
The actual code that converts an autopkgtest request into a run of the autopkgtest command in infrastructure is https://git.launchpad.net/autopkgtest-cloud/tree/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker/worker. Here you can see the handling of arguments like "triggers" and "all-proposed", and of magic like "migration-reference/0" and suchlike. This took a while to find - there's no mention of it from autopkgtest.ubuntu.com. "Docs for admins" looked promising, but neglects to link to it.

As used by infrastructure, --setup-commands /home/ubuntu/autopkgtest/setup-commands/setup-testbed is presumably equivalent to /usr/share/autopkgtest/setup-commands/setup-testbed as shipped by the autopkgtest package itself. Similarly, --setup-commands /home/ubuntu/autopkgtest-cloud/worker-config-production/setup-canonical.sh can be found at https://git.launchpad.net/autopkgtest-cloud/tree/charms/focal/autopkgtest-cloud-worker/autopkgtest-cloud/worker-config-production/setup-canonical.sh

The mechanism that translates triggers into the magic "just these from proposed" behaviour is the --apt-pocket=proposed=... autopkgtest command line option, so the actual apt pinning is done by autopkgtest itself. The use of --env=ADT_TEST_TRIGGERS= etc. was a red herring for me. I couldn't find anything in autopkgtest itself that uses this variable; I only saw it being used during test result collection. So what does ADT_TEST_TRIGGERS= do, anyway?

However, sometimes you might find --apt-pocket=proposed=... *not* used, even when triggers appear to exist. I didn't get to the bottom of this. Maybe it's something to do with magic fallback behaviour I've heard mentioned (an automatic retry with different conditions?), but I couldn't find anything documented and didn't find any implementation for this.
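To make the trigger handling concrete, here's a much-simplified Python sketch of how I understand the worker assembles the invocation. This is my reconstruction, not code from the worker, and the exact option spellings are assumptions beyond what I saw in the logs:

```python
def build_autopkgtest_args(package, triggers=None, all_proposed=False):
    """Hypothetical, simplified reconstruction of how a test request
    becomes an autopkgtest command line.  With specific triggers, only
    those source packages are allowed from proposed via
    --apt-pocket=proposed=...; with all-proposed, the whole pocket is
    enabled without pinning."""
    args = ["autopkgtest", package]
    if all_proposed:
        args.append("--apt-pocket=proposed")
    elif triggers:
        # triggers arrive as "srcpkg/version"; the pinning syntax takes
        # the source package names
        srcs = ",".join("src:" + t.split("/")[0] for t in triggers)
        args.append("--apt-pocket=proposed=" + srcs)
        # the triggers also get exported into the environment, where I
        # only saw them read back at result-collection time
        args.append("--env=ADT_TEST_TRIGGERS=" + " ".join(triggers))
    return args
```

The mystery above is exactly which branch of this decision gets taken when, and why a retry can end up in a different branch than the original request.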
For example, looking at [1], vorlon's test dated 2022-11-28 19:26:59 UTC and my subsequent retest dated 2022-11-30 10:20:55 UTC were otherwise identical, but looking at the logs, his was called with --apt-pocket=proposed with no pinning defined, whereas mine was called with pinning. All I did was hit the retry button against vorlon's test request and failure, and mine got run differently! I'd appreciate someone explaining this to me.

I'm not certain it's related, but it turned out that in infrastructure, libssl3 was being upgraded to the version in the proposed pocket before the test was run[3]. I never figured out how to reproduce that. But if I arranged the same start condition, by providing autopkgtest with an environment where libssl3 was already on 3.0.5-2ubuntu2, as upgraded to in the autopkgtest log from infrastructure, then I could also see what appeared to be the same problem. Even though openssl was specified as a trigger, it seemed that autopkgtest didn't want to take libssl-dev from the proposed pocket to satisfy test dependencies, and the release pocket version has a hard versioned dependency on the same libssl3 version. However, it's not clear if this was *the* problem.

Two questions I ran out of time trying to answer, then:

1) Why was libssl3 being upgraded from proposed before the test began? Maybe something to do with NotAutomatic and pinning? I couldn't reproduce this locally.

2) Why was libssl-dev not being pulled from proposed, even when --apt-pocket=proposed=openssl was being specified?

In investigating the above, I chose to use lxd, since a package dependency resolution issue should reproduce just fine in a container, and working with containers is usually much quicker. But when I couldn't reproduce against ubuntu-daily:lunar, I tried images:ubuntu/lunar/amd64 for a minimalist base image, but that doesn't exist. I found this surprising given that images:debian/sid/amd64 does exist.
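On question 2, my working model of the pinning that --apt-pocket=proposed=openssl would produce is something like the stanzas generated below. The exact priorities and the src: pin syntax are assumptions on my part, not lifted from autopkgtest - which is part of why I'd like this documented:

```python
def proposed_pin(release, allowed_sources):
    """Hypothetical sketch of apt preferences for
    --apt-pocket=proposed=src:openssl: pin the whole proposed pocket
    low, then raise only the named sources' binaries back up.  The
    stanza contents and priorities here are assumptions, not
    autopkgtest's actual output."""
    stanzas = [
        "Package: *\n"
        f"Pin: release a={release}-proposed\n"
        "Pin-Priority: 100\n"
    ]
    for src in allowed_sources:
        stanzas.append(
            f"Package: src:{src}\n"
            f"Pin: release a={release}-proposed\n"
            "Pin-Priority: 995\n"
        )
    return "\n".join(stanzas)
```

If the real pinning looks anything like this, then running apt-cache policy libssl-dev inside the testbed would show which pin actually wins, and might explain why the proposed version wasn't chosen.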
I tried upgrading from images:ubuntu/kinetic/amd64, and that worked (and didn't take long, with the image being minimal). It also helped me notice the libssl3 thing, because openssl happens to be at a higher version in kinetic-updates at the moment than in the lunar release pocket, since it hasn't migrated from lunar-proposed yet.

Since then, jawn-smith, vorlon and costamagnagianfranco have all retried this particular test with various different parameters, and all have passed. The original failure condition went away, I guess. But all the logs and command lines are there, and I'm still none the wiser on this magic behaviour.

[1] https://autopkgtest.ubuntu.com/packages/h/htslib/lunar/amd64

[2] Consider that "some hours" is like 10% or more of my week's entire shift!

[3] https://autopkgtest.ubuntu.com/results/autopkgtest-lunar/lunar/amd64/h/htslib/20221130_102055_f8a4e@/log.gz
-- 
ubuntu-devel mailing list
ubuntu-devel@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-devel