On Feb 12, 2015, at 13:12, Garrett Cooper <yaneurab...@gmail.com> wrote:
> On Feb 9, 2015, at 18:51, James Gritton <ja...@freebsd.org> wrote:
>
>> On 2015-02-06 22:23, Garrett Cooper wrote:
>>> On Feb 6, 2015, at 18:38, James Gritton <ja...@freebsd.org> wrote:
>>>> On 2015-02-06 19:23, Garrett Cooper wrote:
>>>>> I think you broke the Jenkins test runs, and potentially jail support
>>>>> in some edge cases:
>>>>> https://jenkins.freebsd.org/job/FreeBSD_HEAD-tests2/651/
>>>> Where do I go from here? The error you refer to certainly seems
>>>> jail-related, which leads me to guess at something disconnected between
>>>> the matching rc.d/jail and jail(8) change (i.e. using the new rc file with
>>>> the old jail program). But that's really just a wild guess. Is there
>>>> somewhere I look for more information? For example, where does Jenkins
>>>> actually do its thing?
>>>> Sorry for being so stupid in this - Jenkins has only been on the very edge
>>>> of my awareness until now.
>>> I honestly don’t think it’s Jenkins because Jenkins runs in bhyve. I
>>> think you accidentally broke option handling in the jail configuration
>>> (please see my other reply about added “break;” statements).
>>> ...
>>> You can verify your changes by doing:
>>> % (cd /usr/tests/bin/pkill; sudo kyua test)
>>
>> After some testing and looking around, I've decided the problem definitely
>> isn't in rc.d where I thought it might be. I've also decided it's probably
>> not in my patch either.
>>
>> I've run this kyua test on a 10 system (don't have current handy for such
>> things at the moment), and sometimes I would see a failure and sometimes I
>> wouldn't. This was whether I was using the new or old jail code. Later in
>> the day, when the box was less loaded, it seemed to always pass. Looking at
>> the pkill-j_test script, I see jails being created with sleep commands both
>> inside and outside the jail around its creation. I'm guessing this script
>> is very sensitive to timing issues that could be caused by (among other
>> things) system load. The jail commands in this script were also very
>> simple, with the only parameters used being: path, name, ip4.addr, and
>> command. This isn't some kind of esoteric exercising of the jail(8)
>> options, and I would expect if it works at one time it would work at
>> another. I've "hand-run" these particular jail commands and couldn't get
>> them to fail (and the actual content of the jail(8) changes were tested
>> already).
>>
>> I looked at the freebsd-current (I think) list where the Jenkins errors are
>> posted, and it's true it started failing the pkill-j test at the time I made
>> my change. But it's also true that it had failed that test once the day
>> before my change, and then started passing it again. This particular test
>> just seems to be fragile.
>>
>> So I don't have anywhere else to go with this. I'm going to assume jail(8)
>> isn't the problem here.
>
> The tests are racy and make some interesting assumptions. It appears that
> WITNESS plays a part in it, and I bet VIMAGE (something that I don’t have in
> my kernel config) plays a part in it too. I say this because I just ran into
> the issue when running the tests in a tight loop on my VMware workstation 7
> instance with code from r278636.
>
> Doesn’t surprise me because before r272305, it was failing consistently on
> head, so what Craig did in that commit helped, but it didn’t fully fix the
> raciness of the tests.
>
> I’m going to recompile my system with VIMAGE and see if that impacts
> performance of the tests, and if so, I’ll adjust the sleep between setting up
> the jailed instances, and waiting for them to be fully formed.
>
> Thanks!
>
> $ while : ; do sudo prove -rv pgrep-j_test.sh || break; done
> pgrep-j_test.sh ..
> 1..3
> usage: pgrep [-LSfilnoqvx] [-d delim] [-F pidfile] [-G gid] [-M core] [-N
> system]
> [-P ppid] [-U uid] [-c class] [-g pgrp] [-j jid]
> [-s sid] [-t tty] [-u euid] pattern ...
> not ok 1 - pgrep -j <jid> # pgrep output: '', pidfile output: '74275 74278'
> ok 2 - pgrep -j any
> ok 3 - pgrep -j none
> Failed 1/3 subtests
>
> Test Summary Report
> -------------------
> pgrep-j_test.sh (Wstat: 0 Tests: 3 Failed: 1)
> Failed test: 1
> Files=1, Tests=3, 5 wallclock secs ( 0.04 usr 0.02 sys + 0.02 cusr 0.55
> csys = 0.63 CPU)
> Result: FAIL

This Jenkins run is interesting:
https://jenkins.freebsd.org/job/FreeBSD_HEAD-tests2/686/testReport/junit/bin.pkill/pgrep-j_test/main/

The first run passed, but the second one didn’t (more output than expected).
This particular error shouldn’t occur after r278636, but it does confirm that
the test is racy in other ways as well.
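
For illustration only, one way to make this kind of test less timing-sensitive is to poll for a readiness condition with a bounded number of retries, rather than relying on a fixed sleep. The sketch below is an assumption about how that could look, not what pgrep-j_test.sh actually does; the helper name wait_for, its arguments, and the marker file are all hypothetical.

```sh
#!/bin/sh
# Hypothetical helper (not taken from pgrep-j_test.sh): run a command
# repeatedly until it succeeds or the attempt budget is used up.
# usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
wait_for()
{
	_wf_tries=$1
	_wf_delay=$2
	shift 2
	while [ "$_wf_tries" -gt 0 ]; do
		# Success as soon as the readiness check passes.
		if "$@"; then
			return 0
		fi
		_wf_tries=$((_wf_tries - 1))
		sleep "$_wf_delay"
	done
	# The condition never became true within the budget.
	return 1
}

# Demo: a background job creates a marker file after one second, and the
# foreground polls for it (up to 5 attempts, 1 second apart).
marker="${TMPDIR:-/tmp}/wait_for_demo.$$"
(sleep 1; : > "$marker") &
if wait_for 5 1 test -e "$marker"; then
	echo "ready"
else
	echo "timed out"
fi
wait
rm -f "$marker"
```

In a jail test the polled command would be whatever check shows that the jailed processes exist (a pgrep -j invocation, for example). Which check is appropriate there is itself an assumption and would need to be verified against the real script.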