On 8/26/21 5:56 PM, enh wrote:
>> I keep telling people I could spend a focused year on JUST the test suite
>> and they don't believe me. When people talk about function testing vs
>> regression testing vs coverage testing I get confused because it's all
>> the same thing?
>
> i'll include the main failure modes of each, to preempt any "yes, but"s by
> admitting that _of course_ you can write fortran in any language, but the
> idea is something like:
>
> integration testing - answers "does my product work for real use cases?".
> you definitely want this, for obvious reasons, and since your existing
> testing is integration tests, i'll say no more. other than that the
> failure mode here is relying only on integration tests and spending a lot
> more time/effort debugging failures than you would if you could have
> caught the same issue with a unit test.
I'm relying on the fact I wrote almost all the code myself, and thoroughly
reviewed the rest, to be able to mentally model what everything is doing.
That said, I'm trying to get the bus number up so you don't NEED me to do
this sort of thing...

> unit testing - reduces the amount of digging you have to do _when_ your
> integration tests fail. (also makes it easier to asan/tsan or whatever,
> though this is much more of a problem on large systems than it is for
> something like toybox, where everything's small and fast anyway, versus
> "30mins in to transcoding this video, we crash" kinds of problem.) for
> something like toybox you'd probably be more interested in the ability to
> mock stuff out --- your "one day i'll have qemu with a known set of
> processes" idea,

It's kinda hard to test things like ps/ifconfig/insmod outside of a
carefully controlled known environment.

> but done by swapping function pointers. one nice thing about unit tests is
> that they're very easily parallelized. on a Xeon desktop i can run all
> several thousand bionic unit tests in less than 2s... whereas obviously
> "boot a device" (more on the integration test side) takes a lot longer.
> the main failure mode here (after "writing good tests is at least as hard
> as writing good code", which i'm pretty sure you already agree with, and
> might even be one of your _objections_ to unit tests),

Eh, the toys/example/demo_$THINGY commands are sort of intended to do this
kind of thing for chunks of shared infrastructure (library code, etc). My
objection here is really granularity: if you test at TOO detailed a level
you're just saying "this code can't change". I recently changed xabspath()
to have a flag based interface, changed its existing users in the commands,
and have the start of a toys/example/demo_abspath.c (which I mentioned in my
blog I was too exhausted to properly finish at the time).
Granular tests directly calling the functions would have been invalidated by
the change, meaning I'd either have deleted them or rewritten them. With
other people's test suites I often encounter test failures that don't MEAN
anything: some test is failing because the semantics of something somewhere
changed, and none of the users care, and the test suite accumulates "known
failures" the way code accumulates known warnings.

A libc has an API with a lot of stable documented entry points. Toybox's
entry points are almost entirely command line utilities with a shared entry
codepath (including option parsing) and a shared library of common
functions. I don't want to test the lib/*.c code directly from something
that ISN'T a command (some other main() in its own .c file, possibly
accessing it via dlopen() or something) because the top level main.c
initializes toy_list[] and has toy_init() and toy_find() and toy_exec() and
so on. If I factored that out I'd _only_ be doing so for the test suite, not
because it made design sense. I don't want to duplicate plumbing and test in
a different environment than the one I'm running in.

The mkroot images are "tiny but valid": a theoretically real system you
could build up from, which tells me "how does this behave under musl on a
bunch of targets", using a real Linux kernel and so on.

> is writing over-specific unit tests. rather than writing tests to cover
> "what _must_ this do to be correct?" people cover "what does this specific
> implementation happen to do right now, including accidental implementation
> details?".

Yup. Seen a lot of that. :(

> (i've personally removed thousands of lines of misguided tests that
> checked things like "if i pass _two_ invalid parameters to this function,
> which one does it report the error about?", where the correct answer is
> either "both" or "who cares?", but never "one specific one".)
I've bumped into some of that in toysh, because I want to match bash's
behavior and alas bash is one of those "the implementation is currently the
standard" things where every implementation detail hiccup IS the current
spec. That said, I've blogged about making a few digressions anyway just
because my plumbing doesn't work like bash's does (they're gratuitously
making multiple passes over the data and I'm doing it all in one pass, and
there are some places where "all x happens before all y" bubbles visibly to
the surface and I just went no). For example the "order of operations" issue
in https://landley.net/notes-2021.html#18-03-2021

> coverage - tells you where your arse is hanging out the window _before_
> your users notice. (i've had personal experiences of tests i've written
> and that two other googlers have code reviewed that -- when i finally got
> the coverage data -- turned out to be missing important stuff that [i
> thought] i'd explicitly written tests for.

This is what I mean by testing the error paths. If I have a statement the
code flow doesn't ever go through in testing, I'd like to know why. There
are presumably tools for this (I think valgrind has something), but that's
waaaaaay down the road.

> Android's still working on "real time" coverage data showing up in code
> reviews, but "real Google" has been there for years, and you'd be
> surprised how many times your tests don't test what you thought they did.)

Sadly, I would not be surprised.
:(

> the main failure mode i've seen here is that you have to coach people that
> "90% is great", and that very often chasing the last few percent is not a
> good use of time,

https://en.wikipedia.org/wiki/Pareto_principle

And here is an excellent walkthrough of the math behind it:

https://www.youtube.com/watch?v=sPQViNNOAkw#t=6m43s

Which is why the old saying "the first 90% of the work takes 90% of the
time, the remaining 10% of the work takes the other 90% of the time" is
ALMOST right: it's that the next 9% takes another 90%, in a Zeno's paradox
manner (addressing 90% of what's left takes a roughly constant amount of
time, so after n such rounds you've covered 1 - 0.1^n of the work and never
quite reach 100%), until you shoot the engineers and go into production.

> and in the extreme can make code worse. ("design for testability" is good,
> but -- like all things -- you can take it too far.)

My grumble is I'm trying to write a lot of tests that toybox and the debian
host utilities can BOTH pass. I want to test the same code linked against
glibc, musl, and bionic. I want to test it on big endian and little endian,
32 bit and 64 bit, systems that throw unaligned access faults, nommu...

>> You have to test every decision point (including the error paths), you
>> have to exercise every codepath (or why have that codepath?) and you have
>> to KEEP doing it because every distro upgrade is going to break
>> something.
>
> yeah, which is why you want all this stuff running in CI, on all the
> platforms you care about.

People use "continuous integration" as an excuse not to have releases. No
two people should ever run quite the same version and see quite the same
behavior, we're sure random git snapshot du jour is fine... I object on
principle.

>> In my private emails somebody is trying to make the last aboriginal linux
>> release work and the old busybox isn't building anymore because makedev()
>> used to be in #include <sys/types.h> and now it's moved to
>> <sys/sysmacros.h>. (Why? I dunno. Third base.)
> the pain of dealing with that pointless deckchair crap with every glibc
> update is one reason why (a) i've vowed never to do that kind of thing
> again in bionic [we were guilty of the same crime in the past, even me
> personally; the most common example being transitive includes] and (b) i'm
> hoping musl will care a bit more about not breaking source compatibility
> ... but realize he's a bit screwed because code expecting glibc might come
> to rely on the assumption that <sys/types.h> *doesn't* contain makedev(),
> say --- i've had to deal with that kind of mess myself too. sometimes you
> can't win.

He has a very active mailing list and IRC channel (now on libera.chat like
everybody else) where they argue about that sort of thing ALL THE TIME.
(That said, I poked him to see if he wants to make a policy statement about
this. Or has one somewhere already.)

My complaint way back when was objecting to the need to #define
GNU_GNU_ALL_HAIL_STALLMAN in order to get the definitions for linux syscall
wrappers (which have NOTHING to do with the gnu project). I made puppy eyes
at Rich until he added the _ALL_SOURCE define so musl headers could just
give me everything they knew how to do without micromanaging feature macros.
(I'm already #including some headers and not including others, that's the
granularity that makes SENSE...)

And then I wound up doing:

  #define unshare(flags) syscall(SYS_unshare, flags)
  #define setns(fd, nstype) syscall(SYS_setns, fd, nstype)

anyway. :)

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net