Some comments inline.

> On Feb 27, 2019, at 6:08 PM, Christopher Collins <ch...@runtime.io> wrote:
>
> Hello all,
>
> In this email, I would like to discuss Mynewt's test facilities.
>
> ### INTRO (AKA TL;DR)
>
> Unit tests come in two flavors: 1) simulated, and 2) real hardware.
> Writing a test capable of running in both environments is wasted
> effort. Let's revisit Mynewt's test facilities with a clear
> understanding that they must solve two different use cases.
>
> In this email, I want to:
> 1. Clarify the differences between the two test environments.
> 2. Propose some high-level changes to Mynewt's test facilities.
>
> ### TEST ENVIRONMENTS
>
> The two test environments are summarized below.
>
> ##### SIMULATED
>
> Simulated tests ("self tests"):
> * Are executed with `newt test`.
> * Automatically execute a set of tests. No user intervention
>   required.
> * Report results via stdout/stderr and exit status.
> * Terminate when the tests are complete.
>
> A typical self test is a very basic application. Usually, a self test
> contains the package-under-test, the testutil package, and not much
> else. Most self tests don't even include the OS scheduler; they are
> simply a single-threaded sequence of test cases. For these simple
> tests, `sysinit()` can be executed between each test case, obviating
> the need for "setup" and "teardown" code. Self tests are allowed to
> crash; such crashes get reported as test failures. Finally, self tests
> have effectively unlimited memory at their disposal.
>
> ##### REAL HARDWARE
>
> Test apps that run on real hardware ("hw tests"):
> * Run a test server. The user executes tests on demand with the
>   newtmgr `run` command.
> * Typically contain test suites from several packages.
> * Record results to a Mynewt log.
> * Run indefinitely. The device does not reset itself between
>   tests.
>
> Hw tests are more complex than self tests.
> These test apps need to
> include the scheduler, the newtmgr server, a transport (e.g., BLE), and
> a BSP and all its drivers. Test code cannot assume a clean state at
> the start of each test. Instead, the code must perform manual cleanup
> after each test case. Hw test cases must be idempotent and they must
> not crash. Finally, memory is constrained in this environment.
>
> ##### PREFER SELF
>
> Clearly, self tests are easier to write than hw tests. Equally
> important, self tests are easier to run - they can run in automated CI
> systems without the need for a complicated test setup.
>
> So self tests should always be preferred over hw tests. Hw tests should
> only be used when necessary, i.e., when testing drivers or the system
> as a whole. I won't go so far as to say there is never a reason to run
> the same test in both environments, but it is so rarely needed that it
> is OK if it requires some extra effort from the developer.
>
> ### TEST FACILITIES
>
> Mynewt has two unit test libraries: `testutil` and `runtest`.
>
> ##### TESTUTIL
>
> Testutil is a primitive unit test framework. There really isn't
> anything novel about this library - suites, cases, pass, fail, you get
> the idea :). This library is used in both test environments.
>
> I have one concern about this library:
>
> In self tests, does `sysinit()` get called automatically at the
> start of each test case?
>
> Currently, the answer is no.
>
> ##### RUNTEST
>
> Runtest is a grab bag of "other" test functionality:
>
> 1. Command handler for the `run` newtmgr command.
> 2. API for logging test results.
> 3. Generic task creation facility for multithreaded tests.
>
> Features 1 and 2 are only needed by hw tests. Feature 3 is needed by
> both self tests and hw tests.
>
> ### PROPOSALS
>
> 1. (testutil): Modify `TEST_CASE()` to call
> `sysinit()` if `MYNEWT_VAL(SELFTEST)` is enabled.

Would we want all TEST_CASE() to call sysinit() if a single syscfg setting is defined?
Maybe for some tests you want it, and for some you don't?

> 2. (runtest): Remove the task-creation functionality from `runtest`.
> This functionality can go in a new library (call it "taskpool" for the
> sake of discussion).
>
> 3. (taskpool): Further, I think the taskpool functionality could be
> tailored specifically towards unit tests. The remainder of this email
> deals with this proposal.
>
> I will use an example as motivation for this library. Here is a simple
> test which uses our to-be-defined taskpool library. This example tests
> that a `cbmem` can handle multiple tasks writing to it at the same
> time. Note: this is not meant to be good test code :).
>
>     static struct cbmem cbm;
>
>     /**
>      * Low-priority task handler. Writes 20 entries to the cbmem.
>      * Uses a rate of 1 OS tick per write.
>      */
>     static void
>     task_lo(void *arg)
>     {
>         int i;
>
>         for (i = 0; i < 20; i++) {
>             cbmem_append(&cbm, "hello from task_lo", 18);
>             os_time_delay(1);
>         }
>     }
>
>     /**
>      * High-priority task handler. Writes 10 entries to the cbmem.
>      * Uses a rate of 2 OS ticks per write.
>      */
>     static void
>     task_hi(void *arg)
>     {
>         int i;
>
>         for (i = 0; i < 10; i++) {
>             cbmem_append(&cbm, "hello from task_hi", 18);
>             os_time_delay(2);
>         }
>     }
>
>     TEST_CASE(cbmem_thread_safety) {
>         /* Initialize cbmem. */
>         uint8_t buf[1000];
>         cbmem_init(&cbm, buf, sizeof buf);
>
>         /* Create and run the two test tasks. */
>         taskpool_create(task_hi, 10 /* Priority */);
>         taskpool_create(task_lo, 20 /* Priority */);
>
>         /* Wait for both tasks to complete. Fail after three seconds. */
>         taskpool_wait(3 * OS_TICKS_PER_SEC);
>
>         /* Test result automatically reported here. */
>     }
>
> This example illustrates a few requirements of taskpool:
>
> 1) A taskpool task (tp-task) is allowed to "run off" the end of its
> handler function. Normally, it is a fatal error if a Mynewt
> task handler returns.

I am not sure why, but I do not like this. Is there some need to have a task basically return?

> 2) Client code can block until all tp-tasks have terminated.
>
> In multithreaded test code, tasks are typically short-lived and simple.
> Mynewt's test facilities should make it easy to create such tasks.
>
> There is one more requirement that isn't expressed in the example, but
> which is important:
>
> 3) Client code can abort all tp-tasks before they return. This is
> needed when a fatal test failure occurs.
>
> I have some ideas for how this could be implemented, but I think that
> is best left for another email (or a PR!). Any concerns or ideas about
> implementation are welcome, but I am particularly interested in
> high-level criticism:
>
> * Does this seem useful for unit testing?
> * Is any major functionality missing?
> * Any completely different ideas that are better than this one?

This seems fairly reasonable. Is taskpool_create() designed to save the test developer some time creating tasks, or does it have some other functionality?

> Thanks,
> Chris