Testutil again

2019-03-04 Thread Christopher Collins
Hello all,

More unit testing excitement!  I think there are some valuable changes
we could make to the `test/testutil` API, so I wanted to share some
ideas with the community.  I introduced many of the issues that I want
to focus on, so I don't really feel obligated to be polite here :).  I
just wanted to give that disclaimer in case it feels like I'm being
overly harsh.

I already touched on one of testutil's problems in an earlier email: a
lack of support for standalone self tests.  Recall that a self test is
something that runs in the simulator when you type `newt test
`.  In my earlier email, I expressed that these tests should
automatically execute `sysinit()` before running.

Now I want to discuss a second problem that I see with this library: an
abundance of configurable callbacks that makes the API too complicated.
I think we can remove about half of the callbacks without losing any
functionality.  The execution flow of a test suite is described in a
comment in `testutil.h`:

   TEST_SUITE
tu_suite_init -> ts_suite_init_cb
tu_suite_pre_test -> ts_case_pre_test_cb
tu_case_init -> tc_case_init_cb
tu_case_pre_test -> tc_case_pre_test_cb
TEST_CASE
tu_case_post_test -> tc_case_post_test_cb
tu_case_pass/tu_case_fail -> ts_case_{pass,fail}_cb
tu_case_complete
tu_suite_post_test -> ts_case_post_test_cb
tu_suite_complete -> ts_suite_complete_cb

That is a lot of callbacks for a suite with one test case!  Let's go
through them one by one.

tu_suite_init:
Description: Gets called once per suite, before the suite runs.
Proposal: Remove.
Rationale: If a suite needs to do any setup, it can do so directly
   at the top of its block.

tu_suite_pre_test:
Description: Gets called at the start of each case in the suite.
Proposal: Keep.
Rationale: This callback is the very reason why we group test cases
   into suites.  Similar test cases need to start from the
   same state.

tu_case_init:
Description: Same as tu_suite_pre_test.
Proposal: Remove.
Rationale: Redundant.

tu_case_pre_test:
Description: Gets called once at the start of the next test case.
Proposal: Remove.
Rationale: A test case is just a block of C code.  If it needs to
   perform some setup, it can do it directly at the start of
   the test.

tu_case_post_test:
Description: Gets called at the end of the current / next test case.
Proposal: Keep.
Rationale: A test case may terminate prematurely due to a failed
   `TEST_ASSERT_FATAL()`.  We need a callback to perform
   clean up because the test case may not reach the end of
   its block.

tu_case_pass / tu_case_fail:
Description: One of these gets called whenever a test case
 completes.
Proposal: Keep.
Rationale: We need some way to report results.

tu_suite_post_test:
Description: Gets called at the end of each case in the suite.
Proposal: Remove.
Rationale: Test cases can already clean up individually using
   `tu_case_post_test`.  I can see how common clean up code
   might be useful, but I don't think this is needed often
   enough to justify the added API complexity.  For
   reference, this function is only used in one place
   (NFFS), and it is used unnecessarily - the calls can be
   removed with no effect on the tests.

tu_suite_complete:
Description: Gets called at the end of each test suite.
Proposal: Remove.
Rationale: If a test suite needs to perform any cleanup (unlikely),
   it can do so directly at the end of its code block.
   Unlike a test case, a test suite never terminates early,
   so the code at the end of a suite always gets executed.

If we make the proposed removals, the execution flow looks like this:

   TEST_SUITE
   TEST_CASE
   tu_suite_pre_test_cb
   
   tu_case_pass/tu_case_fail
   tu_case_post_test_cb

I think that looks better!

I also wanted to talk about how I think these callbacks should get
configured:

tu_case_pass / tu_case_fail:
Configured at the start of the application.

tu_suite_pre_test_cb:
Always configured at the top of a test suite body.

tu_case_post_test_cb:
Always configured at the top of a test case body.

Finally, I've added an example test suite at the bottom of this
email.  This is a totally ridiculous set of tests that verify the
machine's (or compiler's) ability to do arithmetic.  Ridiculous, but it
is just intended to demonstrate the proposed API changes.

I plan to submit a PR with the proposed changes soon.  So feel free to
comment here, or wait for the PR.  In any case, all comments are
welcome.

Thanks,
Chris


--





Re: Mynewt test facilities

2019-03-04 Thread Christopher Collins
Thanks, Will.

Responses to inline comments inline :).

On Sun, Mar 03, 2019 at 06:18:46PM -0800, will sanfilippo wrote:
> > ### PROPOSALS
> > 
> > 1. (testutil): Modify `TEST_CASE()` to call
> > `sysinit()` if `MYNEWT_VAL(SELFTEST)` is enabled.
> Would we want all TEST_CASE() to call sysinit if a single syscfg is
> defined? Maybe for some tests you want it and some you dont?

Before I address this, I want to clarify my proposal a bit.  I did a
poor job of setting up the context, so I'll try again here.  Also, you
would get tired of seeing "IMHO" at the end of every sentence, so I
won't write it.  Please just understand that all the below is just my
opinion :).

Test cases should be as self-contained as possible.  We don't want any
restrictions on how test cases are ordered.  For example, something like
this is bad (hypothetical):

/* Test basic FCB functionality. */
test_case_fcb_basic();

/* The above test initialized and populated the FCB.  Make sure we
 * can delete an entry from it.
 */
test_case_fcb_delete();

The ordering of these test cases matters.  The second case depends on
side effects from the first case.  These kinds of tests are difficult to
maintain, and often it is difficult to tell what is even being tested.

Every test case should start from a clean state.  If
`test_case_fcb_delete()` needs a pre-populated FCB, it should start by
initializing and populating an FCB with the exact contents it needs.

This is why it is good to execute `sysinit()` at the start of each test
case - it lets us clean up state left over from prior test cases.  This
eliminates a lot of manual clean up code, and it helps ensure
correctness of old tests as new ones are added.

Ideally we could just call `sysinit()` at the start of each test case,
whether the test is simulated or performed on real hardware.
Unfortunately, we can't call `sysinit()` in hw tests.  The problem is
that hw tests specifically need to maintain some state between test
cases.  Calling `sysinit()` in a hw test case would reset the runtest
package, reinitialize the test result logs, and otherwise make the
system unusable as a test server.

So I think we should call `sysinit()` at the start of a test case
whenever possible.  In other words, call `sysinit()` for every self
test.  I don't think we should make this behavior conditional on a
syscfg setting, but as I said, I am totally open to opposing viewpoints.

I do think you raise a good point, though.  The semantics of
`TEST_CASE()` should not differ between self tests and hw tests.
Instead, we should create a new macro which creates a test case and
calls `sysinit()` at the top.

That is:

`TEST_CASE()` - creates a test case that does NOT call `sysinit()`.
`TEST_CASE_SELF()` - creates a test case that DOES call `sysinit()`.

(macro name off the top of my head.)

Self tests would use `TEST_CASE_SELF()`; hw tests would use
`TEST_CASE()`.

At first, I thought it would be confusing to have two macros.  Now I
think it is even more confusing to have one macro with varying
semantics.

I have some more changes I would like to make to the testutil API, but I
will save that for a future email (or PR).

> > This example illustrates a few requirements of taskpool:
> > 
> >1) A taskpool task (tp-task) is allowed to "run off" the end of its
> >   handler function.  Normally, it is a fatal error if a Mynewt
> >   task handler returns.
> > 
> Not sure why I do not like this but is there some need to have a task
> basically return?

It is not strictly needed, but it is a convenient way for a task to
signal that it has completed.  Otherwise, each of these temporary tasks
would need to set some condition variable and keep looping.  That isn't
terribly complicated, but I do think it is more complicated than this
needs to be.  These tasks are not meant to loop endlessly like most, so
why require them to be written that way?

Just to clarify - I am not proposing a change to the kernel scheduler.
The handlers for these tp-tasks would just be wrapped with something
that provides this special behavior.

> Seem fairly reasonable. Is taskpool_create() designed to save the test
> developer some time creating tasks or does it have some other
> functionality?

Its main purpose is to allow tasks to be reused among a set of test
cases.  This isn't so important for self tests, but in hw tests, it is
wasteful for each test case to allocate its own pool of tasks and
stacks.  Since only one test runs at a time, these test cases can share
a pool of tasks.

Thanks,
Chris