On 26/02/2013 1:07 p.m., Alex Rousskov wrote:
On 02/25/2013 03:45 PM, Amos Jeffries wrote:
On 26/02/2013 11:06 a.m., Alex Rousskov wrote:
      The attached patch adds about 700'000 test cases for eight
SBuf::find() methods. Needless to say, they are generated rather than
manually written. They take a couple of seconds to run. Configuration
options allow us to increase or decrease the number of test cases if
needed.
Why do we need so many tests?
I do not know how many tests are needed. Somebody can study that (if
they feel that reducing the number of generated tests is important).
More on that below.


Surely a selection of lengths from the middle, and specific scans of all
the edge-boundary tests would be enough?
That is what the random tests are doing. One could massage them into
using shorter strings (already a parameter), selecting fewer lengths
from the middle (a simple code modification), and/or doing fewer
edge-boundary tests (a simple code modification), of course.


That can't be more than a few hundred - which is few enough to run each
as a separate CppUnit case.
I have not studied whether doing so will reduce the number of bugs
found. I only studied that going in the opposite direction (more test
cases) does not seem to help much. Please note that it is difficult to
know for sure because while many test cases find bugs, it is difficult
to isolate independent/different bugs without actually fixing SBuf.

Why is "a few hundred" is OK as a separate CppUnit case but "a few
hundred thousands" is not OK? I am _not_ trying to argue either way,
just curious what the criteria for sizing CppUnit cases are, since I
know so little about CppUnit...


I like what this does already just because it finds bugs, and I'm happy with generating the actual strings randomly to see if we can fuzz up some failure somewhere as an on-top kind of check. But I don't see the need for nearly so many checks of the basic border conditions. I think we could auto-generate a smaller, more deterministic/targeted set of checks, for better results and for clarity about exactly what the coverage is.


1) the weak argument:
700K '.' characters is a lot to output if we want to make each test a CPPUNIT_ASSERT*(), and that is assuming they all pass; 700K failures would produce gigabytes of log.
 Whereas a few hundred tests add only a few hundred bytes to the build log.


2) the slightly stronger reason:
Because the SBuf content can be broken down into "fields" of octet value ranges that should all behave consistently with respect to how they break string implementations: ASCII visible characters, extended 128+ octets, ASCII control codes (\1 to \31 and \127), and the NUL octet \0. That is only 16 (4^2: needle class x hay class) sets of permutations needed to cover all the necessary byte-value handling failures of whatever internal implementation is used (possibly fewer if we tailor the unit tests to the specific implementation).

** I understand std::string cannot handle most of these octet value ranges, so for now this test can only be performed using the printable characters. So, for now, character-class permutations: 1.

eg. a search for "b" in "acd" is no better than searching for "b" in "efg". The important "unit" is that you are searching for a missing pattern of length 1.

It is only worthwhile testing what you label "placements" once for each permutation of the other factors. To my knowledge, the pattern units / placements we need to check for each of the above permutations are:
 - pattern out of range (off end of SBuf)
 - exact match
 - absent pattern
 - pattern too long
 - present pattern
 - present needle prefix
 - present needle suffix
 - present and fully out of range prefix
 - present ending in-range prefix
 - present and fully out of range suffix
 - present starting in-range suffix

 ** Placement pattern permutations: 11

All of the above for a handful of needle lengths, say 1, 2, ... 5, should cover it.

 ** needle length permutations: 5

1 * 11 * 5 ==> 55 tests covering all permutations of placement and needle size. Multiplying by the 16 character-set permutations only makes that 880 tests total.

For further fine tuning, the raw hay could be a run of '_' characters into which the needle(s) are placed. Ensuring that the needle does not contain '_' will prevent those TODO worries you mention about whether the test is already accidentally checking multiple needles. (Corruption of results is not a worry; knowing whether that case is being covered or not *is*.)



If more CppUnit cases are needed, I am sure it would be possible to
parameterize the current code to generate more specific test cases.
After repeating such "focused" generation inside each appropriate
CppUnit case, one would end up with multiple CppUnit test cases covering
approximately the same input space.


Also, using fuzzing with random input leaves us open to the rare
occasions when a single octet value at a specific location is the
cause of a bug. Think \0 at octet 5 when doing 32-bit chunk comparisons
in some buggy implementation checking against !*p. That type of failure
appears intermittently and inspires confusion rather than a fix.
Bugs notwithstanding, all random input is 100% reproducible so there
should not be any intermittent behavior originating from the test cases
themselves (at least as far as the same execution environment is
concerned -- I do not know whether random() implementations yield
different sequences on different platforms).

They are supposed to. But I would not bet on that either.
More likely there is a unique sequence per kernel family, with a few exceptions in systems designed for higher security paranoia, like OpenBSD. The only thing we can be certain of is that the same kernel + same math libraries + same seed implies the same sequence.

Amos
