Title: RE: When to validate?

Andy Heninger wrote:
>
> Some important things in designing a function API are
>
> o   Fully define what the behavior is.  With a function like
>      tolower(), you could leave malformed sequences unaltered;
>      you could replace them with some substitution character;
>      you could return or not return a separate error indication;
>      or you can do anything else you can think of.
Another important thing in designing an API is to split functionality, except in cases where you cannot. An example of the latter is strcpy, which should not be split into strlen plus memcpy. Unfortunately, it does not return the string length, which would cost nothing and can be useful. Instead, strcpy returns the target pointer, which is an input parameter and thus useless as a return value.

But back to splitting functionality. Making API functions too smart can be a benefit in the short term, because you don't need to bother with many parameters or with calling the functions in the proper sequence. But it also degrades performance, because checks and validations are overdone. And it prevents the basic functionality from being used on its own. As a rule, a need for that arises sooner or later, leading to the creation of 'Ex' functions. If the API is extended as soon as the need arises, things remain manageable. If not, users of the API start using workarounds, writing their own functions and so on. The consequences are many and can be severe.
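
To illustrate the strcpy remark, here is a minimal sketch of a copy routine that returns the length it has already computed as a side effect of copying. The name copy_and_count is hypothetical and not part of any standard library; as with strcpy, the caller is assumed to provide a large enough destination buffer.

```c
#include <stddef.h>

/* Hypothetical helper, not a standard function: copies src, including the
   terminating NUL, into dst and returns the number of characters copied
   excluding the NUL, i.e. the value strlen(src) would have produced.
   Like strcpy, it assumes dst is large enough and does not overlap src. */
size_t copy_and_count(char *dst, const char *src)
{
    const char *start = src;

    /* Copy one character at a time; stop once the NUL has been copied. */
    while ((*dst++ = *src) != '\0') {
        src++;
    }

    /* src now points at the terminating NUL of the source string. */
    return (size_t)(src - start);
}
```

Calling copy_and_count(buf, "hello") fills buf with "hello" and returns 5, so the caller does not need a second pass with strlen. POSIX stpcpy follows a similar idea by returning a pointer to the terminating NUL of the destination.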

>
>      Just don't choose "the behavior is undefined".  And don't crash.
Undefined behavior is generally bad. But defining it at a point where you don't have enough information is not good either. There are cases where a function can process 'invalid' data and return 'invalid' data without crashing. An example is the conversion of a surrogate code point from UTF-32 to UTF-8. Letting the function do that is not wrong. In fact, it can be desirable: it can prove useful in a case you did not foresee when you wrote the function. Actually, it often happens that you don't even think about it, but the typical implementations of the algorithm are such that it just works. In this particular example, I would say: do define the behavior, and the defined behavior should be to convert the invalid data rather than to validate and drop it.
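
A minimal sketch of the surrogate example, under stated assumptions: the function names encode_utf8_unchecked and is_surrogate are hypothetical, and returning 0 for values above 0x10FFFF is a choice made only for this illustration. The encoder applies the generic UTF-8 bit layout and does not look at surrogates at all; the check lives in a separate function that callers can use when they need strictly well-formed output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: encode one UTF-32 value using the generic UTF-8 bit
   layout. Surrogates (U+D800..U+DFFF) are NOT rejected; they come out as
   three bytes, which strict UTF-8 consumers consider ill-formed.
   Returns the number of bytes written, or 0 if cp is above 0x10FFFF
   (an assumption of this sketch). out must have room for 4 bytes. */
size_t encode_utf8_unchecked(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        /* Surrogates fall into this branch and are converted like any
           other value in the range. */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Separate validation step: callers that must produce well-formed UTF-8
   can test the value before encoding; others can skip the check. */
int is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}
```

With this split, encode_utf8_unchecked(0xD800, buf) writes the three bytes ED A0 80 and returns 3, while a caller that wants strict output calls is_surrogate first and decides for itself what to do with the value.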

> An application as a whole needs to validate external input that is
> alleged to be in some format, and ensure that any output that is
> promised to be in some format is indeed completely in that format.
> But this doesn't say anything at all about what individual library
> functions do or don't do.
But let's ask ourselves what an application is and what a function is. To a developer in a team, an application can be a program that can be run. But what if that program is run not by a user but by other programs in a complex product? The programmer will be tempted to validate all input and output, causing the same problems we identified with functions: performance degradation and potential problems with extending the functionality.

And it goes beyond that. This complex product may itself be just a brick in a LAN, a WAN, or the Web. Is it now clearer what I meant by "don't know where to start and where to end"?


Lars
