Hi,

Nicholas Marriott wrote on Sat, Jul 01, 2017 at 09:48:08PM +0100:

> Safer how?

 1. fgetln(3) returns a char array that is not NUL-terminated.
    That is prone to buffer overrun bugs in general.

    getline(3) always returns proper NUL-terminated C strings.

 2. The array returned from fgetln(3) contains a trailing '\n' in
    most cases, but if the stream ends without it, there won't be
    one, which aggravates the danger of item 1: if a careless
    programmer reads the buffer until '\n', they will get away with
    it on all valid text input files, so the problem is likely to
    go undetected during testing, but they may yet be vulnerable
    to buffer overrun on crafted input lacking the trailing newline.

    With getline(3), lack of the trailing '\n' is much less of an
    issue because people will likely look for the terminating '\0'
    in the first place, not for the terminating '\n'.

 3. For this reason, the fgetln(3) manual page contains a lengthy
    CAVEATS section recommending a complicated ten-line idiom to
    NUL-terminate the buffer.  In general, if something requires
    a ten-line idiom, it can hardly be called safe.  Adding the
    missing two lines for error checking at the end, the complete
    example is essentially 19 lines.

    EXAMPLES in fgetline(3) contains a code snipped doing about the
    same, and even adding two more lines to strip '\n', it is only
    10 lines long including full initialization and error checking.
    So that's half the code, much more intuitive, and much simpler,
    without any manual malloc(3) or memcpy(3) or manually assigning
    '\0' to some position calculated from pointer arithmetics,
    which we all know is prone to off-by-one.

 4. With fgetln(3), the returned pointer becomes invalid after the
    next I/O operation on the stream.  So you say line = fgetln(..),
    do something else that you consider unrelated because you never
    use "line" in it, and the next time you access "line", boom.
    That can be very surprising.  In general, library interfaces
    passing out pointers to library-owned internal memory for the
    user to mess with tend to be somewhat risky.

    By contrast, getline(3) copies the data to a dedicated buffer,
    which won't become invalid on I/O.  So the next time you use
    it, it is guaranteed to be still valid, even if that is much
    later and the file in question has long been closed.  Of course,
    if you pass the same buffer to getline(3) again, then after
    that, it will contain the *next* line; but that is not surprising
    at all, because that is exactly what you explicitly asked for.

 5. The fgetln(3) manual says that you are allowed to modify the
    returned buffer, but item 4 still applies, which aggravates the
    danger inherent in item 4.  Why would you modify the buffer,
    unless you want to do something else after that and then, yet
    later, look at your changes again?  But then you must be *very*
    careful to not do I/O on that file descriptor while doing
    the "something else", or you are in for a surprise.

    getline(3) returns an allocated buffer for the user to free(3).
    That's an easy to understand operation mode with clearly
    assigned responsibilies, and it is so obvious that the user
    can alter the buffer - after all, it's their own buffer -
    that the manual doesn't even need to mention it.

Your code follows the CAVEATS section in the fgetln(3) manual quite
closely, and i would have been quite surprised if you had screwed
such a thing up.

But with most other people, i would have felt somewhat worried
about their code if they had asked me "safer how?" *after* using
fgetln(3)...  >:-o

> I don't mind too much, but you will also need to initialize size
> to 0 (it is currently uninitialized) before calling getline().

Ouch.  No interface can be so simple that there is no way to
misuse it dangerously, even if it sets fewer traps for the
unwary than fgetln(3).

>> Index: magic-load.c
>> ===================================================================
>> RCS file: /cvs/src/usr.bin/file/magic-load.c,v
>> retrieving revision 1.25
>> diff -u -p -r1.25 magic-load.c
>> --- magic-load.c     1 Jul 2017 14:34:29 -0000       1.25
>> +++ magic-load.c     1 Jul 2017 18:44:42 -0000
>> @@ -1071,6 +1071,7 @@ magic_load(FILE *f, const char *path, in
>>      struct magic_line       *ml = NULL, *parent, *parent0;
>>      char                    *line, *tmp;
>>      size_t                   size;
>> +    ssize_t                  slen;
>>      u_int                    at, level, n, i;
>>  
>>      m = xcalloc(1, sizeof *m);
>> @@ -1084,15 +1085,11 @@ magic_load(FILE *f, const char *path, in
>>  
>>      at = 0;
>>      tmp = NULL;
>> -    while ((line = fgetln(f, &size))) {

Hmmm, the current code does not test ferror(3) after falling out
of the "while" loop - is it intentional to ignore I/O errors?
In that case, an explicit comment "ignore I/O errors" after
the end of the while loop might make sense.

>> -            if (line[size - 1] == '\n')
>> -                    line[size - 1] = '\0';
>> -            else {
>> -                    tmp = xmalloc(size + 1);
>> -                    memcpy(tmp, line, size);
>> -                    tmp[size] = '\0';
>> -                    line = tmp;
>> -            }
>> +    while ((slen = getline(&tmp, &size, f)) != -1) {
>> +            line = tmp;

I don't see a need for two char * variables here, that's an artifact
of the typical fgetln(3) complications.  Just

        char    *line = NULL;
        size_t   linesize = 0;
        ssize_t  slen;

        while ((slen = getline(&line, &linesize, f)) != -1) {
                if (line[slen - 1] == '\n')
                        /* etc. */
        }
        free(line);
        if (ferror(f))
                err(1, "%s", path);

ought to suffice.

Yours,
  Ingo

>> +            if (line[slen - 1] == '\n')
>> +                    line[slen - 1] = '\0';
>> +
>>              at++;
>>  
>>              while (isspace((u_char)*line))

Reply via email to