Hi @tech, The bottom of this email contains a simplified strtonum implementation. It should behave exactly the same way as the existing strtonum, but it doesn't call strtoll, which means we can drop some unnecessary complexity: - No base conversion - No need to save and restore errno (which is a bigger deal than it sounds like because of the weird mutable array-of-structures that's used to recover errno in the existing strtonum.c) - No complicated sign-dependent "cutoff/cutlim" tests like the ones we see in strtoll.
That last point is most interesting. We're doing our 10*accumulator + d arithmetic in a 64-bit register, but we can report overflow as soon as we blow past 2^63 = -LLONG_MIN, which means that we have a bit of headroom in which we have some wiggle room when we try to determine whether 10*accumulator + d will cause an integer overflow in the register itself. Let "max = (-LLONG_MIN) / 10 + 1" be an unsigned compile-time constant, let "acc" be an unsigned accumulator, and let 0 <= d <= 9 be a digit. Then checking for overflow when computing 10*acc + d is easy because of the following two theorems: [1] If (acc <= max) then 10*acc + d <= ULLONG_MAX [2] If (acc > max) then 10*acc + d > -LLONG_MIN > LLONG_MAX Proof: [1] Suppose acc <= max. Then: 10*acc + d <= 10*max + d = 10*(floor(2^63/10) + 1) + d <= 10*(2^63/10 + 1) + d = 10*2^63/10 + 10 + d = 2^63 + 10 + d <= 2^63 + 19 <= ULLONG_MAX [2] Suppose acc > max. Then: -LLONG_MIN = 2^63 = 10*(2^63/10) < 10*(floor(2^63/10) + 1) = 10*max < 10*acc <= 10*acc + d The meaning of [1] is that if (acc <= max) then 10*acc + d will not overflow an unsigned buffer; the meaning of [2] is that if (acc > max) then you're guaranteed to overflow the buffer so there's no need to bother reading the digit. Thus we can loop through the digit-string "s" with the following loop: unsigned long long acc = 0; do { char c = *s++; if (!isdigit(c)) return ERROR_INVALID; else if (acc > max) return ERROR_OVERFLOW; acc = 10*acc + (c - '0'); } while (*++s); puts("acc did not overflow"); There are five possibilities: 1. If the string is empty, this returns ERROR_INVALID. 2. If the string has non-digit characters, this returns ERROR_INVALID. 3. If the number > ULLONG_MAX, this returns ERROR_OVERFLOW. 4. If the number <= -LLONG_MIN, this prints "acc did not overflow." 5. If -LLONG_MIN <= number <= ULLONG_MAX then the loop may either return ERROR_OVERFLOW or successfully print "acc did not overflow." The underlying number should count as having overflowed because neither it nor its negative can fit in a long long, but we will handle this case in the next step. The next thing we do is correct the sign. If "neg" is a boolean which tells whether a minus sign preceded the digit string, the following code will jam our unsigned quantity "acc" into the signed variable "ll" and test for overflow. Note that -LLONG_MIN == LLONG_MIN but that this code handles that case correctly whether neg is true or false: unsigned long long acc = <from above> long long ll; if (neg) { ll = -acc; if (ll > 0) return ERROR_TOOSMALL; } else { ll = acc; if (ll < 0) return ERROR_TOOLARGE; } puts("ll did not overflow!"); If we're still here, then ll has not been truncated, so we can compare against the user-defined minval and maxval: if (ll < minval) return ERROR_TOOSMALL; else if (ll > maxval) return ERROR_TOOLARGE; else return SUCCESS; That's basically it. I'll mention that similar logic can be used in strtol and strtoll. If we redo the proofs of [1] and [2] with an arbitrary base in place of the number 10 we find that the proof of property [2] doesn't change, and the only questionable part of the proof of property [1] is the last two steps, which rely on the assumption that: -INT_MIN + base + d <= UINT_MAX We know that d < base, and on a two's complement machine (which I believe is required by POSIX) this will hold whenever: -INT_MIN + base + d <= UINT_MAX -INT_MIN + base + (base-1) <= UINT_MAX -INT_MIN + 2*base - 1 <= UINT_MAX 2*base <= (UINT_MAX + 1) + INT_MIN 2*base <= 2^w - 2^(w-1) 2*base <= 2^(w-1) base <= 2^(w-2) The strto* family of functions support only base <= 36, and the C standard requires a word size of w >= 16 for integers, so we have plenty of space between -INT_MIN and UINT_MAX: base <= 36 <= 16384 = 2^(16 - 2) <= 2^(w-2) That said, because this trick makes use of the maybe-overflow-maybe-not values between -INT_MIN and UINT_MAX, it cannot be applied to the unsigned variants strtoul and strtoull. A complete implementation is provided below, along with a test harness to catch issues around the critical boundaries of INT_MIN and INT_MAX. I also added some assertions to the main loop to prove dynamically that the computation of 10*acc + d doesn't overflow; obviously you would remove these once you're comfortable that the approach is valid. Regards, Anthony Coulter === strtonum.c === #include <assert.h> #include <ctype.h> #include <errno.h> #include <limits.h> #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> long long strtonum2(const char *s, long long minval, long long maxval, const char **errstrp) { enum { INVALID = 0, TOOSMALL = 1, TOOLARGE = 2 }; static const struct errval { const char *errstr; int err; } ev[3] = { { "invalid", EINVAL }, { "too small", ERANGE }, { "too large", ERANGE }, }; int error; if (minval > maxval) { error = INVALID; goto error; } while (isspace(*s)) s++; bool neg; if (*s == '-') { neg = true; s++; } else { neg = false; if (*s == '+') s++; } // The cast to (unsigned long long) avoids undefined behavior // corresponding to the signed integer overflow that takes place // when computing -LLONG_MIN. Strictly speaking, once we cast it // there's no need to negate the result, but I include the minus // sign for clarity and also because it's evaluated at compile // time. unsigned long long max = (-(unsigned long long)LLONG_MIN) / 10 + 1; unsigned long long acc = 0; do { if (!isdigit(*s)) { error = INVALID; goto error; } else if (acc > max) { error = neg ? TOOSMALL : TOOLARGE; goto error; } /*********************************/ // // Let's make sure we don't overflow the accumulator. // // An imperfect test: assert(acc <= 10*acc + (*s - '0')); // A perfect test: unsigned long long product, dummy; assert(!__builtin_umulll_overflow(10, acc, &product)); assert(!__builtin_uaddll_overflow(product, *s, &dummy)); /*********************************/ acc = 10*acc + (*s - '0'); } while (*++s != '\0'); long long ll; if (neg) { ll = -acc; if (ll > 0) { error = TOOSMALL; goto error; } } else { ll = acc; if (ll < 0) { error = TOOLARGE; goto error; } } if (ll < minval) { error = TOOSMALL; goto error; } else if (ll > maxval) { error = TOOLARGE; goto error; } if (errstrp != NULL) *errstrp = NULL; return ll; error: errno = ev[error].err; if (errstrp != NULL) *errstrp = ev[error].errstr; return 0; } static bool test(const char *s, long long minval, long long maxval) { const char *err1, *err2; int errno1, errno2; errno = 1234; long long a = strtonum(s, minval, maxval, &err1); errno1 = errno; errno = 1234; long long b = strtonum2(s, minval, maxval, &err2); errno2 = errno; // The strings might be NULL bool stringsmatch = (!err1 && !err2 || err1 && err2 && strcmp(err1, err2) == 0); if (a != b || errno1 != errno2 || !stringsmatch) { printf("%d%d%d\n", a == b, errno1==errno2, stringsmatch); printf("Them: %lld/%d/%s\n", a, errno1, err1 ? err1 : "(null)"); printf("Me : %lld/%d/%s\n", b, errno2, err2 ? err2 : "(null)"); } return a == b && errno1 == errno2 && stringsmatch; } int main(int argc, char *argv[]) { static const char *nums[] = { "", // Oops, almost forgot to test the empty string "0", "1", "00000000000000000000000000123", " -789", " \t\n+666", "-+12", "+-34", "9223372036854775797@", "9223372036854775798", // 2^63 - 10 "9223372036854775799", "9223372036854775800", "9223372036854775801", "9223372036854775802", "9223372036854775803", "9223372036854775804", "9223372036854775805", "9223372036854775806", "9223372036854775807", "9223372036854775808", // 2^63 "9223372036854775809", "9223372036854775810", "9223372036854775811", "9223372036854775812", "9223372036854775813", "9223372036854775814", "9223372036854775815", "9223372036854775816", "9223372036854775817", "9223372036854775818", // 2^63 + 10 "9223372036854775819@", "-9223372036854775797@", "-9223372036854775798", // -(2^63 - 10) "-9223372036854775799", "-9223372036854775800", "-9223372036854775801", "-9223372036854775802", "-9223372036854775803", "-9223372036854775804", "-9223372036854775805", "-9223372036854775806", "-9223372036854775807", "-9223372036854775808", // -2^63 "-9223372036854775809", "-9223372036854775810", "-9223372036854775811", "-9223372036854775812", "-9223372036854775813", "-9223372036854775814", "-9223372036854775815", "-9223372036854775816", "-9223372036854775817", "-9223372036854775818", // -(2^63 + 10) "-9223372036854775819@", // Let's also test numbers around 2^64. "18446744073709551606", // 2^64 - 10 "18446744073709551607", "18446744073709551608", "18446744073709551609", "18446744073709551610", "18446744073709551611", "18446744073709551612", "18446744073709551613", "18446744073709551614", "18446744073709551615", "18446744073709551616", // 2^64 "18446744073709551617", "18446744073709551618", "18446744073709551619", "18446744073709551620", "18446744073709551621", "18446744073709551622", "18446744073709551623", "18446744073709551624", "18446744073709551625", "18446744073709551626", // 2^64 + 10 "18446744073709551615", }; unsigned long long bounds[] = { LLONG_MIN, LLONG_MIN+3, LLONG_MIN+7, -5, 0, 12, LLONG_MAX-8, LLONG_MAX-2, LLONG_MAX, }; for (size_t i = 0; i < sizeof nums/sizeof nums[0]; i++) for (size_t j = 0; j < sizeof bounds/sizeof bounds[0]; j++) for (size_t k = 0; k < sizeof bounds/sizeof bounds[0]; k++) { if (!test(nums[i], bounds[j], bounds[k])) printf("Test failure for %s [%lld, %lld]\n", nums[i], bounds[j], bounds[k]); } return 0; }