> On 13 Dec 2025, at 13:34, Even Rouault <[email protected]> wrote:
>
> Hi Roger,
>
>> The reason these stand out is that there are an awful lot of static
>> analysis warnings (over 400) relating to unnecessary (and potentially
>> buggy) sign conversions, and most of these are due to the use of signed
>> integers where unsigned would typically be used.
>
> With which tool? The library component itself is clean with cppcheck,
> Coverity Scan, Clang static analyzer and CodeQL as we use them in GDAL
> with a vendored copy of libtiff.
clang-tidy, and also just clang. I've been progressively adding more compiler
warnings and looking at the scope required to fix each of them. They are
actually already in the source tree as the extra warnings in
cmake/CompilerChecks.cmake; in the CI builds we only enable
"-Wall -Wextra -Werror", and these are by no means comprehensive. Adding the
extra warnings shows up a lot of basic problems, including implicit narrowing
conversions, cast-alignment problems, etc. I'm not sure why the static
analysers aren't also picking these up, unless we need to increase their
stringency.

I have also been exploring the use of AI tools like Claude Code over the last
few weeks. While I'm not yet sold on their ability to write new code, I've
found them quite useful for detailed code review and for systematic
refactoring, in many cases with perfect accuracy. I've used them to find and
fix some long-standing defects that had been known but unfixed for 15 years;
it took only a few minutes to resolve issues that human reviewers had never
managed to get their heads around. So I'll also be trying them out on libtiff,
asking them to review portions of the codebase and identify defects that
static analysis would not pick up; they are good at finding inconsistencies in
the API and design that static analysis would never notice.

The other aspect to this is that I've been looking at what changes would be
needed for the libtiff C source code to compile cleanly with both a C and a
C++ compiler.

>> And for the standard memory allocation functions, these all use size_t,
>> not a signed size type like ssize_t.
>>
>> The same consideration may also apply to other functions where we are
>> using the type to represent sizes which cannot be negative, where
>> unsigned would be more natural to use and in many cases all of the
>> calculations are done with unsigned inputs.
>>
>> I was wondering if anyone could provide some historical context for why
>> the API is this way. And as a follow-up, whether it would be something
>> we could consider changing to make the API and the internal
>> implementation safer and cleaner.
>
> I've no historical context on that either. I'm wondering how practical a
> change is at this point of history given that it impacts the API as well.
> I suspect that there might be parts of the code base where the signedness
> is relied on, typically for loops (e.g. "tmsize_t i = stride; for (; i <
> cc - 3; i += 4)" at line 371 of tif_predict.c, which would not run
> correctly on an unsigned type if cc < 3), and switching to unsigned could
> introduce bugs. Not a strong opinion though, just a remark that the
> benefit vs risk at this point in time isn't immediately obvious to me.

There are almost certainly places where the signedness is needed, but of all
the sites I've manually reviewed, the vast majority could have been made
unsigned safely. Agreed that the risk of change likely isn't worth it, but I
would still like to scope out exactly what would need changing to fix it
properly.
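To illustrate the kind of hazard Even points to, here is a minimal standalone
sketch (not the actual tif_predict.c code; the type aliases, values and the
guarded rewrite are invented for illustration) of why a naive signed-to-unsigned
conversion of that loop shape misbehaves when cc < 3:

#include <stdint.h>
#include <stdio.h>

/* Invented stand-ins for tmsize_t and a hypothetical unsigned counterpart. */
typedef int64_t  signed_size;
typedef uint64_t unsigned_size;

int main(void)
{
    signed_size   cc_s = 2, stride_s = 1;   /* cc < 3, as in Even's example */
    unsigned_size cc_u = 2, stride_u = 1;

    /* Signed form (the shape used in tif_predict.c): cc - 3 == -1,
     * so the body is never entered when cc < 3. */
    for (signed_size i = stride_s; i < cc_s - 3; i += 4)
        printf("signed loop body runs (never printed)\n");

    /* Naive unsigned conversion: cc - 3 wraps around to UINT64_MAX,
     * so the comparison is true and the body runs when it should not. */
    for (unsigned_size i = stride_u; i < cc_u - 3; i += 4)
    {
        printf("unsigned loop body runs with i = %llu\n",
               (unsigned long long)i);
        break; /* one iteration is enough to show the wraparound */
    }

    /* An unsigned rewrite has to avoid the wrapping subtraction, e.g. by
     * moving the offset to the other side of the comparison (assuming
     * i + 3 cannot itself overflow, which holds for realistic buffers). */
    for (unsigned_size i = stride_u; i + 3 < cc_u; i += 4)
        printf("guarded unsigned loop body runs (never printed here)\n");

    return 0;
}

Each such site needs that kind of individual review rather than a mechanical
sign change, which is exactly why I'd like to scope the work before proposing
anything.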
Kind regards,
Roger

_______________________________________________
Tiff mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/tiff