[Valgrind-users] Question: bug in user code, valgrind or glibc?

ISHIKAWA,chiaki Tue, 10 May 2022 20:11:48 -0700

Hi,

I have been analyzing the Thunderbird mail client under valgrind for some time.
memcheck has been very useful to me for finding memory-related errors.
Thank you for releasing this great tool.

Recently, I noticed an "Invalid read of size 8" warning, which should be familiar to all of us.

Interestingly, the initial part of the stack trace is also found in a report in the Qt bug database.

It comes from dynamic loading library support.
https://bugreports.qt.io/browse/QTBUG-90374
It was filed last year.

My system is Debian GNU/Linux and I used gcc to compile thunderbird.
The report was done by someone who uses clang.

I believe the issue lies in a certain version of dl-library, glibc OR valgrind? The reason I say valgrind might be to blame, too, is explained below. (Debian is known to release toolchains very conservatively. I think that is why I did not see this issue last year.)

Actually, the line numbers in my trace are slightly off from those in the report, due to version differences, I suspect.


143:39.43 GECKO(115765) ==115769== Invalid read of size 8
143:39.64 GECKO(115765) ==115769==    at 0x4021BF4: strncmp (strcmp.S:175)
143:39.64 GECKO(115765) ==115769==    by 0x400655D: is_dst (dl-load.c:214)
143:39.64 GECKO(115765) ==115769==    by 0x4007666: _dl_dst_count (dl-load.c:251)
143:39.64 GECKO(115765) ==115769==    by 0x4007857: expand_dynamic_string_token (dl-load.c:393)
143:39.64 GECKO(115765) ==115769==    by 0x40079C7: fillin_rpath.isra.0 (dl-load.c:465)
143:39.68 GECKO(115765) ==115769==    by 0x4007CC2: decompose_rpath (dl-load.c:636)
143:39.68 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:678)
143:39.68 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:659)

      ... [omitted] ...

My local valgrind dump tells me where the address was allocated.

143:40.60 GECKO(115765) ==115769==  Address 0x27ba3819 is 9 bytes inside a block of size 15 alloc'd
143:40.65 GECKO(115765) ==115769==    at 0x483CF9B: malloc (vg_replace_malloc.c:380)
143:40.65 GECKO(115765) ==115769==    by 0x402074B: malloc (rtld-malloc.h:56)
143:40.65 GECKO(115765) ==115769==    by 0x402074B: strdup (strdup.c:42)
143:40.65 GECKO(115765) ==115769==    by 0x4007C54: decompose_rpath (dl-load.c:611)
143:40.65 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:678)
143:40.65 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:659)
143:40.65 GECKO(115765) ==115769==    by 0x4009E9D: _dl_map_object (dl-load.c:2174)
143:40.65 GECKO(115765) ==115769==    by 0x400E4B0: openaux (dl-deps.c:64)
          ... [omission] ...

I *think* this is a valid report of a wide READ in strncmp going beyond the allocated memory boundary. (strcmp.S shows that it reads 8 octets at a time instead of one octet at a time.)

I think such a combination of strdup and str{n}cmp abounds in C source code.

So I thought maybe valgrind was reporting something different.

Otherwise, many application programs would have to create suppressions for this type of issue.

That is what I thought initially.

Another type of error I considered at first was that, for example, "9 bytes
inside a block of size 15" might mean that the string somehow contains
uninitialized data at that position.  However, come
to think of it, if so, strdup would have triggered a valgrind warning
before this.  There is no warning from valgrind for strdup.

Also, I created a test program and found that in that case valgrind prints


==120076== Conditional jump or move depends on uninitialised value(s)
==120076==    at 0x4843172: strncmp (vg_replace_strmem.c:663)
==120076==    by 0x108778: main (in /home/ishikawa/Dropbox/TB-DIR/a.out)

So the original problem must be the read beyond malloc'ed area boundary.

Now, is dl-library to blame?

I think dl-library is used literally hundreds of millions of times or more daily, and
it is hard to think that there is a bug there. (Famous last words.)

Dl-library has no control over how long each path string is (I
think it is trying to record the path components of a load path),
and thus cannot avoid the valgrind messages generated when an 8-char read
goes beyond the end of the malloced memory. (So people probably have to
create a suppression after all, if their particular version has this
issue.)

As for valgrind, can valgrind be somehow more intelligent in this
case?  Maybe creating a substitute strcmp? (I know single char
comparison at a time would be slower than comparing 8 characters at a
time when appropriate).  But at least, this type of surprise warning
would be reduced.

However, we may have a problem here for glibc.  If this read beyond
the malloced region is for real, we have a problem.  I have no idea how
this behavior is constrained or sanctioned by the C standard, the C library
standard or the POSIX standard, but the 8-octet reads in strcmp.S could lead
to a real issue unless malloc() uniformly allocates memory
in units of 8 octets or larger, or unless glibc makes sure that there is a
guard area between the malloc area and the end of user virtual space.


I once had an experience where a bitblt-like CPU instruction expected us to
create a bitmap with a horizontal bit length of a multiple of 16 (or 8?),
even if the screen size really used was less than that. So we had to
round it up to a multiple of 16 (or 8?).

I got a bit stingy with memory and once created bitmap data whose raster lines were not padded with the extra octets needed to make their length a multiple of 16 (or 8). Kaboom. I created this memory area using the C runtime library of the CPU/computer maker's OS. When the bitblt-like CPU instruction accessed the last raster line, it fetched data 16 (or 8?) octets at a time, and at the end it accessed memory beyond the malloced area.

And it was BEYOND the allocated user memory space by the OS.

(The 8-byte reads for the intermediate raster lines ended up reading the next allocated raster line area, and so they were OK.)

So the program crashed due to memory violation.

It took me a couple of weeks to figure this out since the bitblt-like instruction did not
offer any clue regarding where the address violation occurred.

Also, only one of the screen bitmaps created this way was at the end of user virtual space, and it was difficult to realize why the instruction crashed in a seemingly random manner when it handled other bitmaps without a problem. The CPU vendor intended the instruction to be used only for the main display screen of its workstation, and in that case the memory is preallocated with a neat multiple-of-16 horizontal length. I tried to use the bitblt-like instruction for arbitrary user-defined virtual screens.

So my message here is that there *can* be a grave consequence of this combination of malloc and reads wider than originally assumed, but I am not sure where to report this and alert the developers. Yeah, if malloc() always allocates in 8- or 16-byte chunks, it should be OK [and we are better off if it is built this way due to some standard, glibc manifest, or whatever published document which won't change overnight].

Even in this age of PC users having GBs of memory, I hate to think of programs which allocate memory in 3- or 4-octet lengths...


Chiaki



