Lewis Tian noted on the mailing list that an application reading with fscanf() from a large file fails after passing one BUFSIZ of data.
It turns out that the old Musl code we were using was only correct assuming a certain bizarre readv() bug workaround which OSv dropped in commit 0b651428b91663255d8da9a4913663d0cd4cc710. But the Musl developers have too noticed they can't rely on this workaround (which doesn't even apply to unbuffered streams), and came up with a fix in their commit https://git.musl-libc.org/cgit/musl/commit/src/internal/shgetc.c?id=c20804500deebaabc56f383d48dd1ac77dce8349 This patch does the same fix in our copy of the Musl code. By the way, after this change libc/internal/shgetc.c is now identical to musl/src/internal/shgetc.c but we can't drop the former yet because of some header file differences that still need to be cleared up. Here is the original Musl commit message describing the change: "fix major scanf breakage with unbuffered streams, fmemopen, etc. the shgetc api, used internally in scanf and int/float scanning code to handle field width limiting and pushback, was designed assuming that pushback could be achieved via a simple decrement on the file buffer pointer. this only worked by chance for regular FILE streams, due to the linux readv bug workaround in __stdio_read which moves the last requested byte through the buffer rather than directly back to the caller. for unbuffered streams and streams not using __stdio_read but some other underlying read function, the first character read could be completely lost, and replaced by whatever junk happened to be in the unget buffer. to fix this, simply have shgetc, when it performs an underlying read operation on the stream, store the character read at the -1 offset from the read buffer pointer. this is valid even for unbuffered streams, as they have an unget buffer located just below the start of the zero-length buffer. the check to avoid storing the character when it is already there is to handle the possibility of read-only buffers. no application-exposed FILE types are allowed to use read-only buffers, but sscanf and strto* may use them internally when calling functions which use the shgetc api." Signed-off-by: Nadav Har'El <n...@scylladb.com> --- libc/internal/shgetc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/libc/internal/shgetc.c b/libc/internal/shgetc.c index 0641c835..5a47ac9e 100644 --- a/libc/internal/shgetc.c +++ b/libc/internal/shgetc.c @@ -22,5 +22,6 @@ int __shgetc(FILE *f) else f->shend = f->rend; if (f->rend) f->shcnt += f->rend - f->rpos + 1; + if (f->rpos[-1] != c) f->rpos[-1] = c; return c; } -- 2.21.1 -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/20200114111838.24313-1-nyh%40scylladb.com.