On 10/19/23 13:20, enh wrote: >> > if you had >> > ``` >> > if (which->flags & TOYFLAG_LINEBUF) setvbuf(stdout, 0, _IOLBF, 0); >> > ``` >> > we'd argue a lot less about buffering :-) >> >> The existing LINEBUF users are ascii, base64, base32, yes, echo, grep, >> egrep, fgrep. > > yeah, i think we're in agreement about line buffering being reasonable for > grep. > > i'm just unconvinced that _no_ buffering is the right default...
I'm unconvinced there IS a right default. I was originally leaving it to libc to provide a default, but A) it was inconsistent between different libc's and even release versions, B) upgrades broke "less" on more than one occasion, not just for toybox but within distros I've used. (As in I've seen Red Hat break this and Ubuntu break this over the years, because they trusted libc's buffering to do something predictable.) I'm going to have to audit all the commands to categorize "produces progressive output" and "does not produce progressive output" (which is sadly kind of HARD because... wget? I mean... it COULD...) and then subset those into "writes to a file descriptor" (which doesn't let it off the hook) and "writes through stdio", and for each figure out what the performance looks like vs other implementations and whether I can/should easily improve it. Mostly, I wait for somebody to complain. But since the android build has already had more than one performance bottleneck I've already had to tweak toybox for, "performance" is bubbling up higher on my todo list. The gestalt of existing complaints has accumulated some momentum. >> Both ascii and echo pretty much want a single big output buffer, they do not >> produce progressive output so it might as well all be collated into one big >> output buffer and then flushed. >> >> I just rewrote yes not to use stdio at all, it does writev() to fd 1. > > ...not least because you end up doing stuff like this. Because the one _with_ the line buffering was two orders of magnitude slower than debian's, and writev() lets me multiply the output-per-syscall without allocating an unbounded amount of extra memory for copies. Right now 30% slower doesn't matter and 50% slower is only if somebody _specifically_ complains, but 10x slower? Or more? Harder to ignore. > i worry that this might cause regressions too (based on the commit > message from the change that made yes line-buffered): Debian's is doing 3g/s output on my laptop which means it's using large transaction sizes. Let's strace it: write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192 write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192) = 8192 Outputting more than one y\n per transaction can't break stuff or the debian one would have triggered it. > 1. > > (though i will accept the argument that "that's a toybox xargs test > bug", but (a) we'll still need to fix that test Any idea off the top of your head which of the 31 tests it was? (I can use the git commit time to go dig back through the web archive and my blog and such, but not right now. I added a TODO note to my tests/xargs.test file to remind me.) > and (b) if we're going > to go that way, did you measure what just using the trivial yes > implementation and default stdio buffering gets you?) $ cat yas.c #include <stdio.h> int main(int argc, char *argv[]) { for (;;) printf("y\n"); } $ gcc yas.c $ timeout 1 ./a.out | ./count -l >/dev/null It wrote 99m. The iovec one is writing 2.7G on the same hardware. Oddly enough your transaction size can be TOO big and slow down a little from the peak. I haven't looked at the kernel pipe implementation to see when it zerocopies provided buffers and when it copies them to chop them up, mostly because that really really really really should be the kernel guys' problem, and is almost guaranteed to be different in 5 years, and probably varies by hardware anyway depending on how expensive page table fiddling is and whether "can I just block the producer by not returning from the write() until the consumer's accepted the zerocopy data out of the pipe buffer" is an acceptable strategy or what... I am thus NOT attempting to calculate an optimal iovec[] length based on the provided command line string data, I'm just saying I was aware of the option. (But that way lies madness and gnu.) Rob P.S. Count's X/s time output is actually X/period and the period is 250ms, but it's also zeroing new periods and advancing into the zeroed period immediately in the calculation as a full period when we've only been there for less than a full 250ms, and what it SHOULD do is look at the remainder of millitime()%250 and use that to add a fraction of the oldest period (not otherwise included, and not included as a period in the divisor) so "however much data the new one accumulated in less than a full period of time" plus "the reciprocal portion of the old period we left" should equal a coherent count. P.P.S. Yes, I remember when this was 'while (0<(len = read(0, buf, sizeof(buf)))) { write(1, buf, len); dprintf(2, "%llu\n", total += len); }' or possibly yes.c I remember... _______________________________________________ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net