On 4/25/19 5:23 PM, enh wrote:
> On Thu, Apr 25, 2019 at 3:03 PM Rob Landley <r...@landley.net> wrote:
>>
>> On 4/24/19 5:50 PM, enh wrote:
>>> On Wed, Apr 24, 2019 at 3:40 PM Rob Landley <r...@landley.net> wrote:
>>>>
>>>> On 4/23/19 8:06 PM, enh wrote:
>>>>> (sorry. busy.)
>>>>
>>>> Sigh. I'll add a config symbol you can switch on, with help text
>>>> complaining about it.
>>>
>>> well, it's not necessarily grep i'm worried about. i think if all the
>>> other toys are now going to be line-buffered, it doesn't make much
>>> sense to single out grep like this.
>>
>> Neither do I.
P.S. Keep in mind I'm still trying to make "android workstations" a thing.
There really should be end users for this stuff.

>> That said, after writing up a point by point rebuttal to your message, I
>> benchmarked the code (silly to argue about something we haven't tested the
>> speed of) and found out that devuan's grep takes 8 seconds to do something
>> mine takes 20 to do _before_ the output buffering changed. Which is
>> unacceptable, and made the argument we were having kinda moot.
>
> (one thing i was going to say is that line buffering has a few weird
> side-effects. C11's 7.21.3.3 implies that you have to flush
> line-buffered streams any time you read from an unbuffered or
> line-buffered stream, for example.

...why? I believe you that it says it. What possible purpose could it serve?
(Sending data to yourself through a pipe or fifo wrapped with a FILE *? That's
pilot error... If I specify -std=c99 to the compiler will it _not_ do that? No,
because this is libc breakage...)

> which would affect your original code.)

That said, the flush gets us back to unbuffered behavior, so it's not going to
be _worse_ than defaulting to unbuffered.

To be honest, the do_lines() rewrite would redo the input based on filehandles,
and redoing the _output_ based on filehandles is starting to sound less bad
with each new FILE * stupidity. (I initially avoided it because it seemed
unnecessary, then started again because getline() went into posix-2008.)

I really don't _want_ to micromanage this nonsense, neither with manual flushes
sprinkled into who knows how many apps, nor with selecting a per-app buffering
mode without a sane default. I want it to just work and stay out of my way, and
if it can't sanely buffer then the Linux pipe infrastructure _can_. This "tty
and non-tty work different" behavior is too clever, and breaks less(1) in a
really visible way.
>> So now I'm staring at do_lines() and seeing if I can get it to do large (at
>> least 64k) block reads where it doesn't memcpy() the data out into
>> individual lines but passes a pointer into the buffer where possible. (And
>> this lets me replace get_line(fd), which has been on the todo list forever,
>> although moving them all to do_lines() and a callback is probably the best
>> way...)
>
> /me shudders having just fixed a bug in NetBSD's code to do this. but,
> yeah, it's probably worth the cost. and we have ASan to keep us
> honest.

Wasn't my first choice either, but a potential >2x performance improvement's
kind of a thing. And the proper thing to do is probably mmap() the whole file
when I can, read at least megabyte chunks when I can't (using the poll() and
read-as-much-as-won't-block trick, anyway), and maybe fall back to the
existing getline() loop on nommu.

I saw a mailing list post ages ago about other grep implementations using a
streaming regex search that ignores newlines until it finds a match and then
analyzes the match for line breaks, but I really don't want to change the
regex plumbing. (And with multiple patterns, iterating over 20 megabytes of
file and thrashing the cache like that is likely to be way slower than
processing a cache-local working set...)

Sigh. I'd hoped to get through the roadmap to 1.0 before worrying too much
about performance tuning, but the project appears to have developed a
userbase. :)

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net