Got permission to forward it to the list... -------- Forwarded Message -------- Subject: Re: tsort regrets Date: Thu, 2 Nov 2023 13:47:05 -0500 From: Rob Landley <r...@landley.net> To: Ray Gardner <rayg...@gmail.com>
On 11/1/23 10:57, Ray Gardner wrote: > Hi Rob, > > I hope you are feeling better today. Better is a relative term. So much coughing... > I apologize for sending the tsort implementation. As I said, I > couldn't help myself, I wanted to see how it would turn out. Tell me > you've never had that feeling about programming something. It had > nothing to do with you "didn't do it fast enough." I'm not annoyed at you, I'm annoyed at _me_. You did good work, I'm just not in a position to make proper design decisions about it. You pulled in hash table code from python, which (assuming there's no license entanglement which I THINK is fine here) in its original context has 200 lines of description of its hash algorithm choice and performance tuning, and I'm going "is this generic? Should it go in lib? Should it be used in the tar dev/ino->name mapping for hardlink detection?") This rewrite is in pursuit of performance I haven't got real world tests for. For tsort I was thinking maybe gluing together a bunch of different seq stride outputs? $ for i in {1..1000}; do X=$(seq 1 $i 10000); echo "$X"; (($(wc -l <<< "$X")&1)) && echo 10000; done | tsort And similar... Except just running the above to wc -l instead of tsort takes 3.6 seconds on my laptop (for reasons that are probably more of this stdio output buffer size nonsense. And none of that's toybox, it's all debian commands running there) and I don't like individual tests taking longer than a second because there's a LOT of them and they don't run in parallel because the phrase "nondeterministic test suite" is kill-it-with-fire territory... > I figured you have your plate plenty full to overflowing with your > todo list(s), the shell, the orange pi, etc., and you'd be happy to > have that implementation. But I should have tried to see things from > your point of view, that maybe you also wanted to try re-implementing > tsort. I do want the implementation. I started reading through yours yesterday (hence the hash table comment). I've largely left hash tables as a todo item because their corner case behavior is unpredictable with hash collisions and so on: I am not up for _tuning_ a hash algorithm or table size in the absence of real world data to fling through it. I was leaving all the performance questions until after I had at least the linux from scratch builds going through it, but the AOSP guys started giving feedback on performance in _their_ build, which is read-only from my perspective (can't personally reproduce this to examine further, but they say this is the problem so...) You did nothing wrong. I'm venting at _myself_. I said it was unfair when I wrote it, and thought about deleting it instead of posting but I'm trying to blog honestly. I still think even messing with yes is probably premature optimization, but the generic problem of output cache to batch writes is one of the classic big ones (as the maddog anecdote shows). It came back as an android pain point in grep, and got spread out from there but so far it's ONLY been seen in grep. (There was a pain point in sed but it was big s///g substitutions in giant strings, solved differently.) > Please set it aside, don't look at it, or delete it if you wish. Maybe > I'll put a standalone version on my site when I have time to write the > code to handle non-seekable input. It wants seekable input? I hadn't gotten that far... > My attempts to contribute to toybox have had a cool reception so far. > I hope I can do better next time. It's me, not you, for which I apologize. I've had a rough month health-wise (nothing serious, just darn lingering, I suspect I got covid again and for the past week definitely some kind of cold on top of it that won't go away) and it's been affecting my programming focus. I don't want to discourage you, and I definitely plan to use at least your tsort explanation if not the new implementation (haven't decided on the latter yet, wandered off into reading the python code it referenced yesterday, and then fixed something else)... > Ray Rob P.S. Do you mind if I post this reply to the list? _______________________________________________ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net