On Tue, Nov 7, 2023 at 11:19 AM Rob Landley <r...@landley.net> wrote: > > On 11/6/23 09:39, enh wrote: > >> P.S. Yes I checked if posix/libc just had one I could use, like with > >> qsort(), > >> but man 3 hash says "This page documents interfaces provided in glibc up > >> until > >> version 2.1. Since version 2.2, glibc no longer provides these interfaces. > > > > musl has the POSIX <search.h>, so i assume glibc still does too? > > That API has no walk() function to return all entries in the table, > hsearch(ENTER) returns null if the table is "full" (no resize so I'd have to > do > my own gc without access to a walk()
that's not true for the FreeBSD implementation that we reuse in bionic. nor for musl. > and only the non-posix _r extensions let > you have old and new tables in play at once so you pretty much have to keep a > linked linked list _and_ the table), can't pass a cleanup function to > destroy_r() to free the entries, for the tar.c use case I'd have to xmprintf() > the dev/ino pairs to produce the string "key" needed to store them in the > table > when some xor and shifts seem like the obvious hash function... > > It's a nice option to have, but didn't fit the use cases I'd compared it with. > But I should probably read Rich's implementation of it because he has a > history > of doing "the sane thing"... his string hash is while (*p) h = 31*h + *p++; > which looks fine. His stride is for (i = hash, j=1;; i += j++) which checks > hash&size, (hash+1)&size, (hash+1+2)&size, (hash+1+2+3)&size thus retaining > reasonable L1 cache locality without chaining into someone else's overflow > (very > nice). Yeah, this looks like it makes sense. Still no delete or walk. (Easy > way > to delete uses whiteouts, I tend to (void *)1 which _really_ seems to annoy > the > C++ compiler developers...) > > Huh, Rich _does_ resize() under the covers in __hsearch_r(). Pity I can't rely > on that happening in an arbitrary libc implementation... i suspect you probably can, unless glibc is broken? > *shrug* I _can_ do this but was trying to hold the optimization off until we'd > accumulated test loads inconvenienced by poor performance, so we can confirm > the > changes actually helped real world use. (Optimization in search of a load is a > variant of infrastructure in search of a user...) yeah, a "good enough" hash table is very little code anyway, and until/unless you're worried about (a) data center-sized workloads or (b) "this is a fundamental primitive in our language exported to arbitrary user code" (the python case), i don't think you need to worry about any of the things you've mentioned in this thread. (other than the possibility that glibc might actually require you to specify a _max_ size because it doesn't resize; but i assume musl would have kept the same behavior in that case?) > Lemme finish redoing crypt() first... > > Rob _______________________________________________ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net