This is autogenerated code, https://gitlab.com/tampe/guile-persist/-/blob/master/src/hash/macro.c
Not sure how well uint128_t works on my system, but the compiler should be able to make this code run really fast. Anyhow it is quick matters to change to uint64_t due to things not being hand coded. Can you see what I am doing? On Thu, Feb 24, 2022 at 1:07 AM Stefan Israelsson Tampe < stefan.ita...@gmail.com> wrote: > High all, > > I have taken the task to study a possible hash table implementation mostly > in scheme but still speedy. I am still experimenting with different > approaches. With managing code and iteration higher order functions in > scheme and lower order implemented in C. I can match current hash > implementations but with hash-fold and hash-for-each 10-20X faster and > better semantics as it is not going via C. > > I suppose the implementation is more memory intensive than the current > implementation. But much much safer and featurefull. > > So what are the features: > * goops based organisation > > * allows one to type a hash table, e.g. disallow the use of > different versions > of the assessors for the same hash. > > * surprisingly fast although we define goops methods for the accessors > > * A flag that indicates that (hash i) = i, for integers, which is used in > python > > * Possible to freeze has hash e.g. it will not grow or shrink, when > cleared only > all the values will be cleared. > > * A possibility to use (hash x (ash 1 60)) as a key, e.g. when checking if > a slot > is equal, we compare only the long hash. > > * A proper separation between general code that can be pushed to C for > speed and scheme for nice integration and power > > Code: > > scheme: > https://gitlab.com/tampe/guile-persist/-/blob/master/ice-9/hashish.scm > > C: > https://gitlab.com/tampe/guile-persist/-/blob/master/src/hash/resize.c > > C/Scheme integration > Each handle is of the form > (hash key val) > > So when we store the key value pair we also store the hash values which are > 60bit wide. This means that we can dispatch the search for an item using > only the hash value and many operations can be done in a general way that > can be pushed to C (if we want). So typically you first try to use C and > when it finds a match you need to do the final check in scheme using the > correct equal predicate. This means that we can achieve a very extensible > system that you can extend without losing much performance. These > primitives make the scheme code beautiful and well organized so we should > probably keep this semantic even if we decide to do everything in guile > scheme. > > Example, consider that we have designed a new hash table semantics. The > easiest is to supply a hash function and a equal predicate like so > > (use-modules (ice-9 hashish)) > (use-modules (oop goops)) > > (define (myhash k s) ...) > (define (myequal? x y) ...) > > (define-class <my-hash-map> (<stis-hash-map>)) > (stis-new-hash-type <my-hash-map> make-my-hash-table myhash myequal? > my-hash-ref my-hash-set! my-hash-remove!) > > Preferably when we do not need to know the type of the hashtable we can > either use the methods (slow) stis-hash-ref stis-hash-set! > stis-hash-remove! > > The fast way is to take advantage of the following construct, > (define H (make-my-hash-table)) > (stis-with-hash-accessors (H : ref set remove) > ... > (set k1 (ref k2 #f)) > ... > (remove k3) > ...) > > Both these methods are safe. > > There are some extra tools available. > > > stis-hash-resize! > One can resize a hashtable as one likes > > > stis-hash-define-world! > One can fill the hash table and then specify that this is the world. Then > If all long hashes that are stored are unique, no equal method will be used > for identity and the hashmap will be freezed > > > stis-hash-unworld-it > Unset the world property (not yet coded) > > > stis-hash-copy > Create a new hashtable (this is not yet coded) > > IMPLEMENTATION > The implementation is not yet defined, we need to test it and try > different versions. But One thing that one can notice are that by reducing > the > size if the hashtable one can speed up all operations. So the idea is to > store > a list of hash/index pairs in the hash and use varying sized cells for > this. The > smallest cell is 2 bytes and we can use more bytes for the index (size of > hash table) and fewer for the hash itself) this means that we could use a > size of 64 for the hash value and allow 1024 items in the hash table. > Similarly as the hash grows we will use 32bit cells and split the maximal > size of the hashtable between the hash value and the hash table size. The > next size is 64bit cells and the last one are 128 bit cells. There will be > a fallback hash table which is the usual a-list bin table But that one will > be smaller memory wise and not in the hot path. > > I am in the process of doing the implementation and will see if I can find > the optimal setting for all sizes which I can test. (basically to pin down > the strategy for each of the sizes 1,2,4,6,16,32,64,128,256, ... As the > size of the hash can only be in those sizes. > > Oh also, for small hashes the hash is essentially a single association > list and superfast. > > Maybe the dispatching overhead is too much, donough, perhaps trimming down > the features might be needed. > > I will see if we can find some use of special assembler operations, > especially for small hashtables. > > PRIMITIVES, > There are essentially three primitives > 1. search = find a matching hash, return index or handle or #f > possible starting from a known position > > 2. add = add a new element that we now is not included > in the hashtable > > 3. search-list = fast alist search in case of a small assoc list case > returns the key-val handle or #f possibly > restarts > from a known index. > > Possibly one may define a search-and-set-some primitive, e.g. one can > in hot cases not need to fhe the add afterwards and add is only used in the > slow path in a set operation. (For other operations it is in the fast path > though) > > Happy Hacking! > > Regards > Stefan > > > > >