Re: Proposal for a new hash interface

Stefan Israelsson Tampe Sat, 26 Feb 2022 01:34:29 -0800

This is autogenerated code,

https://gitlab.com/tampe/guile-persist/-/blob/master/src/hash/macro.c


Not sure how well uint128_t works on my system, but the compiler should be
able to make this code run really fast. Anyhow it is quick matters to
change to uint64_t due to things not being hand coded. Can you see what I
am doing?

On Thu, Feb 24, 2022 at 1:07 AM Stefan Israelsson Tampe <
stefan.ita...@gmail.com> wrote:

> High all,
>
> I have taken the task to study a possible hash table implementation mostly
> in scheme but still speedy. I am still experimenting with different
> approaches. With managing code and iteration higher order functions in
> scheme and lower order implemented in C. I can match current hash
> implementations but with hash-fold and hash-for-each 10-20X faster and
> better semantics as it is not going via C.
>
> I suppose the implementation is more memory intensive than the current
> implementation. But much much safer and featurefull.
>
> So what are the features:
> * goops based organisation
>
> * allows one to type a hash table, e.g. disallow the use of
> different versions
>   of the assessors for the same hash.
>
> * surprisingly fast although we define goops methods for the accessors
>
> * A flag that indicates that (hash i) = i, for integers, which is used in
> python
>
> * Possible to freeze has hash e.g. it will not grow or shrink, when
> cleared only
>   all the values will be cleared.
>
> * A possibility to use (hash x (ash 1 60)) as a key, e.g. when checking if
> a slot
>   is equal, we compare only the long hash.
>
> * A proper separation between general code that can be pushed to C for
> speed    and scheme for nice integration and power
>
> Code:
>
> scheme:
> https://gitlab.com/tampe/guile-persist/-/blob/master/ice-9/hashish.scm
>
> C:
> https://gitlab.com/tampe/guile-persist/-/blob/master/src/hash/resize.c
>
> C/Scheme integration
> Each handle is of the form
> (hash key val)
>
> So when we store the key value pair we also store the hash values which are
> 60bit wide. This means that we can dispatch the search for an item using
> only the hash value and many operations can be done in a general way that
> can be pushed to C (if we want). So typically you first try to use C and
> when it finds a match you need to do the final check in scheme using the
> correct equal predicate. This means that we can achieve a very extensible
> system that you can extend without losing much performance. These
> primitives make the scheme code beautiful and well organized so we should
> probably keep this semantic even if we decide to do everything in guile
> scheme.
>
> Example, consider that we have designed a new hash table semantics. The
> easiest is to supply a hash function and a equal predicate like so
>
> (use-modules (ice-9 hashish))
> (use-modules (oop goops))
>
> (define (myhash    k s) ...)
> (define (myequal? x y) ...)
>
> (define-class <my-hash-map> (<stis-hash-map>))
> (stis-new-hash-type <my-hash-map> make-my-hash-table myhash myequal?
>      my-hash-ref  my-hash-set!  my-hash-remove!)
>
> Preferably when we do not need to know the type of the hashtable we can
> either use the methods (slow) stis-hash-ref stis-hash-set!
> stis-hash-remove!
>
> The fast way is to take advantage of the following construct,
> (define H (make-my-hash-table))
> (stis-with-hash-accessors (H : ref set remove)
>     ...
>     (set k1 (ref k2 #f))
>     ...
>     (remove k3)
>     ...)
>
> Both these methods are safe.
>
> There are some extra tools available.
>
> > stis-hash-resize!
> One can resize a hashtable as one likes
>
> > stis-hash-define-world!
> One can fill the hash table and then specify that this is the world. Then
> If all long hashes that are stored are unique, no equal method will be used
> for identity and the hashmap will be freezed
>
> > stis-hash-unworld-it
> Unset the world property (not yet coded)
>
> > stis-hash-copy
> Create a new hashtable (this is not yet coded)
>
> IMPLEMENTATION
> The implementation is not yet defined, we need to test it and try
> different versions. But One thing that one can notice are that by reducing
> the
> size if the hashtable one can speed up all operations. So the idea is to
> store
> a list of hash/index pairs in the hash and use varying sized cells for
> this. The
> smallest cell is 2 bytes and we can use more bytes for the index (size of
> hash table) and fewer for the hash itself) this means that we could use a
> size of 64 for the hash value and allow 1024 items in the hash table.
> Similarly as the hash  grows we will use 32bit cells and split the maximal
> size of the hashtable between the hash value and the hash table size. The
> next size is 64bit cells and the last one are 128 bit cells. There will be
> a fallback hash table which is the usual a-list bin table But that one will
> be smaller memory wise and not in the hot path.
>
> I am in the process of doing the implementation and will see if I can find
> the optimal setting for all sizes which I can test. (basically to pin down
> the strategy for each of the sizes 1,2,4,6,16,32,64,128,256, ... As the
> size of the hash can only be in those sizes.
>
> Oh also, for small hashes the hash is essentially a single association
> list and superfast.
>
> Maybe the dispatching overhead is too much, donough, perhaps trimming down
> the features might be needed.
>
> I will see if we can find some use of special assembler operations,
> especially for small hashtables.
>
> PRIMITIVES,
> There are essentially three primitives
> 1. search        =  find a matching hash, return index or handle or #f
>                            possible starting from a known position
>
> 2. add             =  add a new element that we now is not included
>                            in the hashtable
>
> 3. search-list   = fast alist search in case of a small assoc list case
>                           returns the key-val handle or #f possibly
> restarts
>                           from a known index.
>
> Possibly one may define a search-and-set-some primitive, e.g. one can
> in hot cases not need to fhe the add afterwards and add is only used in the
> slow path in a set operation. (For other operations it is in the fast path
> though)
>
> Happy Hacking!
>
> Regards
> Stefan
>
>
>
>
>

Re: Proposal for a new hash interface

Reply via email to