Hi all,

I'm sure you've all read Ben Hoyt's blog post from earlier this week,
https://benhoyt.com/writings/count-words/

Of course (duh) I tried to implement a version in CHICKEN.  So did
Vasilij and Mario, and we all came to the conclusion that our performance
on that simple task is abysmal, we can't even outperform Python.

So I had a look and came up with one somewhat simple improvement to
srfi-69; every time you insert something into a hash table, it checks
if it needs to be resized.  This check involves a computation which is
needlessly heavy; it multiplies the min and max load factor with the
current length, calls "floor" and inexact->exact on the result, just to
compare it with the new size.

The attached patch moves these calculations to when the hash table is
created or resized, so that it can just read it out when checking for
resize.  It is a bit ugly because now we have 13 slots in the hash table
object, and keeping track of all the indexes becomes a bit cumbersome.
This could be changed in a refactor if necessary.

Then, I noticed that the resize check doesn't actually even resize it
when needed!  The argument passed to hash-table-resize! is "len", while
I think it should be "newsiz".  I've also fixed that in the patch.

This improves performance on my machine by getting the benchmark down
from 12 seconds to 4 seconds using Vasilij's code at http://ix.io/2Ths.
This puts us at least in the same ball park as Python, using the "simple"
version at least (which is comparable in code to Vasilij's).

Cheers,
Peter
Index: srfi-69.scm
===================================================================
--- srfi-69.scm	(revision 39571)
+++ srfi-69.scm	(working copy)
@@ -496,8 +496,10 @@
   (let ([make-vector make-vector])
     (lambda (test hash len min-load max-load weak-keys weak-values initial
                   #!optional (vec (make-vector len '())))
-      (let ((ht (##sys#make-structure 'hash-table
-                 vec 0 test hash min-load max-load #f #f initial #f)))
+      (let* ((min-load-len (inexact->exact (floor (* len min-load)))) ;; Cached values to speed up hash-table-check-resize!
+             (max-load-len (inexact->exact (floor (* len max-load))))
+             (ht (##sys#make-structure 'hash-table
+                                       vec 0 test hash min-load max-load #f #f initial #f min-load-len max-load-len)))
         (##sys#setslot ht 10 (*make-hash-function hash))
         ht) ) ) )
 
@@ -695,24 +697,28 @@
 ;; hash-table-resize!:
 
 (define (hash-table-resize! ht vec len)
-  (let* ([deslen (fxmin hash-table-max-length (fx* len hash-table-new-length-factor))]
-         [newlen (hash-table-canonical-length hash-table-prime-lengths deslen)]
-         [vec2 (make-vector newlen '())] )
+  (let* ((deslen (fxmin hash-table-max-length (fx* len hash-table-new-length-factor)))
+         (newlen (hash-table-canonical-length hash-table-prime-lengths deslen))
+         (min-load (##sys#slot ht 5))
+         (new-min-load-len (inexact->exact (floor (* len min-load))))
+         (max-load (##sys#slot ht 6))
+         (new-max-load-len (inexact->exact (floor (* len max-load))))
+         (vec2 (make-vector newlen '())) )
     (hash-table-rehash! vec vec2 (##sys#slot ht 10))
-    (##sys#setslot ht 1 vec2) ) )
+    (##sys#setslot ht 1 vec2)
+    (##sys#setslot ht 11 new-min-load-len)
+    (##sys#setslot ht 12 new-max-load-len)) )
 
 ;; hash-table-check-resize!:
 
 (define-inline (hash-table-check-resize! ht newsiz)
-  (let ([vec (##sys#slot ht 1)]
-        [min-load (##sys#slot ht 5)]
-        [max-load (##sys#slot ht 6)] )
-    (let ([len (##sys#size vec)] )
-      (let ([min-load-len (inexact->exact (floor (* len min-load)))]
-            [max-load-len (inexact->exact (floor (* len max-load)))] )
-        (if (and (fx< len hash-table-max-length)
-                 (fx<= min-load-len newsiz) (fx<= newsiz max-load-len))
-          (hash-table-resize! ht vec len) ) ) ) ) )
+  (let* ((vec (##sys#slot ht 1))
+         (min-load-len (##sys#slot ht 11))
+         (max-load-len (##sys#slot ht 12))
+         (len (##sys#size vec)))
+    (if (and (fx< len hash-table-max-length)
+             (fx<= min-load-len newsiz) (fx<= newsiz max-load-len))
+        (hash-table-resize! ht vec len) )   ) )
 
 ;; hash-table-copy:
 

Attachment: signature.asc
Description: PGP signature

Reply via email to