I've got my specialized match-finder working (it minimizes cache misses
while processing blocks of size 2**17). Now I find that it is slow,
mainly because it finds *all* matches, even those that are "obviously"
not good candidates for encoding.

It seems to me that there are two missing parameters to the match finder:

1) the current four most-recent offsets, which have very low encoding
   cost (many fewer bits than other nearby offsets)

2) for each of the four match lengths 2, 3, 4, and 5: the maximum offset
   that still yields a savings when encoded, ignoring the possibility of
   special savings from using one of the four most-recent offsets.
   For instance, gzip won't even consider any offset greater than 4096
   ("TOO_FAR") for its minimum match length of 3.

If the match finder knows that info, then it can avoid or suppress
"obviously" bad candidates instead of reporting them.
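As a rough sketch of what I mean, the two parameters could be passed to
the match finder as a small struct, and each candidate checked against
it before being reported. The names (`match_filter`,
`candidate_is_useful`, `NUM_REPS`) and the threshold values are all
hypothetical, just to illustrate the filtering rule:

```c
/* Hypothetical filtering parameters for a match finder; names and
 * constants are illustrative, not from any existing implementation. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_REPS        4   /* number of most-recent offsets tracked */
#define MIN_LEN         2   /* shortest match the encoder can use */
#define MAX_CAPPED_LEN  5   /* lengths above this are always worth it */

typedef struct {
    /* The four most-recent offsets: cheap to re-encode at any length. */
    uint32_t rep_offset[NUM_REPS];
    /* Per-length maximum useful offset, indexed for lengths 2..5
     * (the gzip analogue: TOO_FAR = 4096 for its minimum length 3). */
    uint32_t max_offset[MAX_CAPPED_LEN - MIN_LEN + 1];
} match_filter;

/* Return true if a (length, offset) candidate is worth reporting. */
static bool candidate_is_useful(const match_filter *f,
                                uint32_t length, uint32_t offset)
{
    /* A repeated offset is always cheap, regardless of length. */
    for (int i = 0; i < NUM_REPS; i++)
        if (offset == f->rep_offset[i])
            return true;

    /* Long matches save bits at any reachable offset. */
    if (length > MAX_CAPPED_LEN)
        return true;

    /* Short matches only pay off when the offset is small enough. */
    return offset <= f->max_offset[length - MIN_LEN];
}
```

With something like this, the inner search loop can reject a candidate
with a couple of comparisons before spending any time extending or
recording it.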

Comments?
