hi andy,

> For what it's worth, the fuzzyocr hashing is of very limited value, and in
> many cases is a severe performance hit. I found that scanning the hashes,
> due to the "fuzzy" nature, is more costly than just rescanning the file
> with OCR, as *each* *and* *every* hash must be checked iteratively.
now, *that's* an interesting point to consider. i'd be interested to know what, then, the 'goal' of the hashing/comparison *is*. is it performance, and it just missed the mark for the reasons you state? or is it something else? dunno. but your point bears some benchmarking ... thx!
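fwiw, here's a rough python sketch of the scaling argument. this is *not* how fuzzyocr actually stores or compares its hashes. it assumes, purely for illustration, that each image is reduced to a 64-bit integer hash and that "similar" means a hamming distance of at most `max_dist` bits; all function names are hypothetical. the point it shows: an exact match is a single hash-table probe, while a fuzzy match has no key to jump to, so a naive scan compares the query against every stored hash.

```python
# hypothetical sketch: NOT fuzzyocr's real hash format or matching code.
# assumes 64-bit integer image hashes and a hamming-distance threshold.

def hamming(a: int, b: int) -> int:
    """number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def exact_lookup(known: set, h: int) -> bool:
    # exact match: one set (hash-table) probe, roughly constant time
    return h in known

def fuzzy_lookup(known: list, h: int, max_dist: int = 5):
    # fuzzy match: naive linear scan, so cost grows with the database size.
    # returns (matching hash or None, number of comparisons performed)
    comparisons = 0
    for stored in known:
        comparisons += 1
        if hamming(stored, h) <= max_dist:
            return stored, comparisons
    return None, comparisons

if __name__ == "__main__":
    db = [0x0, 0xFF, 0xFFFF]
    print(exact_lookup(set(db), 0xFF))    # True, one probe
    print(fuzzy_lookup(db, 0b1))          # (0, 1): close to the first entry
    print(fuzzy_lookup(db, 0xFFFFFFFF))   # (None, 3): every entry checked
```

a miss (the common case for new spam images) is the worst case: every stored hash gets compared, which is presumably where the cost you measured comes from.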