On 8/4/16, 1:54 PM, "bilbosax" <waspenc...@comcast.net> wrote:

>I'm going to read up about hashing, this sounds very very interesting.
>But I
>am curious, does it also work if you have to do more than just compare
>strings?  What if you have to read a guys shoe size and make sure it is
>within 3 sizes of a predetermined value? Is this possible using hashing?
>

It is possible either via the hash function or via the bucket search.  It
depends a bit on the range of possible outcomes. For example, if I wanted
to find folks who provided comments by city block, the hash function would
sort all house numbers into buckets (assuming city blocks range 1-100,
101-200, etc.  But that's because I know the boundaries (hundreds by house
number).  If you don't know the bucket boundaries are trying to find
groups on the fly (so that If you start with 100 you find both 99, 101)
then you might design the hash key to be sortable by those ranges.  Then
instead of walking the buckets with a simple "for each" you would first
get all the keys via "for each", sort them, then scan the list of keys and
look for groups.

Even with doing all of that, 38,000 loops should be way better than 1.4
billion (and of course, you might still be able to break some of this up
into threads).

If you read [1] the related sections are "Finding Duplicate Records" and
"Finding Similar Records".

-Alex

[1] https://en.wikipedia.org/wiki/Hash_function

Reply via email to