On 8/4/16, 1:54 PM, "bilbosax" <waspenc...@comcast.net> wrote:
>I'm going to read up about hashing, this sounds very very interesting. >But I >am curious, does it also work if you have to do more than just compare >strings? What if you have to read a guys shoe size and make sure it is >within 3 sizes of a predetermined value? Is this possible using hashing? > It is possible either via the hash function or via the bucket search. It depends a bit on the range of possible outcomes. For example, if I wanted to find folks who provided comments by city block, the hash function would sort all house numbers into buckets (assuming city blocks range 1-100, 101-200, etc. But that's because I know the boundaries (hundreds by house number). If you don't know the bucket boundaries are trying to find groups on the fly (so that If you start with 100 you find both 99, 101) then you might design the hash key to be sortable by those ranges. Then instead of walking the buckets with a simple "for each" you would first get all the keys via "for each", sort them, then scan the list of keys and look for groups. Even with doing all of that, 38,000 loops should be way better than 1.4 billion (and of course, you might still be able to break some of this up into threads). If you read [1] the related sections are "Finding Duplicate Records" and "Finding Similar Records". -Alex [1] https://en.wikipedia.org/wiki/Hash_function