On 8/5/16, 6:55 PM, "bilbosax" <waspenc...@comcast.net> wrote:

>Alex Harui wrote
>> IMO, the new loop is constructed in a way that it will only test a vs b
>> and never b vs a, so there is no need to store things for the b vs a
>>test.
>
>Yes, but the point that I am trying to make is that I can only calculate
>the test A sums and averages against all the other records at this point.
>All I know about the Test B at this point is that it is a certain distance
>from Test A, but what about the distance between Test B and all of the
>other records and all the sums and averages that I want to keep for Test B?
>I don't have them at the time.  So we are cutting the number of distance
>calculations in half, but have to go through them again so that the
>sums/averages for each and every record can be ascertained against all the
>other records.

I think a key question is what these "sums and averages" are used for.  If
you must compute A against all other items in the database and then B
against all other items, then you simply have to do the work, although you
could store the results in A and B and look them up by computing which
record is holding the cached results.  Then you wouldn't need a sparse
array: the item with the lowest "index" holds the comparison based on the
new looping logic.
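
As a rough sketch of that caching idea (not your actual code): the
records below are assumed to be plain (x, y) points and the comparison is
assumed to be a straight-line distance, and names like build_cache and
lookup are made up for illustration.  Each record keeps a small table of
results for the records that come after it, so every pair is computed once
and the record with the lower index is the one that holds the answer.

```python
from math import hypot

def build_cache(points):
    """Compute each pair once (i < j); record i holds the result for (i, j)."""
    cache = [dict() for _ in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            cache[i][j] = hypot(x2 - x1, y2 - y1)
    return cache

def lookup(cache, a, b):
    """Find the cached a-vs-b result: the lower index holds it."""
    low, high = (a, b) if a < b else (b, a)
    return cache[low][high]

def average_distance(cache, a):
    """Average distance from record a to every other record, with no recomputation."""
    n = len(cache)
    total = sum(lookup(cache, a, b) for b in range(n) if b != a)
    return total / (n - 1)

# Hypothetical sample data, chosen so the distances are easy to check by hand.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
cache = build_cache(points)
averages = [average_distance(cache, a) for a in range(len(points))]
```

The distance work is still done only once per pair; the second pass for B
is just lookups and additions, which is far cheaper than recomputing.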

But it sounds like you need to do the "sums and averages" only when
two items meet some criteria, not in the computation of which items meet
the criteria.  If that's true, then you first want to find the few pairs
of items that need computation and then crank the data.  If you need some of
this math to determine which two items to compare, that would go into the
hash function.
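
Here is one way the "find the few pairs first" step might look.  It is
only a sketch under assumptions I am making up for the example: the
records are (x, y) points, the criteria is "closer than some radius", and
the hash function is a simple grid cell computed from the coordinates.
Your real criteria and hash function would likely be different.  Items
that hash to the same or a neighboring cell become candidates, and only
those candidates get the exact test.

```python
from collections import defaultdict
from math import floor, hypot

def cell_of(point, cell_size):
    """Hash function (assumed): map a point to the grid cell that contains it."""
    x, y = point
    return (floor(x / cell_size), floor(y / cell_size))

def candidate_pairs(points, cell_size):
    """Collect index pairs (i < j) that share a cell or sit in adjacent cells."""
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[cell_of(point, cell_size)].append(idx)
    pairs = set()
    for (cx, cy), members in buckets.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids creating empty buckets while we iterate
                neighbors = buckets.get((cx + dx, cy + dy), ())
                for i in members:
                    for j in neighbors:
                        if i < j:
                            pairs.add((i, j))
    return pairs

def close_pairs(points, radius):
    """Run the exact distance test only on the candidates from the grid."""
    result = []
    for i, j in sorted(candidate_pairs(points, radius)):
        (x1, y1), (x2, y2) = points[i], points[j]
        if hypot(x2 - x1, y2 - y1) <= radius:
            result.append((i, j))
    return result

# Hypothetical sample data: two tight clusters far apart from each other.
points = [(0.0, 0.0), (0.5, 0.5), (10.0, 10.0), (10.2, 10.1)]
pairs = close_pairs(points, 1.0)
```

With the cell size equal to the radius, two points within the radius can
never be more than one cell apart, so no qualifying pair is missed.  The
"sums and averages" would then be computed only for the pairs returned.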

>
>I definitely like the hash idea and want to learn more about it.  Do you
>have a book or any links that you recommend to learn a lot about hash
>functions?
>

I don't have any good resources.  This is stuff from my undergrad days
over 30 years ago.  It's been fun trying to recall it.  As you can see
from Wikipedia, mathematical functions can be used to process data into
groups for many useful purposes.  Having the right data and the right
functions is the key.  IMO, there are relatively few "new" problems these
days.  Most things have an analogy that has been solved before.  If you
can figure out a good analogous problem we can discuss it here without
messing up your company's IP.

-Alex
