Following your example, how many "shapes" are there in the problem?
Couldn't you just store them separately first:

points["marbles"]=array(...)
points["squares"]=array(...)

so you don't have to loop through all 38k records every time you have a
marble just to find the other marbles? Also, sorting your DB query by shape
should help to improve the speed: once the "shape[i]==shape[j]" condition
is met, the next X records will also meet it, and after those X records,
none will.
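
In ActionScript that early exit over the sorted records could look roughly
like this (records and target are assumed names here, not your actual
variables):

// records is sorted by shape, so all matches for a given shape
// form one contiguous run
var started:Boolean = false;
for (var j:int = 0; j < records.length; j++) {
    if (records[j].shape == target.shape) {
        started = true;
        // ... distance check against records[j] goes here ...
    } else if (started) {
        break; // the run is over; no later record can match
    }
}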

You can do a first loop over the 38k records to prepare those "shape"
arrays, and then proceed with the big loop, but for each itemA the inner
loop iterates only over the items in points[itemA.shape], not over the
whole 38k records.
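
A minimal sketch of those two passes (assuming each record has a "shape"
property and that the 38k records sit in an array called records):

var points:Object = {};

// first pass: bucket every record by its shape
for each (var rec:Object in records) {
    if (points[rec.shape] == undefined)
        points[rec.shape] = [];
    points[rec.shape].push(rec);
}

// big loop: each itemA scans only its own (much smaller) bucket
for each (var itemA:Object in records) {
    for each (var itemB:Object in points[itemA.shape]) {
        if (itemB == itemA) continue;
        // ... distance calculation for the pair goes here ...
    }
}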

If all that sounds good and works, now think about whether adding more
dimensions to the first array (points[shape][color], for instance) would
also help. Add as many dimensions as you wish, and there you have your hash
function :) (a dictionary is essentially a hash table).
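
One way to sketch that extra dimension without nesting is a composite
string key (the "color" property is my assumption):

// one string key per (shape, color) pair collapses
// points[shape][color] into a single flat hash lookup
var key:String = rec.shape + "|" + rec.color;
if (points[key] == undefined)
    points[key] = [];
points[key].push(rec);

// later, all candidates for itemA come from one lookup
var candidates:Array = points[itemA.shape + "|" + itemA.color];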

On Sat, Aug 6, 2016 at 12:38 AM, bilbosax <waspenc...@comcast.net> wrote:

> I wish sometimes that we could actually talk because typing can become
> cumbersome when trying to convey ideas.  But basically imagine that I have
> a yellow marble lying on a map, and I want to know how many blue marbles
> lie within a mile of that marble.  I go through all the conditionals to
> make sure that it is a round blue marble, and if it is, I calculate the
> distance between the yellow and blue marble, and if it is less than a mile,
> I record the distance to a sparse array.  Because of you guys' suggestions,
> I now also add it to the sparse array in 2 places, because if I know the
> distance between the yellow and blue marble, I also know the distance from
> the blue to the yellow marble.  So now I am doing half the number of
> distance calculations, but have added the overhead of placing and getting
> information from a new array to be evaluated later.  None of the averages,
> medians, and other calculations that I need to do can be done yet, because
> we have made that sacrifice to increase speed.
>
> Now I go to the next record, but this time it is a red square and it is
> looking for all of the green squares within a mile of it.  So I HAVE to go
> through all of the records again with respect to what the new record
> specifies.
>
> Now, once all distances have been calculated, I can go back through the
> sparse array and, if a distance has been recorded, calculate the averages
> and sums for that particular record.
>
> So, yes, we could probably get it closer to a minute if distance
> calculations were all we had to do, but a lot of numbers have to be
> calculated, and they have to be calculated with respect to the target
> record.  Just because we are cutting down the number of distance
> calculations does not change the fact that other numbers have to be
> calculated in addition to the distance for every record.
>
>
>
> --
> View this message in context: http://apache-flex-users.2333346.n4.nabble.com/Workers-and-Speed-tp13098p13230.html
> Sent from the Apache Flex Users mailing list archive at Nabble.com.
>
