On 20/01/2023 18:26, Geoff Canyon via use-livecode wrote:
I'm sure someone has done the work to create a more efficient algorithm
for this. Off the top of my head, if I were trying to, I'd probably do
something like:

Hmmm. Maybe. But I kind of doubt it (though I'd love to find out I'm wrong).

This (or closely related problems) got a lot of attention in the mid-70s to mid-80s, when we were trying to do Design Rule Verification for VLSI circuits with millions of transistors. Where the rules could exploit hierarchy, the "clever tree" data structures and algorithms did well (see Quad-CIF etc.). But for non-hierarchical problems (such as this 'closest points' case), nothing came close to this 'scanning window' or 'linear scan' approach.

But looking more closely, I realized that the number of times a new "closest pair" is found is remarkably small - typically between 6 and 10 times for 50,000 points. So it's very feasible to bear the cost of calculating the actual distance (i.e. doing the sqrt call) each time a new 'closest pair' is found, and that means the quick filtering test can be done on the x-distance rather than on the squared distance. Also, you can do a similar filter test on the Y coord (though that doesn't let you exit the inner loop, it only lets you skip the full comparison).

Adding those changes gets the time down by roughly another 10%: the original 2000-point case comes down from approx 35ms to around 28ms (almost too small to measure reliably), and, more meaningfully, 50,000 points comes down from 880ms to 810ms.

Revised code:
function closestPointsSQ pLines
   -- sort by x so the inner loop can stop as soon as the x-gap alone
   -- exceeds the best distance found so far
   sort pLines numeric by item 1 of each
   put pLines into pPoints
   split pPoints by CR
   -- "infinity" compares greater than any finite distance
   put infinity into minDistSQ
   put infinity into minDist
   put the number of elements in pPoints into N
   repeat with i = 1 to N-1
      repeat with j = i + 1 to N
         put item 1 of pPoints[j] - item 1 of pPoints[i] into t1
         -- points are sorted by x, so every later j is at least this far away
         if t1 > minDist then exit repeat
         put item 2 of pPoints[j] - item 2 of pPoints[i] into t2
         -- y-gap too big: skip this pair, but a later j may still qualify
         if t2 > minDist OR t2 < -minDist then next repeat
         --         add 1 to compareCount
         put t1 * t1 + t2 * t2 into dist
         if dist < minDistSQ then
            -- new closest pair: the rare case, so the sqrt is cheap overall
            put dist into minDistSQ
            put sqrt(dist) into minDist
            put pPoints[i] & " " & pPoints[j] into closestPoints
            --            add 1 to newClosest
         else if dist = minDistSQ then
            -- a tie: collect every pair at the same minimal distance
            put sqrt(dist) into minDist
            put return & pPoints[i] & " " & pPoints[j] after closestPoints
            --            add 1 to tAlsoClosest
         end if
      end repeat
   end repeat
   --   put "SQ compared" && compareCount && newClosest && tAlsoClosest &CR after msg
   return closestPoints
end closestPointsSQ
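
For anyone who wants to try it, a quick harness along these lines should
reproduce the rough timings above (just a sketch - the handler name and
the 1,000,000-square coordinate range are made up for illustration):

on testClosestPoints
   -- build 50,000 random "x,y" points, one per line
   put 50000 into N
   repeat with i = 1 to N
      put random(1000000) & comma & random(1000000) & CR after tPoints
   end repeat
   delete the last char of tPoints
   -- time the search
   put the milliseconds into tStart
   put closestPointsSQ(tPoints) into tResult
   put the milliseconds - tStart into tElapsed
   put "closest:" && tResult && "in" && tElapsed && "ms" & CR after msg
end testClosestPoints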

Alex.

1. Grab two points at random (in case the points are pre-sorted in some
way) and get the distance.
2. Assume that's a reasonable average distance between points.
3. Define that as the number to beat, and define a grid based on it.
4. Go through all the points, tossing them into buckets based on the grid.
I'd define the buckets as fully overlapping to avoid missing close pairs.
5. The size of the grid buckets is critical: too big and you do too much
work. Too small and you end up with all singletons in the buckets. This
would require some experimentation and thought.
6. Go through the buckets only comparing the points within them.
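
Something like this, perhaps (completely untested, off the top of my
head; closestPairGrid and pCellSize are made-up names, and rather than
overlapping buckets this version keeps one grid and checks the 3x3 block
of neighboring cells, which catches the same boundary-straddling pairs):

function closestPairGrid pPoints, pCellSize
   -- step 4: toss each point into a bucket keyed by its grid cell
   -- (assumes non-negative coordinates, since div truncates toward zero)
   repeat for each line tLine in pPoints
      put (item 1 of tLine) div pCellSize into tCX
      put (item 2 of tLine) div pCellSize into tCY
      put tLine & CR after tBucket[tCX & comma & tCY]
   end repeat
   -- step 6: compare each point against its own cell and the 8 cells
   -- around it; any pair closer than pCellSize must land in such cells.
   -- Each pair gets looked at twice, which is wasteful but keeps it short.
   put infinity into tBestSQ
   repeat for each line tLine in pPoints
      put (item 1 of tLine) div pCellSize into tCX
      put (item 2 of tLine) div pCellSize into tCY
      repeat with dx = -1 to 1
         repeat with dy = -1 to 1
            repeat for each line tOther in tBucket[(tCX + dx) & comma & (tCY + dy)]
               if tOther is tLine then next repeat -- skip self (assumes distinct points)
               put (item 1 of tOther) - (item 1 of tLine) into t1
               put (item 2 of tOther) - (item 2 of tLine) into t2
               put t1 * t1 + t2 * t2 into tDistSQ
               if tDistSQ < tBestSQ then
                  put tDistSQ into tBestSQ
                  put tLine && tOther into tClosest
               end if
            end repeat
         end repeat
      end repeat
   end repeat
   -- step 5 caveat: the result is only guaranteed if sqrt(tBestSQ) < pCellSize;
   -- otherwise the step-2 estimate was too small - retry with a larger cell
   return tClosest
end closestPairGrid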

On Fri, Jan 20, 2023 at 10:14 AM Alex Tweedly via use-livecode <
use-livecode@lists.runrev.com> wrote:

On 20/01/2023 16:52, Mark Waddingham via use-livecode wrote:
On 2023-01-20 13:05, Alex Tweedly via use-livecode wrote:
We need a better algorithm. If we use a "linear scan", we can change
it from essentially Order(N**2) to approx Order(N).
Slightly pedantic point (I appreciate that you did say 'approx')...

Sorting cannot be done in any less time than O(N log N) - so the
revised algorithm is O(N log N), as you have to sort the input for it
to be able to scan linearly.

I figured someone would pick me up on that :-) It was a pretty big
'approx' :)

The sorting step is indeed O(N log N).

And the rest of it is also worse than linear, but I don't have the math
knowledge to even begin to pin down how much worse. Quick testing says
it's worse than O(N log N), and in practice it might even be as bad as
something like O(N * sqrt(N)), guessing from the number of points within
the (variable-sized) window to the right of the current scan coord.
(For N points spread uniformly over a square, the closest-pair distance
shrinks roughly like 1/sqrt(N), so a vertical strip that wide holds on
the order of sqrt(N) points - i.e. about sqrt(N) candidate comparisons
per scan position, and N * sqrt(N) overall.)

But in any "real app" usage that I can imagine, the 'clustering' effect
would seem to improve it over O(N * sqrt(N)); i.e. if the points tend to
cluster rather than being evenly, randomly spread, then the number
within the scan window goes down on average. "Real app" could be players
in a first-person game, or vehicles in a road simulation, or (in 3D)
stars in the universe - in all cases the clustering causes minDist to
decrease faster than when the points are evenly spread.
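
(If anyone wants to test that hunch, a generator along these lines would
do - clusteredPoints, the 1,000,000 square, and the +/-5,000 spread are
all just made-up illustration:)

function clusteredPoints pN, pClusters
   -- pick pClusters random cluster centres in a 1,000,000 square
   repeat with c = 1 to pClusters
      put random(1000000) & comma & random(1000000) & CR after tCentres
   end repeat
   -- scatter the pN points around randomly chosen centres
   repeat with i = 1 to pN
      put line random(pClusters) of tCentres into tC
      put (item 1 of tC) + random(10000) - 5000 into tX
      put (item 2 of tC) + random(10000) - 5000 into tY
      put tX & comma & tY & CR after tPoints
   end repeat
   delete the last char of tPoints
   return tPoints
end clusteredPoints

Feeding that into closestPointsSQ instead of uniformly random points
should show whether minDist really does drop faster.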

Alex.


