There are some techniques you can use if you geohash 
<http://en.wikipedia.org/wiki/Geohash> the lat-lngs.  Geohashes naturally 
sort by proximity (with some edge cases, so watch out).  If you go the join 
route, either by trimming the lat-lngs or geohashing them, you're essentially 
grouping nearby locations into buckets, but you have to consider the borders 
of the buckets, since the nearest location may actually be in an adjacent 
bucket.  Here's a paper that discusses an implementation: 
http://www.gdeepak.com/thesisme/Finding%20Nearest%20Location%20with%20open%20box%20query.pdf
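
To make the bucket-border point concrete, here is a rough sketch in plain 
Python (not Spark, and not the method from the paper). It buckets points by 
"trimming" lat-lngs to a grid cell, then searches the query's cell plus its 
eight neighbours and ranks candidates by haversine distance. The names 
haversine_km, cell, build_index and nearest, and the 1-degree cell size, are 
my own assumptions for illustration. It ignores longitude wrap-around and the 
poles, and the 3x3 search is only a heuristic: a far corner of a neighbouring 
cell can still be further away than a point just outside the 3x3 block.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat-lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell(lat, lng, size_deg):
    """Bucket key: the lat-lng 'trimmed' to a grid cell of size_deg degrees."""
    return (math.floor(lat / size_deg), math.floor(lng / size_deg))


def build_index(points, size_deg):
    """Group (lat, lng) tuples by their grid cell."""
    index = defaultdict(list)
    for lat, lng in points:
        index[cell(lat, lng, size_deg)].append((lat, lng))
    return index


def nearest(index, lat, lng, size_deg):
    """Search the query's cell and its 8 neighbours (hypothetical helper)."""
    ci, cj = cell(lat, lng, size_deg)
    best, best_d = None, float("inf")
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for p in index.get((ci + di, cj + dj), ()):
                d = haversine_km(lat, lng, p[0], p[1])
                if d < best_d:
                    best, best_d = p, d
    return best, best_d


# The query sits just inside cell (10, 21); its true nearest neighbour sits
# just across the border in cell (10, 20).
points = [(10.05, 20.95), (10.9, 21.9)]
index = build_index(points, 1.0)
best, dist_km = nearest(index, 10.1, 21.02, 1.0)
```

Joining on the bucket key alone would only ever pair the query with 
(10.9, 21.9), the one point sharing its cell; including the adjacent cells is 
what finds (10.05, 20.95). In Spark the same idea would probably mean emitting 
each query under its own key plus the neighbouring keys before the join, but I 
haven't tried that here.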

> On Mar 9, 2015, at 11:42 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> 
> Are you using SparkSQL for the join? In that case I'm not quite sure you have 
> a lot of options to join on the nearest co-ordinate. If you are using the 
> normal Spark code (by creating a key-pair on lat,lon) you can apply certain 
> logic like trimming the lat,lon etc. If you want more specific computing then 
> you are better off using the haversine formula. 
> <http://www.movable-type.co.uk/scripts/latlong.html>
