When you group by IP address in step 1 to produce this:

        (ip1,(lat1,lon1),(lat2,lon2))
        (ip2,(lat3,lon3),(lat4,lon4))

How many lat/lon locations do you expect for each IP address? The average
and the maximum are both interesting.
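
The reason for asking: the number of pairs to compare for one IP grows as
n*(n-1)/2 in the number of locations n, so a single IP with many locations
can dominate the job. A minimal sketch of how those two numbers could be
measured, in plain Python rather than Spark, assuming records shaped as
(ip, lat, lon); the sample data and the name `location_stats` are made up
for illustration:

```python
from collections import defaultdict

# Hypothetical sample records: (ip, lat, lon). Not from the real data set.
records = [
    ("ip1", 12.9716, 77.5946),
    ("ip1", 12.9720, 77.5950),
    ("ip1", 12.9716, 77.5946),
    ("ip2", 13.0827, 80.2707),
]


def location_stats(records):
    """Return (locations per IP, average per IP, maximum per IP)."""
    by_ip = defaultdict(list)
    for ip, lat, lon in records:
        by_ip[ip].append((lat, lon))
    counts = {ip: len(points) for ip, points in by_ip.items()}
    average = sum(counts.values()) / len(counts)
    largest = max(counts.values())
    return counts, average, largest


counts, average, largest = location_stats(records)

# Worst-case number of pairwise comparisons for the busiest IP.
worst_case_pairs = largest * (largest - 1) // 2
```

If the maximum is in the thousands, the per-IP pair generation is the part
to worry about, not the grouping itself.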

Andrew


On Wed, Jun 4, 2014 at 5:29 AM, Oleg Proudnikov <oleg.proudni...@gmail.com>
wrote:

>  It is possible if you use a Cartesian product to produce all possible
> pairs for each IP address, followed by 2 stages of map-reduce:
>  - first, by pairs of points, to find the total count of each pair, and
>  - second, by IP address, to find the pair with the maximum count for
> each IP address.
>
> Oleg
>
>
>
> On 4 June 2014 11:49, lmk <lakshmi.muralikrish...@gmail.com> wrote:
>
>> Hi,
>> I am a new Spark user. Please let me know how to handle the following
>> scenario:
>>
>> I have a data set with the following fields:
>> 1. DeviceId
>> 2. latitude
>> 3. longitude
>> 4. ip address
>> 5. Datetime
>> 6. Mobile application name
>>
>> With the above data, I would like to perform the following steps:
>> 1. Collect all lat and lon for each ipaddress
>>         (ip1,(lat1,lon1),(lat2,lon2))
>>         (ip2,(lat3,lon3),(lat4,lon4))
>> 2. For each IP,
>>         1. Find the distance between each lat and lon coordinate pair and
>> all the other pairs under the same IP
>>         2. Select those coordinates whose distances fall under a specific
>> threshold (say 100 m)
>>         3. Find the coordinate pair with the maximum occurrences
>>
>> In this case, how can I iterate and compare each coordinate pair with all
>> the other pairs?
>> Can this be done in a distributed manner, as this data set is going to
>> have a few million records?
>> Can we do this with map/reduce operations?
>>
>> Thanks.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Can-this-be-done-in-map-reduce-technique-in-parallel-tp6905.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>
>
> --
> Kind regards,
>
> Oleg
>
>
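
For reference, the two-stage approach Oleg describes could look roughly like
the sketch below. It is plain Python, not Spark, and it rests on several
assumptions: records are (ip, lat, lon) tuples, "distance" means the
haversine great-circle distance, pairs of identical coordinates are skipped,
and the names `haversine_m` and `most_frequent_close_pair` are invented
here. In Spark the grouping and the two counting stages would roughly
correspond to `groupByKey`, `flatMap`, and `reduceByKey` on an RDD.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def most_frequent_close_pair(records, threshold_m=100.0):
    """Map each IP to (pair of points, count) for its most frequent close pair."""
    # Step 1 of the original question: collect all points per IP.
    by_ip = defaultdict(list)
    for ip, lat, lon in records:
        by_ip[ip].append((lat, lon))

    # Stage 1: key by (ip, unordered pair) and count pairs within the threshold.
    pair_counts = Counter()
    for ip, points in by_ip.items():
        for p, q in combinations(points, 2):
            # Assumption: identical coordinates do not form a pair.
            if p != q and haversine_m(p, q) <= threshold_m:
                pair_counts[(ip, tuple(sorted((p, q))))] += 1

    # Stage 2: key by ip and keep the pair with the highest count.
    # On ties, the pair encountered first is kept.
    best = {}
    for (ip, pair), n in pair_counts.items():
        if ip not in best or n > best[ip][1]:
            best[ip] = (pair, n)
    return best


# Hypothetical sample records: (ip, lat, lon).
records = [
    ("ip1", 12.9716, 77.5946),
    ("ip1", 12.9720, 77.5950),
    ("ip1", 12.9716, 77.5946),
    ("ip1", 13.0827, 80.2707),
    ("ip2", 13.0827, 80.2707),
]
best = most_frequent_close_pair(records)
```

An IP with fewer than two distinct points within the threshold, such as ip2
in the sample, produces no pairs and is absent from the result.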
