Re: How to partition a file to smaller size for performing KNN in hadoop mapreduce

Ted Dunning Wed, 14 Jan 2015 22:49:58 -0800

Have you considered implementing this with something like Spark? That could be
much easier than raw MapReduce.


On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni <unmeshab...@gmail.com>
wrote:

> In a KNN-like algorithm we need to load the model data into the cache in
> order to predict the records.
>
> Here is the example for KNN.
>
>
> [image: Inline image 1]
>
> So if the model is a large file, say 1 or 2 GB, we may not be able to load
> it into the distributed cache.
>
> One way is to split/partition the model result into several files, perform
> the distance calculation for all records in each file, and then find the
> minimum distance and the most frequent class label to predict the outcome.
>
> How can we partition the file and perform the operation on these partitions?
>
> i.e. 1st record <Distance> partition1,partition2,....
>      2nd record <Distance> partition1,partition2,...
>
> This is what came to my mind.
>
> Is there any other way?
>
> Any pointers would help me.
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
>
>
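
The partitioned approach described in the quoted message can be sketched in
plain Python, as an illustration of the idea rather than a Hadoop or Spark
implementation. It assumes Euclidean distance and a small in-memory toy
model; the names euclidean, local_top_k, predict, model and partitions are
hypothetical and do not come from the thread. Each partition reports only its
k nearest candidates for a record (the map-side work), and a merge step keeps
the overall k nearest and takes a majority vote (the reduce-side work).

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_top_k(record, partition, k):
    """Map-side step: k nearest (distance, label) pairs within one partition."""
    scored = ((euclidean(record, features), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def predict(record, partitions, k):
    """Reduce-side step: merge per-partition candidates and vote."""
    candidates = []
    for partition in partitions:
        candidates.extend(local_top_k(record, partition, k))
    # The global k nearest neighbours are always among the local top-k lists.
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy model (assumed data): (features, class label) pairs.
model = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
    ((4.8, 5.1), "B"),
]

# Split the model into two partitions, standing in for two smaller files.
partitions = [model[0::2], model[1::2]]

if __name__ == "__main__":
    print(predict((1.0, 0.9), partitions, k=3))
    print(predict((5.0, 5.0), partitions, k=3))
```

Because every partition returns its own k best candidates, merging them gives
the same k nearest neighbours as scanning the whole model at once, so no
single task has to hold the full model in memory.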
