Have you considered implementing this with something like Spark? That could be much easier than raw MapReduce.

On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni <unmeshab...@gmail.com> wrote:

> In a KNN-like algorithm we need to load the model data into the cache to
> predict the records.
>
> Here is an example for KNN.
>
> [image: Inline image 1]
>
> So if the model is a large file, say 1 or 2 GB, will we be able to load
> it into the distributed cache?
>
> One way is to split/partition the model result into several files,
> perform the distance calculation for all records in each file, and then
> find the minimum distance and the most frequent class label to predict
> the outcome.
>
> How can we partition the file and perform the operation on these
> partitions?
>
> i.e. 1st record <Distance> partition1, partition2, ...
>      2nd record <Distance> partition1, partition2, ...
>
> This is what came to mind.
>
> Is there any other way?
>
> Any pointers would help me.
>
> --
> *Thanks & Regards*
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
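
The partitioning idea described above can be sketched in plain Python to show why it gives the same answer as loading the whole model at once. This is only a minimal, single-machine illustration, not a Hadoop or Spark job. The function names `local_top_k` and `predict`, the Euclidean distance, and the toy data are all assumptions made for the example. Each partition reports its own k nearest model rows for a record (the map side), and the merge step keeps the global k nearest among those candidates and takes a majority vote (the reduce side).

```python
import heapq
import math
from collections import Counter


def local_top_k(record, partition, k):
    """Map side: the k nearest model rows inside ONE partition.

    Returns a list of (distance, label) pairs, nearest first.
    Euclidean distance is an assumption; any metric would work.
    """
    scored = ((math.dist(record, features), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def predict(record, partitions, k):
    """Reduce side: merge per-partition candidates and vote.

    The global k nearest neighbours are always contained in the union
    of the per-partition k nearest, so no partition needs to see the
    whole model.
    """
    candidates = []
    for partition in partitions:
        candidates.extend(local_top_k(record, partition, k))
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy model, split into two partitions of (features, label) rows.
partitions = [
    [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")],
    [((0.9, 1.1), "A"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")],
]

print(predict((1.0, 0.9), partitions, k=3))
print(predict((5.0, 5.0), partitions, k=3))
```

Because each partition only ever emits k candidates per record, the data moved to the merge step grows with the number of partitions times k, not with the size of the model file.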