Yes, one of my friends is implementing the same thing. I know global sharing of data is not possible in Hadoop MapReduce, but I need to check whether it can be done somehow in Hadoop MapReduce as well, because I found some papers on KNN in Hadoop. I am also trying to compare the performance.
Hope some pointers can help me.

On Thu, Jan 15, 2015 at 12:17 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Have you considered implementing it using something like Spark? That could
> be much easier than raw map-reduce.
>
> On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni <unmeshab...@gmail.com>
> wrote:
>
>> In a KNN-like algorithm we need to load the model data into a cache for
>> predicting the records.
>>
>> Here is an example of KNN.
>>
>> [image: Inline image 1]
>>
>> So if the model is a large file, say 1 or 2 GB, will we be able to load
>> it into the Distributed Cache?
>>
>> One way is to split/partition the model result into several files,
>> perform the distance calculation for all records against each file, and
>> then find the minimum distances and the most frequent class label to
>> predict the outcome.
>>
>> How can we partition the file and perform the operation on these
>> partitions?
>>
>> i.e. 1st record <Distance> partition1, partition2, ...
>>      2nd record <Distance> partition1, partition2, ...
>>
>> This is what came to my mind.
>>
>> Is there any other way?
>>
>> Any pointers would help me.
>>
>> --
>> *Thanks & Regards *
>>
>> *Unmesha Sreeveni U.B*
>> *Hadoop, Bigdata Developer*
>> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
>> http://www.unmeshasreeveni.blogspot.in/
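The partitioning idea from the quoted message can be sketched as follows. This is a minimal, framework-free simulation in plain Python, not Hadoop API code: it assumes Euclidean distance, a majority vote over the k nearest neighbours, and a model held as a list of (features, label) pairs. The names `partition_model`, `map_partition`, `reduce_record` and `knn_partitioned` are hypothetical. Each "map" call handles one model partition and emits only that partition's local k nearest candidates per test record. That is enough, because the global k nearest neighbours must lie in the union of the per-partition top-k lists. A dictionary keyed by record id stands in for the shuffle, and the "reduce" step merges the candidates and votes.

```python
import heapq
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition_model(model, num_partitions):
    """Hypothetical stand-in for splitting the model file into part files:
    distributes (features, label) pairs round-robin into chunks."""
    return [model[i::num_partitions] for i in range(num_partitions)]


def map_partition(test_records, partition, k):
    """'Map' step for one model partition: for every test record, emit
    (record_id, up to k nearest (distance, label) pairs in this partition)."""
    for rec_id, features in test_records:
        local = heapq.nsmallest(
            k, ((euclidean(features, mf), label) for mf, label in partition)
        )
        yield rec_id, local


def reduce_record(candidates, k):
    """'Reduce' step for one record: keep the global k nearest candidates and
    return the most frequent label (ties go to the label seen first, i.e. the
    one with the closest neighbour, since `nearest` is sorted by distance)."""
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def knn_partitioned(test_records, model, k, num_partitions):
    """Run map over every partition, group by record id, then reduce."""
    shuffled = defaultdict(list)  # simulates the shuffle: group by record id
    for partition in partition_model(model, num_partitions):
        for rec_id, local in map_partition(test_records, partition, k):
            shuffled[rec_id].extend(local)
    return {rec_id: reduce_record(c, k) for rec_id, c in shuffled.items()}


# Illustrative toy data (made up for this sketch).
MODEL = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
    ((5.0, 5.0), "B"), ((5.1, 4.9), "B"), ((4.9, 5.2), "B"),
]
TESTS = [("r1", (0.05, 0.05)), ("r2", (5.0, 5.1))]

if __name__ == "__main__":
    print(knn_partitioned(TESTS, MODEL, k=3, num_partitions=2))
    # {'r1': 'A', 'r2': 'B'}
```

The number of partitions does not change the prediction, only how the distance work is split. In a real job, each model partition might be processed by a separate map task, at the cost of emitting up to k candidates per record per partition into the shuffle.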