Re: How to partition a file to smaller size for performing KNN in hadoop mapreduce

unmesha sreeveni Wed, 14 Jan 2015 23:06:03 -0800

Yes, one of my friends is implementing the same thing. I know that global sharing
of data is not possible in Hadoop MapReduce, but I need to check whether it
can be done somehow in Hadoop MapReduce as well, because I found some papers
on KNN in Hadoop too.
I am also trying to compare the performance.


I hope someone can give me some pointers.


On Thu, Jan 15, 2015 at 12:17 PM, Ted Dunning <[email protected]> wrote:

>
> Have you considered implementing this using something like Spark? That could
> be much easier than raw MapReduce.
>
> On Wed, Jan 14, 2015 at 10:06 PM, unmesha sreeveni <[email protected]>
> wrote:
>
>> In a KNN-like algorithm, we need to load the model data into the cache to
>> predict the records.
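A minimal sketch of the in-memory prediction step described above. It assumes the model is a plain list of (feature vector, label) pairs that fits in memory, Euclidean distance, and a simple majority vote among the k nearest points. This is plain Python, not Hadoop API code; `euclidean` and `knn_predict` are hypothetical names for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(model, record, k):
    """Predict a label for `record` from the k closest model points.

    Assumption: `model` is a list of (features, label) pairs held entirely
    in memory, i.e. the whole model has already been loaded from the cache.
    """
    # Distance from the record to every model point, paired with its label.
    scored = [(euclidean(features, record), label) for features, label in model]
    scored.sort(key=lambda pair: pair[0])
    # Majority vote among the labels of the k nearest points.
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


model = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
    ((5.5, 6.0), "B"),
]
print(knn_predict(model, (1.2, 1.4), k=3))  # prints A
```

Every prediction scans the whole model, which is why the model has to be available in full to each task in this naive form.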
>>
>> Here is an example of KNN.
>>
>>
>> [image: Inline image 1]
>>
>> So if the model is a large file, say 1 or 2 GB, will we still be able to
>> load it into the Distributed Cache?
>>
>> One way is to split/partition the model result into several files,
>> perform the distance calculation for all records against each file, and then
>> find the minimum distances and the most frequent class label to predict the
>> outcome.
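The partitioned approach above can still give the exact KNN answer if each partition keeps only its own k nearest (distance, label) pairs: a point in the global top k has at most k-1 strictly closer points anywhere, so it is also in the top k of its own partition, and merging the per-partition candidates recovers the global top k. A minimal sketch, assuming round-robin partitioning and Euclidean distance; `split_model`, `local_topk` and `merged_topk` are hypothetical helpers, not part of any Hadoop API.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_model(model, n_parts):
    """Hypothetical scheme: deal model rows round-robin into n_parts partitions."""
    return [model[i::n_parts] for i in range(n_parts)]


def local_topk(partition, record, k):
    """The k smallest (distance, label) pairs within a single partition."""
    return heapq.nsmallest(k, ((euclidean(f, record), label) for f, label in partition))


def merged_topk(partitions, record, k):
    """Global k nearest neighbours, built only from each partition's local top k."""
    candidates = []
    for part in partitions:
        candidates.extend(local_topk(part, record, k))
    return heapq.nsmallest(k, candidates)


# Toy model: 30 points with labels "A"/"B" (illustrative data only).
model = [((float(i), float(i % 7)), "A" if i % 3 else "B") for i in range(30)]
record = (10.3, 2.0)
parts = split_model(model, 4)
brute_force = heapq.nsmallest(5, ((euclidean(f, record), lbl) for f, lbl in model))
print(merged_topk(parts, record, 5) == brute_force)  # True
```

Only k candidates per partition need to leave each partition, so the data moved for the merge step grows with k times the number of partitions rather than with the model size.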
>>
>> How can we partition the file and perform the operation on these
>> partitions?
>>
>> i.e. 1st record <Distance> partition1, partition2, ...
>>      2nd record <Distance> partition1, partition2, ...
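The per-record layout above maps naturally onto MapReduce keys: each mapper scores one model partition against the test records and emits (record id, (distance, label)) for at most k local nearest points, and the shuffle groups everything by record id, giving "record: distances from partition1, partition2, ...". The following is a plain-Python simulation of that data flow, not Hadoop API code; how each mapper receives its partition (for example one partition file per task) is left as an assumption, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_partition(partition, test_records, k):
    """Simulated mapper for one model partition: for every test record, emit
    (record_id, (distance, label)) for at most k closest points in this partition."""
    for record_id, features in test_records:
        scored = sorted((euclidean(f, features), label) for f, label in partition)
        for pair in scored[:k]:
            yield record_id, pair


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_record(record_id, candidates, k):
    """Simulated reducer: keep the k globally nearest candidates, then vote."""
    nearest = sorted(candidates)[:k]
    votes = Counter(label for _, label in nearest)
    return record_id, votes.most_common(1)[0][0]


def run_job(partitions, test_records, k):
    mapped = [pair for part in partitions for pair in map_partition(part, test_records, k)]
    grouped = shuffle(mapped)
    return dict(reduce_record(rid, cands, k) for rid, cands in grouped.items())


# Illustrative model split into three partitions, plus two test records.
partitions = [
    [((1.0, 1.0), "A"), ((5.0, 5.0), "B")],
    [((1.5, 2.0), "A"), ((6.0, 5.5), "B")],
    [((5.5, 6.0), "B"), ((0.5, 1.5), "A")],
]
test_records = [(1, (1.2, 1.4)), (2, (5.4, 5.6))]
print(run_job(partitions, test_records, k=3))  # {1: 'A', 2: 'B'}
```

Truncating to k inside the mapper plays the role a combiner would, so each reducer sees at most k candidates per partition for its record.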
>>
>> This is what came to mind.
>>
>> Is there any other way?
>>
>> Any pointers would help me.
>>
>> --
>> *Thanks & Regards *
>>
>>
>> *Unmesha Sreeveni U.B*
>> *Hadoop, Bigdata Developer*
>> *Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
>> http://www.unmeshasreeveni.blogspot.in/
>>
>>
>>
>


