ignoring map task failure

2014-08-18 Thread parnab kumar
Hi all, I am running a job with between 1300 and 1400 map tasks. Some map tasks fail due to an error. When 4 such maps fail, the job gets killed. How can I ignore the failed tasks and continue executing the other map tasks? I am okay with losing some data for the failed t

group similar items using pairwise similar items

2014-06-25 Thread parnab kumar
Hi, I have a set of items and a list of pairwise similar items. I want to group together items that are mutually similar. For example, if *A B C D E F G* are the items and I have the following pairwise similar items: *A B*, *A C*, *B C*, *D E*, *C G*, *E F*, I want the output to be *A B C G* and *D E F*. Can someone su
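One possible way to read this question is as a connected-components problem, where each similar pair is an edge. The sketch below is a single-machine illustration using union-find, not a MapReduce job and not necessarily what the thread settled on; the function name `group_items` is hypothetical. Note that it treats similarity transitively, which is what the expected output implies (G joins A and B only through C).

```python
def group_items(pairs):
    """Group items into connected components with union-find."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in groups.values())


# The pairs listed in the question.
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("C", "G"), ("E", "F")]
groups = group_items(pairs)
for members in groups:
    print(" ".join(members))
```

With the pairs from the question this prints `A B C G` and `D E F`, one group per line.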

grouping similar items together

2014-06-20 Thread parnab kumar
Hi, I have a set of hashes. Each hash is a 32-bit long integer. Two hashes are similar if their Hamming distance is less than or equal to 2. I need to group together hashes that are mutually similar to one another, i.e. in the output file each line should have mutually similar k
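The similarity test itself can be sketched as follows: XOR the two hashes and count the set bits. This is only an illustration of the distance check with a brute-force pairwise scan; the sample values and the names `hamming` and `similar_pairs` are made up here, and an all-pairs comparison would not scale to a large Hadoop input without some form of blocking or partitioning.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


def similar_pairs(hashes, max_distance=2):
    """Return all pairs whose Hamming distance is <= max_distance (O(n^2))."""
    return [
        (a, b)
        for a, b in combinations(hashes, 2)
        if hamming(a, b) <= max_distance
    ]


# Hypothetical sample hashes, small enough to check by hand.
hashes = [0b0000, 0b0011, 0b0111, 0b11110000]
pairs = similar_pairs(hashes)
print(pairs)
```

The resulting pairs could then be fed to any grouping step, depending on whether "mutually similar" is meant transitively or as a strict all-pairs condition, which the truncated question does not settle.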

Splitting input file - increasing number of mappers

2013-07-06 Thread parnab kumar
Hi, I have an input file where each line is of the form : URLs whose number is within a threshold are considered similar. My task is to group together all similar URLs. For this I wrote a *custom writable* where I implemented the threshold check in the *compareTo* meth

How to design the mapper and reducer for the following problem

2013-06-14 Thread parnab kumar
An input file where each line corresponds to a document. Each document is identified by some fingerprints. For example, a line in the input file is of the following form: input: - DOCID1 HASH1 HASH2 HASH3 HASH4 DOCID2 HASH5 HASH3 HASH1 HASH4 The output of the mapreduce job

how to design the mapper and reducer for the below problem

2013-06-13 Thread parnab kumar
Consider the following input file format: input file: 1 2 2 3 3 4 6 7 7 9 10 11 The output should be as follows: 1 2 3 4 6 7 9 10 11
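The line breaks in the example above were lost in the archive snippet. Assuming the input is one pair per line (`1 2`, `2 3`, `3 4`, `6 7`, `7 9`, `10 11`) and the desired output is one chained group per line (`1 2 3 4`, `6 7 9`, `10 11`), one approach that maps naturally onto iterated MapReduce rounds is minimum-label propagation: every node repeatedly adopts the smallest label seen across its edges until nothing changes. The sketch below simulates those rounds in plain Python; `propagate_labels` is a hypothetical name and this is not the thread's confirmed answer.

```python
def propagate_labels(edges):
    """Group nodes by repeatedly spreading the smallest label along edges."""
    # Every node starts out labeled with its own value.
    label = {}
    for a, b in edges:
        label.setdefault(a, a)
        label.setdefault(b, b)

    # Each pass over the edges plays the role of one job iteration.
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            low = min(label[a], label[b])
            if label[a] != low or label[b] != low:
                label[a] = label[b] = low
                changed = True

    # Nodes that ended up with the same label form one group.
    groups = {}
    for node, lab in label.items():
        groups.setdefault(lab, []).append(node)
    return [sorted(members) for _, members in sorted(groups.items())]


# Pairs as assumed from the flattened example.
edges = [(1, 2), (2, 3), (3, 4), (6, 7), (7, 9), (10, 11)]
groups = propagate_labels(edges)
for members in groups:
    print(" ".join(str(n) for n in members))
```

In a real job, each pass would be one map/reduce round keyed on the node, and the driver would stop once a round reports no label changes.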

read lucene index in mapper

2013-06-11 Thread parnab kumar
Hi, I need to read an existing Lucene index in a map. Can someone point me in the right direction? Thanks, Parnab