You also need to pay attention to the split boundary, because you don't want one line to be split across different mappers. Maybe you can think about a multi-line input format.
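To make the boundary concern concrete: as far as I understand it, the stock line-oriented reader behind TextInputFormat already avoids cutting a line in two. A split that does not start at byte 0 discards its first (possibly partial) line, and every split keeps reading until it has finished the line that crosses or starts at its end offset. Below is a minimal sketch of that convention in plain Java, with no Hadoop dependency. The names `SplitLines` and `linesForSplit` are made up for illustration and are not Hadoop APIs; a custom or multi-line input format would need to apply the same kind of rule itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: simulates how a line reader can assign whole lines
// to byte-range splits. Not Hadoop code.
class SplitLines {

    // Returns the lines owned by the split covering bytes [start, end) of data.
    static List<String> linesForSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Not the first split: skip up to and including the next newline.
            // That line is read by the previous split instead.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++;
        }
        List<String> lines = new ArrayList<>();
        // Note "<=": a line that starts exactly at 'end' is still ours,
        // because the next split will skip its first line.
        while (pos <= end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Three "<URL> <NUMBER>"-style lines, cut in the middle of the second one.
        byte[] data = "a 1\nbb 2\nccc 3\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(linesForSplit(data, 0, 6));   // [a 1, bb 2]
        System.out.println(linesForSplit(data, 6, 15));  // [ccc 3]
    }
}
```

Each line ends up in exactly one split even though the cut at byte 6 falls inside "bb 2".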
Simon.

On Jul 6, 2013 10:18 AM, "Sanjay Subramanian" <sanjay.subraman...@wizecommerce.com> wrote:

> More mappers will make it faster
> U can try this parameter
> mapreduce.input.fileinputformat.split.maxsize=<sizeinbytes>
> This will control the input split size and force more mappers to run
>
>
> Also ur usecase seems a good candidate for defining a Combiner because u r
> grouping keys based on a criteria
> But only gotcha is Combiners are not guaranteed to be called to run
>
> Give these a shot
>
> Good luck
>
> sanjay
>
>
>
> From: parnab kumar <parnab.2...@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Saturday, July 6, 2013 12:50 AM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Splitting input file - increasing number of mappers
>
> Hi,
>
> I have an input file where each line is of the form:
>
> <URL> <A NUMBER>
>
> URLs whose number is within a threshold are considered similar. My
> task is to group together all similar urls. For this i wrote a *custom
> writable* where i implemented the threshold check in the
> *compareTo* method. Therefore when Hadoop sorts, the similar urls are grouped
> together. This seems to work fine.
> I have the following query:
>
> 1> Since i am relying more on the sort feature provided by Hadoop, am
> i decreasing the efficiency in any way, or by using Hadoop's sort feature, which
> Hadoop does best, am i actually doing the right thing? Now if this is the
> right thing too, then it seems my job mostly relies on the map
> task. Therefore will an increase in the number of mappers increase efficiency?
>
> 2> My file size is not more than 64 MB, i.e. a Hadoop block size,
> which means not more than 1 mapper will be invoked. Will splitting the file
> into smaller sizes increase the efficiency by invoking more mappers?
>
> Can someone kindly provide some insight, advice regarding the above.
>
> Thanks,
> Parnab,
> MS student, IIT Kharagpur
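
To put rough numbers on the split.maxsize suggestion in the thread: to my knowledge, FileInputFormat picks the split size as max(minSize, min(maxSize, blockSize)) and keeps carving splits while the remaining bytes exceed the split size by more than about 10%. The sketch below reproduces only that arithmetic in plain Java, with no Hadoop dependency. The class name `SplitMath`, the 60 MB file size, and the 16 MB maximum are assumptions chosen for illustration, and the file is assumed to be splittable (for example plain text, not gzip).

```java
// Illustration only: the split-count arithmetic, not Hadoop code.
class SplitMath {

    // Slack factor: the last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Split size is the block size, clamped between minSize and maxSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and hence map tasks) for one splittable file.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 60 * mb; // hypothetical file just under one block
        long blockSize = 64 * mb;
        long minSize = 1;

        // No maximum set: one split, so one mapper.
        long defaultSize = computeSplitSize(blockSize, minSize, Long.MAX_VALUE);
        System.out.println(countSplits(fileLength, defaultSize)); // 1

        // split.maxsize set to 16 MB: four splits, so four mappers.
        long cappedSize = computeSplitSize(blockSize, minSize, 16 * mb);
        System.out.println(countSplits(fileLength, cappedSize)); // 4
    }
}
```

Whether four mappers actually beat one for a file this small depends on per-task startup overhead, so it is worth measuring rather than assuming.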