Re: Unit Testing Mapper with new MR API

2009-12-01 Thread Aaron Kimball
MRUnit has had new API support for a while now (see MAPREDUCE-800). The new Driver classes are in org.apache.hadoop.mrunit.mapreduce. - Aaron On Tue, Dec 1, 2009 at 12:44 AM, Bernd Fondermann wrote: > Hi, > > Unit testing a mapper with the old mapred API is easy. > > However, for the new mapredu

RE: How to write a custom input format and record reader to read multiple lines of text from files

2009-12-01 Thread Kunal Gupta
I am extending the class FileInputFormat. This class has an abstract method, createRecordReader. I have implemented the method, but running the program still gives me constructor errors. I tried passing FileInputFormat as my InputFormat class in the job configuration, and surely it gave m

RE: How to write a custom input format and record reader to read multiple lines of text from files

2009-12-01 Thread guillaume.viland
I've developed a version of a MultipleLineTextInputFormat for Hadoop 0.19. I think it is not perfect, but it works for my needs. I've attached the code; feel free to improve or use it. Do not hesitate to contact me if you improve the code. -Original Message- From: Kunal Gupta [mailto:k

Simple normalizing of data

2009-12-01 Thread Tim Robertson
Hi all, I am processing a large tab file to format it for loading into a database with a predefined schema. I have a tab file with a column that I need to normalize out to another table and reference with a foreign key from the original file. I would like to hear if my proposed proces

Re: How to write a custom input format and record reader to read multiple lines of text from files

2009-12-01 Thread Kunal Gupta
NLineInputFormat will help in splitting N lines of text for each Mapper, but it will still pass a single line of text to each call to the Map function. I want N lines of text to be passed as 'value' to the Map function. By extending the FileInputFormat and RecordReader classes I am concatenating N line
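
The grouping described in this message (N consecutive lines handed to the map function as one value) can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: the class `NLineGrouper` and the methods `nextRecord` and `allRecords` are hypothetical names invented here, not part of Hadoop or of the code attached elsewhere in this thread. In a real custom RecordReader, a loop of this shape would presumably sit inside `nextKeyValue()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not a Hadoop API.
public class NLineGrouper {

    // Reads up to n lines and joins them with '\n' into a single value.
    // Returns null once the input is exhausted.
    static String nextRecord(BufferedReader reader, int n) throws IOException {
        StringBuilder value = new StringBuilder();
        int read = 0;
        String line;
        while (read < n && (line = reader.readLine()) != null) {
            if (read > 0) {
                value.append('\n');
            }
            value.append(line);
            read++;
        }
        return read == 0 ? null : value.toString();
    }

    // Splits the whole text into records of at most n lines each.
    static List<String> allRecords(String text, int n) {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String record;
            while ((record = nextRecord(reader, n)) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Five input lines grouped two at a time; the last record is short.
        for (String record : allRecords("a\nb\nc\nd\ne", 2)) {
            System.out.println("[" + record.replace("\n", " | ") + "]");
        }
    }
}
```

The last record may hold fewer than N lines when the input length is not a multiple of N, so the map function should not assume a fixed line count per value.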

Unit Testing Mapper with new MR API

2009-12-01 Thread Bernd Fondermann
Hi, Unit testing a mapper with the old mapred API is easy. However, for the new mapreduce API, I struggle to create the Context object in an easy and elegant way. How do I best mock the Context for the mapper, as MRUnit does not seem to be ready to be used with the new API yet? Is there example

Re: How to write a custom input format and record reader to read multiple lines of text from files

2009-12-01 Thread Amogh Vasekar
Hi, The NLineInputFormat (o.a.h.mapreduce.lib.input) achieves more or less the same, and should help guide you in writing a custom input format :) Amogh On 12/1/09 11:47 AM, "Kunal Gupta" wrote: Can someone explain how to override the "FileInputFormat" and "RecordReader" in order to be able to rea