When I experimented with reusing an InputFormat I had used in Hadoop for a long time, I found that:

1) it must extend org.apache.hadoop.mapred.FileInputFormat (the deprecated class, not org.apache.hadoop.mapreduce.lib.input.FileInputFormat);

2) initialize needs to be called in the constructor;

3) the key and value types must not be Hadoop Writables, since those are not serializable. Mine was declared extends FileInputFormat<Text, Text>, which fails, but extends FileInputFormat<StringBuffer, StringBuffer> does work; I don't think that would even be allowed in Hadoop.
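The third point can be checked with plain Java, no Hadoop on the classpath: Hadoop's Writable interface does not extend java.io.Serializable, whereas java.lang.StringBuffer does, which would explain why StringBuffer keys survive default Java serialization where Text keys fail. A minimal sketch of that check (FakeWritable below is a hypothetical stand-in for a Writable such as Text; it is not a real Hadoop class):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class SerializableCheck {

    // Hypothetical stand-in for a Hadoop Writable such as Text:
    // like Writable, it does not implement java.io.Serializable.
    static class FakeWritable {
        String value = "record";
    }

    // Returns true if Java's default serialization accepts the object.
    static boolean isJavaSerializable(Object o) {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isJavaSerializable(new StringBuffer("key"))); // true
        System.out.println(isJavaSerializable(new FakeWritable()));      // false
    }
}
```

So the restriction is really Java serialization, not anything specific to InputFormats; any key/value class that implements java.io.Serializable should be fine.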
Are these statements correct? If so, it seems that most Hadoop InputFormats, certainly the custom ones I create, require serious modification to work. Does anyone have samples of using a Hadoop InputFormat? I am working with problems where a directory of multiple files is processed, and some files are many gigabytes in size with complex multiline records, so a custom InputFormat is a requirement.