When I experimented with using an InputFormat I had used in Hadoop for a
long time in Spark, I found
1) it must extend org.apache.hadoop.mapred.FileInputFormat (the deprecated
class, not org.apache.hadoop.mapreduce.lib.input.FileInputFormat)
2) initialize needs to be called in the constructor
3) the key/value types must not be Hadoop Writables - those are not
serializable. Mine was extends FileInputFormat<Text, Text>, but extends
FileInputFormat<StringBuffer, StringBuffer> does work - I don't think that
would be allowed in Hadoop itself (see the sketch after this list)
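
To make those three points concrete, here is a minimal sketch of the
pattern (the class names are made up, and the reader just returns each
whole file as a single record to keep it short - not a definitive
implementation):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Point 1: extends the old-API (mapred) FileInputFormat.
// Point 3: StringBuffer keys/values - mutable enough for the old
// RecordReader.next(key, value) contract, and java.io.Serializable,
// which Text and the other Writables are not.
public class WholeFileInputFormat
        extends FileInputFormat<StringBuffer, StringBuffer> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // one record spans the whole file in this sketch
    }

    @Override
    public RecordReader<StringBuffer, StringBuffer> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter)
            throws IOException {
        return new WholeFileRecordReader((FileSplit) split, job);
    }

    public static class WholeFileRecordReader
            implements RecordReader<StringBuffer, StringBuffer> {
        private final FileSplit split;
        private final FileSystem fs;
        private boolean done = false;

        // Point 2: nothing in the old API calls initialize() for you,
        // so all per-split setup goes in the constructor.
        public WholeFileRecordReader(FileSplit split, JobConf job)
                throws IOException {
            this.split = split;
            this.fs = split.getPath().getFileSystem(job);
        }

        public StringBuffer createKey() { return new StringBuffer(); }
        public StringBuffer createValue() { return new StringBuffer(); }

        public boolean next(StringBuffer key, StringBuffer value)
                throws IOException {
            if (done) return false;
            done = true;
            byte[] contents = new byte[(int) split.getLength()];
            FSDataInputStream in = fs.open(split.getPath());
            try {
                in.readFully(0, contents);
            } finally {
                in.close();
            }
            key.setLength(0);
            key.append(split.getPath().toString());
            value.setLength(0);
            value.append(new String(contents, "UTF-8"));
            return true;
        }

        public long getPos() { return done ? split.getLength() : 0; }
        public float getProgress() { return done ? 1.0f : 0.0f; }
        public void close() { }
    }
}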

Are these statements correct? If so, it seems like most Hadoop
InputFormats - certainly the custom ones I create - require serious
modifications to work. Does anyone have samples of using a Hadoop
InputFormat in Spark?
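
For what it is worth, this is roughly how I am driving the format above
from the Java API - the input path is made up, and sc.hadoopFile is the
entry point that takes the old mapred classes:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class InputFormatExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("InputFormatExample");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // hadoopFile takes the old-API format class plus the key and
        // value classes; keys/values arrive as StringBuffer (point 3).
        JavaPairRDD<StringBuffer, StringBuffer> records = sc.hadoopFile(
                "hdfs:///data/records",   // made-up input directory
                WholeFileInputFormat.class,
                StringBuffer.class,
                StringBuffer.class);

        // Convert to plain String right away so downstream stages never
        // touch the reader's mutable buffers.
        long n = records
                .mapToPair(t -> new Tuple2<>(t._1().toString(),
                                             t._2().toString()))
                .count();
        System.out.println("records: " + n);

        sc.stop();
    }
}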

Since I am working with problems where a directory of multiple files is
processed, and some of the files are many gigabytes in size with complex
multiline records, an input format is a requirement.
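
For files that size the whole-file reader in the sketch above would not
do - the reader has to stream the file and locate record boundaries
itself. Roughly this, assuming a made-up "</record>" line terminates each
record (the surrounding InputFormat class would look like the one above):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;

public class MultiLineRecordReader
        implements RecordReader<StringBuffer, StringBuffer> {
    private final BufferedReader reader;
    private final long length;
    private long pos = 0;
    private int recordNumber = 0;

    // As in point 2 above: all setup happens in the constructor.
    public MultiLineRecordReader(FileSplit split, JobConf job)
            throws IOException {
        FileSystem fs = split.getPath().getFileSystem(job);
        reader = new BufferedReader(
                new InputStreamReader(fs.open(split.getPath()), "UTF-8"));
        length = split.getLength();
    }

    public StringBuffer createKey() { return new StringBuffer(); }
    public StringBuffer createValue() { return new StringBuffer(); }

    // Accumulate lines until the end-of-record marker; one record per call.
    public boolean next(StringBuffer key, StringBuffer value)
            throws IOException {
        value.setLength(0);
        boolean sawAnything = false;
        String line;
        while ((line = reader.readLine()) != null) {
            sawAnything = true;
            pos += line.length() + 1; // approximate position only
            if (line.trim().equals("</record>")) { // made-up terminator
                break;
            }
            value.append(line).append('\n');
        }
        if (!sawAnything) return false;
        key.setLength(0);
        key.append("record-").append(recordNumber++);
        return true;
    }

    public long getPos() { return pos; }

    public float getProgress() {
        return length == 0 ? 1.0f : Math.min(1.0f, pos / (float) length);
    }

    public void close() throws IOException { reader.close(); }
}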
