When I experimented with reusing an InputFormat I had used in Hadoop for a long time, I found that:

1) it must extend org.apache.hadoop.mapred.FileInputFormat (the deprecated class, not org.apache.hadoop.mapreduce.lib.input.FileInputFormat);

2) initialize needs to be called in the constructor;

3) the key and value types must not be Hadoop Writables, since those are not serializable. Mine was declared extends FileInputFormat<Text, Text>, which fails, but extends FileInputFormat<StringBuffer, StringBuffer> does work; I don't think that would even be allowed in Hadoop.
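The third point can be checked with plain Java, no Hadoop on the classpath: Hadoop's Writable interface does not extend java.io.Serializable, whereas java.lang.StringBuffer does, which would explain why StringBuffer keys survive default Java serialization where Text keys fail. A minimal sketch of that check (FakeWritable below is a hypothetical stand-in for a Writable such as Text; it is not a real Hadoop class):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class SerializableCheck {

    // Hypothetical stand-in for a Hadoop Writable such as Text:
    // like Writable, it does not implement java.io.Serializable.
    static class FakeWritable {
        String value = "record";
    }

    // Returns true if Java's default serialization accepts the object.
    static boolean isJavaSerializable(Object o) {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isJavaSerializable(new StringBuffer("key"))); // true
        System.out.println(isJavaSerializable(new FakeWritable()));      // false
    }
}
```

So the restriction is really Java serialization, not anything specific to InputFormats; any key/value class that implements java.io.Serializable should be fine.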
Are these statements correct? If so, it seems that most Hadoop InputFormats, certainly the custom ones I create, require serious modification to work. Does anyone have samples of using a Hadoop InputFormat? I am working with problems where a directory of multiple files is processed, and some files are many gigabytes in size with complex multiline records, so a custom InputFormat is a requirement.