Sent from my iPad
On 2014-9-24, at 8:13 AM, Steve Lewis <lordjoe2...@gmail.com> wrote:

> When I experimented in Spark with an InputFormat I had used in Hadoop for a long time, I found:
> 1) it must extend org.apache.hadoop.mapred.FileInputFormat (the deprecated class), not org.apache.hadoop.mapreduce.lib.input.FileInputFormat
> 2) initialize needs to be called in the constructor
> 3) the key/value types must not be Hadoop Writables, since those are not serializable - mine extended FileInputFormat<Text, Text>, which fails, while FileInputFormat<StringBuffer, StringBuffer> does work; I don't think that is allowed in Hadoop
>
> Are these statements correct? If so, it seems most Hadoop InputFormats - certainly the custom ones I create - require serious modifications to work. Does anyone have samples of the use of a Hadoop InputFormat?
>
> Since I am working with problems where a directory of multiple files is processed, and some files are many gigabytes in size with multiline complex records, an input format is a requirement.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
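
[For reference, a minimal sketch of the pattern asked about above. It uses Hadoop's built-in old-API KeyValueTextInputFormat rather than a custom format, and a placeholder HDFS path; the key point is converting the non-serializable Text Writables to Strings immediately. This is a sketch under those assumptions, not code from the thread.]

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.KeyValueTextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HadoopInputFormatExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hadoop-input-format"))

    // SparkContext.hadoopFile takes an old-API (org.apache.hadoop.mapred)
    // InputFormat; use newAPIHadoopFile for org.apache.hadoop.mapreduce formats.
    val rdd = sc.hadoopFile[Text, Text, KeyValueTextInputFormat]("hdfs:///path/to/dir")

    // Hadoop Writables such as Text are not java.io.Serializable, and Hadoop
    // reuses the same object for each record, so convert to plain Strings
    // right away, before any shuffle, cache, or collect.
    val strings = rdd.map { case (k, v) => (k.toString, v.toString) }

    strings.take(5).foreach(println)
    sc.stop()
  }
}
```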