I wrote an input format for Redshift tables unloaded via the UNLOAD command with the ESCAPE option: https://github.com/mengxr/redshift-input-format , which can recognize multi-line records.
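For illustration, here is a minimal Python sketch of the matching escape step described below (the helper name and the `|` delimiter are assumptions for the example, not part of the linked input format):

```python
def redshift_escape(record, delimiter="|"):
    """Backslash-escape a record the way Redshift's UNLOAD ... ESCAPE does:
    prefix backslash, carriage return, newline, and the delimiter with '\\'.
    Newlines stay as literal newline characters, just preceded by a
    backslash, so records may span multiple physical lines."""
    # Escape the backslash first so later escapes are not doubled up.
    out = record.replace("\\", "\\\\")
    out = out.replace("\r", "\\\r")
    out = out.replace("\n", "\\\n")
    out = out.replace(delimiter, "\\" + delimiter)
    return out
```

With an RDD of strings this could then be applied as `rdd.map(redshift_escape).saveAsTextFile(path)` before loading the files back through the input format.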
Redshift puts a backslash before any in-record `\\`, `\r`, `\n`, and the delimiter character. You can apply the same escaping before calling saveAsTextFile, then use the input format to load the records back.

Xiangrui

On Fri, Sep 12, 2014 at 7:43 PM, Mohit Jaggi <mohitja...@gmail.com> wrote:
> Folks,
> I think this might be due to the default TextInputFormat in Hadoop. Any
> pointers to solutions much appreciated.
>>>
> More powerfully, you can define your own InputFormat implementations to
> format the input to your programs however you want. For example, the default
> TextInputFormat reads lines of text files. The key it emits for each record
> is the byte offset of the line read (as a LongWritable), and the value is
> the contents of the line up to the terminating '\n' character (as a Text
> object). If you have multi-line records each separated by a '$' character, you
> could write your own InputFormat that parses files into records split on
> this character instead.
>>>
>
> Thanks,
> Mohit

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
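As a footnote to the custom-InputFormat suggestion quoted above: the record-reader behavior it describes can be sketched in plain Python (a toy model only, not the Hadoop API; the '$' separator is taken from the quoted example, and the integer keys mimic TextInputFormat's byte-offset LongWritable keys):

```python
def split_records(text, delimiter="$"):
    """Toy model of a record reader that splits on an arbitrary
    single-character delimiter instead of '\n', emitting
    (byte offset, record) pairs like TextInputFormat does."""
    records = []
    offset = 0
    for chunk in text.split(delimiter):
        records.append((offset, chunk))
        # Advance past the record and the delimiter that followed it.
        offset += len(chunk) + len(delimiter)
    return records
```

A real implementation would subclass Hadoop's FileInputFormat and handle records that straddle split boundaries, which this sketch ignores.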