I used "--escaped-by \\" (the extra \ is for bash) so that "\" would be the escape character. That works fine: I end up with \n and \t characters escaped by '\'.
To put the problem more concretely, I have a single record from the db with a field containing the following value:
"foo
bar baz
biz"

Sqoop will spit out:
"foo\
bar baz\
biz"

Now if I run a map reduce job on this with the TextInputFormat, the record will be terminated after "foo", not after "biz". I did a little digging: TextInputFormat uses LineRecordReader, which uses LineReader, which, looking at the source, clearly does not honor the escape char.

Is there a tool/input format/etc. that will read from HDFS and honor this? It does not seem that M/R can do it out of the box, and I can't find a way to get Pig to do it either. I assume there must be something that will honor the escape, but I cannot find anything.

On Fri, Oct 21, 2011 at 5:26 AM, Alexander C.H. Lorenz wrote:
> Hi Mark,
>
> --escaped-by \/ (backslash - slash) tells bash to escape the next character.
> (if I understood you right)
>
> - Alex
>
> On Fri, Oct 21, 2011 at 12:12 AM, Mark Roddy wrote:
>>
>> I'm moving free form data out of a RDBMS that has a lot of \n, \r\n,
>> and \t characters.
>>
>> I used "--escaped-by \\" (extra \ cause of bash), but I'm a little
>> confused about what to do with this data now. I can't seem to find
>> any tools that will honor the '\' escape char. TextInputFormat does
>> not seem to.
>>
>> I'm working on replacing an existing in house tool w/sqoop that
>> replace newlines with the literal string '\n'. I'd be happy to do as
>> such but I don't see any way of doing so.
>>
>> I'm sure I'm not the first person to run into this so I appreciate any
>> suggestions.
>>
>> -Mark
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
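
To make the behavior I'm after concrete, here is a rough sketch of an escape-aware record splitter. It is not part of Hadoop or Sqoop; the function name read_escaped_records and its parameters are made up for illustration. It assumes the output looks like the example above: a record ends at a newline unless that newline is preceded by the escape character, and the escape character is dropped from the result.

```python
def read_escaped_records(chars, escape="\\", terminator="\n"):
    """Yield records from an iterable of characters.

    A terminator only ends a record when it is NOT preceded by the
    escape character. An escaped character is kept literally and the
    escape character itself is dropped.
    """
    record = []
    escaped = False
    for ch in chars:
        if escaped:
            # Previous char was the escape: keep this one verbatim,
            # even if it is a newline.
            record.append(ch)
            escaped = False
        elif ch == escape:
            escaped = True
        elif ch == terminator:
            # Unescaped terminator: the record is complete.
            yield "".join(record)
            record = []
        else:
            record.append(ch)
    if escaped:
        # Dangling escape at end of input: keep it rather than lose data.
        record.append(escape)
    if record:
        yield "".join(record)


if __name__ == "__main__":
    # Same shape as the Sqoop output above, plus a second plain record.
    data = "foo\\\nbar baz\\\nbiz\nsecond\n"
    for rec in read_escaped_records(data):
        print(repr(rec))
```

Note that this only deals with record boundaries. Because it also unescapes everything else, splitting on the field delimiter would have to apply the same escape rule rather than run on the unescaped record. In a real job the same loop would have to live inside a custom RecordReader; this is just the logic.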
