One workaround worth trying is to use the "--hive-drop-import-delims" option and do a Hive import. With this option set, Sqoop removes any newline (\n, \r) or ^A (\01) characters from string fields; these are the default delimiters used by Hive. After the import is done, you can copy the file out of Hive directly and use it in your application.
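To make the effect concrete, here is a rough Python sketch of what the option does to a single string field, going by the documented behavior (it drops \n, \r, and \01). The function and constant names are made up for illustration; this is not Sqoop's own code.

```python
# Characters that collide with Hive's default row/field delimiters.
# Assumption: this mirrors the documented behavior of
# --hive-drop-import-delims; it is an illustration, not Sqoop source.
HIVE_DEFAULT_DELIMS = ("\n", "\r", "\x01")


def drop_hive_delims(value):
    """Return value with every Hive default delimiter character removed."""
    for delim in HIVE_DEFAULT_DELIMS:
        value = value.replace(delim, "")
    return value


if __name__ == "__main__":
    # The multi-line field from the thread collapses onto one line.
    print(repr(drop_hive_delims("foo\nbar baz\nbiz")))
```

Note that the characters are dropped rather than replaced, so text on either side of a removed newline ends up joined with no separator.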
Arvind

On Fri, Oct 21, 2011 at 7:05 AM, Mark Roddy <[email protected]> wrote:
> I used "--escaped-by \\" due to bash, so that "\" would be the escape
> character used. That works fine, I end up with \n and \t characters
> escaped by '\'.
>
> To put the problem more concretely, I have a single record from the db
> with a field containing the following value:
> "foo
> bar baz
> biz"
>
> Sqoop will spit out:
> "foo\
> bar baz\
> biz"
>
> Now if I run a map reduce job on this with the TextInputFormat, the
> record will be terminated after "foo", not after "biz". I did a little
> digging and TextInputFormat uses LineRecordReader, which uses
> LineReader, which, looking at the source, clearly does not honor the
> escape char. Is there a tool/input format/etc. that will read from
> HDFS and honor this? It does not seem that M/R can do it out of the
> box. I can't find a way to get Pig to do it either. I assume there must
> be something that will honor the escape, but I cannot find anything.
>
> On Fri, Oct 21, 2011 at 5:26 AM, Alexander C.H. Lorenz
> <[email protected]> wrote:
>> Hi Mark,
>> --escaped-by \/ (backslash - slash) tells bash to escape the next character.
>> (if I understood you right)
>> - Alex
>> On Fri, Oct 21, 2011 at 12:12 AM, Mark Roddy <[email protected]> wrote:
>>>
>>> I'm moving free-form data out of an RDBMS that has a lot of \n, \r\n,
>>> and \t characters.
>>>
>>> I used "--escaped-by \\" (extra \ because of bash), but I'm a little
>>> confused about what to do with this data now. I can't seem to find
>>> any tools that will honor the '\' escape char. TextInputFormat does
>>> not seem to.
>>>
>>> I'm working on replacing an existing in-house tool w/sqoop that
>>> replaces newlines with the literal string '\n'. I'd be happy to do as
>>> such but I don't see any way of doing so.
>>>
>>> I'm sure I'm not the first person to run into this so I appreciate any
>>> suggestions.
>>>
>>> -Mark
>>
>> --
>> Alexander Lorenz
>> http://mapredit.blogspot.com
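
For the backslash-escaped output described in the quoted message, the record-splitting logic that LineReader lacks could be sketched in Python roughly as follows. This is only an illustration of the idea, assuming "\" is the escape character and an unescaped newline ends a record; the function name is hypothetical and this is not a Hadoop InputFormat.

```python
def split_escaped_records(text, escape="\\", terminator="\n"):
    """Split text into records, ending a record only at an unescaped terminator.

    An escape character makes the following character literal, so an
    escaped newline stays inside the current record. Escapes are removed
    from the output, so split fields before calling this if field
    delimiters are escaped too.
    """
    records = []
    current = []
    chars = iter(text)
    for ch in chars:
        if ch == escape:
            # Take the next character literally, whatever it is.
            nxt = next(chars, None)
            if nxt is not None:
                current.append(nxt)
        elif ch == terminator:
            records.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        # Final record without a trailing terminator.
        records.append("".join(current))
    return records


if __name__ == "__main__":
    sample = "foo\\\nbar baz\\\nbiz\nsecond record\n"
    for record in split_escaped_records(sample):
        print(repr(record))
```

With the sample above, the first record keeps its embedded newlines and ends after "biz" rather than after "foo".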
