I used "--escaped-by \\" due to bash, so that "\" would be the escape
character used.  That works fine, I end up with \n and \t characters
escaped by '\'.


To put the problem more concretely, I have a single record from the db
with a field containing the following value:
"foo
bar baz
biz"

Sqoop will spit out:
"foo\
bar baz\
biz"


Now if I run a map reduce job on this with the TextInputFormat, the
record will be terminated after "foo" not after "biz".  I did a little
digging and TextInputFormat uses LineRecordReader, which uses
LineReader, which, looking at the source, clearly does not honor the
escape char.  Is there a tool/input format/etc. that will read from
HDFS and honor this?  It does not seem that M/R can do it out of the
box.  I can't find a way to get Pig to do it either.  I assume there
must be something that will honor the escape, but I cannot find anything.
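
To illustrate the behavior I'm after, here is a rough Python sketch (not
Hadoop code, just the logic) of a reader that treats a newline preceded by
an unescaped '\' as part of the record.  The function names are made up for
the example, and it assumes the escape char itself shows up doubled ("\\")
in the output, so a line ending in an odd number of backslashes means the
newline was escaped.  It only rejoins lines; it does not unescape anything
else (tabs, doubled backslashes).

```python
import io
from typing import Iterable, Iterator


def ends_with_escape(line: str, escape: str = "\\") -> bool:
    """Return True if the line ends in an odd number of escape characters."""
    trailing = len(line) - len(line.rstrip(escape))
    return trailing % 2 == 1


def logical_records(lines: Iterable[str], escape: str = "\\") -> Iterator[str]:
    """Join physical lines whose terminating newline was escaped."""
    parts = []
    for raw in lines:
        # Drop the single physical line terminator, if present.
        line = raw[:-1] if raw.endswith("\n") else raw
        if ends_with_escape(line, escape):
            # Escaped newline: remove the escape char, keep the newline,
            # and continue accumulating the same record.
            parts.append(line[:-1])
            parts.append("\n")
        else:
            parts.append(line)
            yield "".join(parts)
            parts = []
    if parts:
        # Input ended in the middle of a record; emit what was collected.
        yield "".join(parts)


# The record from above, followed by a second, single-line record.
sample = io.StringIO('"foo\\\nbar baz\\\nbiz"\n"second"\n')
records = list(logical_records(sample))
```

With this, the three physical lines come back as one record, which is what
I'd want a RecordReader to hand to the mapper.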



On Fri, Oct 21, 2011 at 5:26 AM, Alexander C.H. Lorenz
<[email protected]> wrote:
> Hi Mark,
> --escaped-by \/ (backslash - slash) tells bash to escape the next character.
> (if I understood you right)
> - Alex
> On Fri, Oct 21, 2011 at 12:12 AM, Mark Roddy <[email protected]> wrote:
>>
>> I'm moving free form data out of a RDBMS that has a lot of \n, \r\n,
>> and \t characters.
>>
>> I used "--escaped-by \\" (extra \ cause of bash), but I'm a little
>> confused about what to do with this data now.  I can't seem to find
>> any tools that will honor the '\' escape char.  TextInputFormat does
>> not seem to.
>>
>> I'm working on replacing an existing in house tool w/sqoop that
>> replaces newlines with the literal string '\n'.  I'd be happy to do as
>> such but I don't see any way of doing so.
>>
>> I'm sure I'm not the first person to run into this so I appreciate any
>> suggestions.
>>
>> -Mark
>
>
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
>
