Glad it worked, Mark!

> And it looks like you don't have to do a hive import to use it.

That sounds like a bug to me :)

Arvind
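
For readers of the archive: the message quoted below describes the option as removing newlines and ^A characters from the imported data. A minimal sketch of that effect on a single string field, assuming the delimiters dropped are \n, \r and \x01 (^A). The helper name drop_hive_delims is hypothetical and this is not Sqoop's actual code.

```python
# Hypothetical helper, not part of Sqoop: mimics the described effect of
# --hive-drop-import-delims on one string field.
HIVE_DELIMS = ("\n", "\r", "\x01")  # newline, carriage return, ^A (assumed set)


def drop_hive_delims(field: str) -> str:
    """Return the field with every Hive default delimiter removed."""
    for ch in HIVE_DELIMS:
        field = field.replace(ch, "")
    return field


# The sample value from the thread: three lines inside one db field.
value = "foo\nbar baz\nbiz"
print(repr(drop_hive_delims(value)))
```

Note that the characters are dropped, not replaced, so words on either side of a removed newline run together.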

On Fri, Oct 21, 2011 at 9:41 AM, Mark Roddy <[email protected]> wrote:

> Thanks for the help Arvind.  The hive-drop-import-delims worked.  And
> it looks like you don't have to do a hive import to use it.
>
> -Mark
>
>
> On Fri, Oct 21, 2011 at 11:43 AM, Arvind Prabhakar <[email protected]>
> wrote:
> > One workaround worth trying is to use the "--hive-drop-import-delims"
> > option and do a hive import. With this option set, Sqoop will remove
> > any newlines or ^A characters, which are the default delimiters used
> > by Hive. After the import is done, you could copy the file out of
> > Hive directly and use it in your application.
> >
> > Arvind
> >
> > On Fri, Oct 21, 2011 at 7:05 AM, Mark Roddy <[email protected]> wrote:
> >> I used "--escaped-by \\" due to bash, so that "\" would be the escape
> >> character used.  That works fine, I end up with \n and \t characters
> >> escaped by '\'.
> >>
> >>
> >> To put the problem more concretely, I have a single record from the db
> >> with a field containing the following value:
> >> "foo
> >> bar baz
> >> biz"
> >>
> >> Sqoop will spit out:
> >> "foo\
> >> bar baz\
> >> biz"
> >>
> >>
> >> Now if I run a map reduce job on this with the TextInputFormat, the
> >> record will be terminated after "foo", not after "biz".  I did a little
> >> digging and TextInputFormat uses LineRecordReader, which uses
> >> LineReader, which, looking at the source, clearly does not honor the
> >> escape char.  Is there a tool/input format/etc. that will read from
> >> HDFS and honor this?  It does not seem that M/R can do it out of the
> >> box.  I can't find a way to get Pig to do it either.  I assume there
> >> must be something that will honor the escape, but I cannot find anything.
> >>
> >>
> >>
> >> On Fri, Oct 21, 2011 at 5:26 AM, Alexander C.H. Lorenz
> >> <[email protected]> wrote:
> >>> Hi Mark,
> >>> --escaped-by \/ (backslash - slash) tells bash to escape the next
> >>> character.
> >>> (if I understood you right)
> >>> - Alex
> >>> On Fri, Oct 21, 2011 at 12:12 AM, Mark Roddy <[email protected]>
> >>> wrote:
> >>>>
> >>>> I'm moving free-form data out of an RDBMS that has a lot of \n, \r\n,
> >>>> and \t characters.
> >>>>
> >>>> I used "--escaped-by \\" (extra \ cause of bash), but I'm a little
> >>>> confused about what to do with this data now.  I can't seem to find
> >>>> any tools that will honor the '\' escape char.  TextInputFormat does
> >>>> not seem to.
> >>>>
> >>>> I'm working on replacing an existing in-house tool with Sqoop.  That
> >>>> tool replaces newlines with the literal string '\n'.  I'd be happy to
> >>>> do the same, but I don't see any way of doing so.
> >>>>
> >>>> I'm sure I'm not the first person to run into this so I appreciate any
> >>>> suggestions.
> >>>>
> >>>> -Mark
> >>>
> >>>
> >>>
> >>> --
> >>> Alexander Lorenz
> >>> http://mapredit.blogspot.com
> >>>
> >>>
> >>
> >
>
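
For readers of the archive: the thread above notes that TextInputFormat ends a record at every newline, even one preceded by the escape character. A minimal sketch of escape-aware record splitting, assuming '\' is the escape character and '\n' the record terminator. The function name split_records is hypothetical. It works on an in-memory string and ignores input splits, which a real Hadoop input format would also have to handle.

```python
def split_records(text: str, escape: str = "\\", terminator: str = "\n"):
    """Yield records, treating an escaped terminator as part of the record."""
    record = []
    escaped = False
    for ch in text:
        if escaped:
            # Previous char was the escape: keep this one literally.
            record.append(ch)
            escaped = False
        elif ch == escape:
            # Drop the escape char itself and remember it for the next char.
            escaped = True
        elif ch == terminator:
            # Unescaped terminator: the current record is complete.
            yield "".join(record)
            record = []
        else:
            record.append(ch)
    if escaped:
        record.append(escape)  # lone trailing escape, kept as-is
    if record:
        yield "".join(record)


# The Sqoop output from the thread, followed by a second, ordinary record.
data = '"foo\\\nbar baz\\\nbiz"\nsecond\n'
for rec in split_records(data):
    print(repr(rec))
```

With this approach the first record ends after "biz", not after "foo", and the escape characters are removed while unescaping.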
