+1

On Fri, Oct 21, 2011 at 11:03 AM, Mark Roddy <[email protected]> wrote:
> Seeing as I'm now depending on this behavior, I nominate that that bug
> be upgraded to feature :-)
>
> -Mark
>
> On Fri, Oct 21, 2011 at 1:38 PM, [email protected]
> <[email protected]> wrote:
>> Glad it worked Mark!
>>> And it looks like you don't have to do a hive import to use it.
>> That sounds like a bug to me :)
>> Arvind
>>
>> On Fri, Oct 21, 2011 at 9:41 AM, Mark Roddy <[email protected]> wrote:
>>>
>>> Thanks for the help Arvind.  The hive-drop-import-delims worked.  And
>>> it looks like you don't have to do a hive import to use it.
>>>
>>> -Mark
>>>
>>>
>>> On Fri, Oct 21, 2011 at 11:43 AM, Arvind Prabhakar <[email protected]>
>>> wrote:
>>> > One workaround worth trying is to use the "--hive-drop-import-delims"
>>> > option and do a Hive import. With this option set, Sqoop will remove
>>> > any newline (\n, \r) or ^A (\01) characters from string fields, since
>>> > those are Hive's default delimiters. After the import is done, you
>>> > could copy the file out of Hive directly and use it in your application.
>>> >
>>> > Arvind
>>> >
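A minimal Python sketch of what this option does to each string field. It assumes the behaviour described in the Sqoop user guide (drop \n, \r and \01) and is an illustration, not Sqoop's own code. `drop_hive_delims` and `to_record` are hypothetical names, and joining fields with ^A (Hive's default field delimiter) is an assumption made for the example.

```python
import re

# Assumption: emulates the documented effect of --hive-drop-import-delims
# (remove \n, \r and \x01 from string fields); not taken from Sqoop's source.
HIVE_DELIMS = re.compile("[\n\r\x01]")


def drop_hive_delims(value: str) -> str:
    """Return the field value with newlines, carriage returns and ^A removed."""
    return HIVE_DELIMS.sub("", value)


def to_record(fields: list, sep: str = "\x01") -> str:
    """Hypothetical helper: join cleaned fields into exactly one physical line."""
    return sep.join(drop_hive_delims(f) for f in fields) + "\n"


if __name__ == "__main__":
    # Mark's example value from later in the thread.
    print(repr(to_record(["1", "foo\nbar baz\nbiz"])))
```

Because the characters are dropped rather than replaced, adjacent words run together ("foobar bazbiz"). Every record does end up on a single line, though, which is why a plain TextInputFormat job can then read the output.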
>>> > On Fri, Oct 21, 2011 at 7:05 AM, Mark Roddy <[email protected]> wrote:
>>> >> I used "--escaped-by \\" because of bash, so that "\" would be the
>>> >> escape character used. That works fine: I end up with \n and \t
>>> >> characters escaped by '\'.
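To see what Sqoop actually receives for different spellings of the escape argument, Python's `shlex.split` (which follows POSIX shell quoting rules in its default mode) can stand in for bash. This is a quick sketch that approximates the shell rather than invoking bash itself.

```python
import shlex

# Three ways to pass a single backslash through a POSIX shell.
SPELLINGS = [r"--escaped-by \\", r"--escaped-by '\'", r'--escaped-by "\\"']

for cmdline in SPELLINGS:
    # shlex.split uses POSIX rules by default: an unquoted \\ becomes one
    # backslash, single quotes keep \ literally, double quotes reduce \\ to \.
    print(f"{cmdline:24} -> {shlex.split(cmdline)}")
```

All three spellings produce the same argument, a single backslash, so any of them gives Sqoop "\" as the escape character.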
>>> >>
>>> >>
>>> >> To put the problem more concretely, I have a single record from the DB
>>> >> with a field containing the following value:
>>> >> "foo
>>> >> bar baz
>>> >> biz"
>>> >>
>>> >> Sqoop will spit out:
>>> >> "foo\
>>> >> bar baz\
>>> >> biz"
>>> >>
>>> >>
>>> >> Now if I run a MapReduce job on this with TextInputFormat, the
>>> >> record will be terminated after "foo", not after "biz". I did a little
>>> >> digging: TextInputFormat uses LineRecordReader, which uses
>>> >> LineReader, and looking at the source, LineReader clearly does not
>>> >> honor the escape char. Is there a tool/input format/etc. that will
>>> >> read from HDFS and honor this? It does not seem that M/R can do it out
>>> >> of the box, and I can't find a way to get Pig to do it either. I assume
>>> >> there must be something that will honor the escape, but I cannot find
>>> >> anything.
>>> >>
>>> >>
>>> >>
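The logic an escape-aware reader would need can be sketched in Python. It splits records only on newlines that are not preceded by an unescaped `\`, then splits fields on unescaped commas (Sqoop's default field delimiter). This is an in-memory illustration with hypothetical function names that assumes no enclosing characters are used. It is not a Hadoop InputFormat, and a real RecordReader would also have to cope with input splits that begin just after an escaped newline.

```python
from typing import Iterator, List


def split_records(text: str, escape: str = "\\", terminator: str = "\n") -> Iterator[str]:
    """Yield records, splitting only on terminators that are not escaped.

    Escape sequences are kept in the output so split_fields can still tell an
    escaped delimiter from a real one.
    """
    buf: List[str] = []
    chars = iter(text)
    for ch in chars:
        if ch == escape:
            buf.append(ch + next(chars, ""))  # escape plus the escaped char
        elif ch == terminator:
            yield "".join(buf)
            buf = []
        else:
            buf.append(ch)
    if buf:  # last record without a trailing terminator
        yield "".join(buf)


def split_fields(record: str, escape: str = "\\", delimiter: str = ",") -> List[str]:
    """Split one record on unescaped delimiters and remove the escape characters."""
    fields: List[str] = []
    buf: List[str] = []
    chars = iter(record)
    for ch in chars:
        if ch == escape:
            buf.append(next(chars, ""))  # keep the escaped character literally
        elif ch == delimiter:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    fields.append("".join(buf))
    return fields


if __name__ == "__main__":
    # Sqoop-style output for Mark's example row: id 1, value "foo\nbar baz\nbiz".
    raw = "1,foo\\\nbar baz\\\nbiz\n2,plain\n"
    print(len(raw.splitlines()))  # naive line splitting sees 4 "records"
    for record in split_records(raw):
        print(split_fields(record))
```

Escapes are kept during record splitting and removed only during field splitting. This ordering matters: unescaping too early would make an escaped comma indistinguishable from a field delimiter.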
>>> >> On Fri, Oct 21, 2011 at 5:26 AM, Alexander C.H. Lorenz
>>> >> <[email protected]> wrote:
>>> >>> Hi Mark,
>>> >>> --escaped-by \/ (backslash - slash) tells bash to escape the next
>>> >>> character.
>>> >>> (if I understood you right)
>>> >>> - Alex
>>> >>> On Fri, Oct 21, 2011 at 12:12 AM, Mark Roddy <[email protected]>
>>> >>> wrote:
>>> >>>>
>>> >>> >>>> I'm moving free-form data containing a lot of \n, \r\n, and \t
>>> >>> >>>> characters out of an RDBMS.
>>> >>>>
>>> >>> >>>> I used "--escaped-by \\" (the extra \ is because of bash), but I'm a
>>> >>> >>>> little confused about what to do with this data now. I can't seem to
>>> >>> >>>> find any tools that will honor the '\' escape char. TextInputFormat
>>> >>> >>>> does not seem to.
>>> >>>>
>>> >>> >>>> I'm working on replacing an existing in-house tool with Sqoop; that
>>> >>> >>>> tool replaces newlines with the literal string '\n'. I'd be happy to
>>> >>> >>>> do the same, but I don't see any way of doing so.
>>> >>>>
>>> >>> >>>> I'm sure I'm not the first person to run into this, so I'd appreciate
>>> >>> >>>> any suggestions.
>>> >>>>
>>> >>>> -Mark
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Alexander Lorenz
>>> >>> http://mapredit.blogspot.com
>>> >>>
>>> >>>
>>> >>
>>> >
>>
>>
>
