+1
On Fri, Oct 21, 2011 at 11:03 AM, Mark Roddy <[email protected]> wrote:
> Seeing as I'm now depending on this behavior, I nominate that that bug
> be upgraded to feature :-)
>
> -Mark
>
> On Fri, Oct 21, 2011 at 1:38 PM, [email protected]
> <[email protected]> wrote:
>> Glad it worked Mark!
>>> And it looks like you don't have to do a hive import to use it.
>> That sounds like a bug to me :)
>> Arvind
>>
>> On Fri, Oct 21, 2011 at 9:41 AM, Mark Roddy <[email protected]> wrote:
>>>
>>> Thanks for the help Arvind. The hive-drop-import-delims worked. And
>>> it looks like you don't have to do a hive import to use it.
>>>
>>> -Mark
>>>
>>>
>>> On Fri, Oct 21, 2011 at 11:43 AM, Arvind Prabhakar <[email protected]>
>>> wrote:
>>> > One workaround worth trying is to use the "--hive-drop-import-delims"
>>> > option and do a hive import. With this option set, Sqoop will remove
>>> > any new lines or ^A characters which are the default delimiters used
>>> > for Hive. After the import is done, you could copy the file out of
>>> > Hive directly and use it in your application.
>>> >
>>> > Arvind
>>> >
>>> > On Fri, Oct 21, 2011 at 7:05 AM, Mark Roddy <[email protected]> wrote:
>>> >> I used "--escaped-by \\" due to bash, so that "\" would be the escape
>>> >> character used. That works fine, I end up with \n and \t characters
>>> >> escaped by '\'.
>>> >>
>>> >>
>>> >> To put the problem more concretely, I have a single record from the db
>>> >> with a field containing the following value:
>>> >> "foo
>>> >> bar baz
>>> >> biz"
>>> >>
>>> >> Sqoop will spit out:
>>> >> "foo\
>>> >> bar baz\
>>> >> biz"
>>> >>
>>> >>
>>> >> Now if I run a map reduce job on this with the TextInputFormat, the
>>> >> record will be terminated after "foo" not after "biz". I did a little
>>> >> digging and TextInputFormat uses LineRecordReader, which uses
>>> >> LineReader which, looking at the source, clearly does not honor the
>>> >> escape char. Is there a tool/input format/etc that will read from
>>> >> HDFS and honor this? It does not seem that M/R can do it out of the
>>> >> box. I can't find a way to get Pig to do it either. I assume there
>>> >> must be something that will honor the escape, but cannot find anything.
>>> >>
>>> >>
>>> >>
>>> >> On Fri, Oct 21, 2011 at 5:26 AM, Alexander C.H. Lorenz
>>> >> <[email protected]> wrote:
>>> >>> Hi Mark,
>>> >>> --escaped-by \/ (backslash - slash) tells bash to escape the next
>>> >>> character.
>>> >>> (if I understood you right)
>>> >>> - Alex
>>> >>> On Fri, Oct 21, 2011 at 12:12 AM, Mark Roddy <[email protected]>
>>> >>> wrote:
>>> >>>>
>>> >>>> I'm moving free form data out of a RDBMS that has a lot of \n, \r\n,
>>> >>>> and \t characters.
>>> >>>>
>>> >>>> I used "--escaped-by \\" (extra \ cause of bash), but I'm a little
>>> >>>> confused about what to do with this data now. I can't seem to find
>>> >>>> any tools that will honor the '\' escape char. TextInputFormat does
>>> >>>> not seem to.
>>> >>>>
>>> >>>> I'm working on replacing an existing in house tool w/sqoop that
>>> >>>> replaces newlines with the literal string '\n'. I'd be happy to do as
>>> >>>> such but I don't see any way of doing so.
>>> >>>>
>>> >>>> I'm sure I'm not the first person to run into this so I appreciate
>>> >>>> any suggestions.
>>> >>>>
>>> >>>> -Mark
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Alexander Lorenz
>>> >>> http://mapredit.blogspot.com
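
For anyone hitting the same wall, here is a minimal sketch (Python, not something posted in this thread) of the two behaviors discussed above: splitting text into records while honoring a backslash escape, which is what Mark found LineReader does not do, and stripping newlines and ^A from a field, as Arvind describes for "--hive-drop-import-delims". The function names `split_escaped_records` and `drop_hive_delims` are hypothetical, and the code only approximates the described behavior; it is not Sqoop's or Hadoop's actual implementation.

```python
def split_escaped_records(text, escape="\\", terminator="\n"):
    """Split text into records, treating an escaped terminator as data.

    Assumption: any character preceded by `escape` is kept literally and
    the escape character itself is dropped.
    """
    records = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            # Escaped character: keep it verbatim, never treat it as a terminator.
            current.append(text[i + 1])
            i += 2
            continue
        if ch == terminator:
            # Unescaped terminator ends the current record.
            records.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        # Final record without a trailing terminator.
        records.append("".join(current))
    return records


def drop_hive_delims(value):
    """Remove newlines and ^A (\\x01) from a field, per the description above."""
    return value.replace("\n", "").replace("\x01", "")


# Mark's example: one record whose field spans three lines, escaped with '\'.
sample = "foo\\\nbar baz\\\nbiz\nsecond\n"
naive = sample.split("\n")               # breaks the record after "foo\"
aware = split_escaped_records(sample)    # keeps "foo\nbar baz\nbiz" together
```

A plain line split cuts the first record into three pieces, which matches what Mark saw with TextInputFormat; the escape-aware splitter returns it as one record with the embedded newlines restored. Dropping the delimiters at import time avoids the problem entirely, at the cost of losing the original line breaks.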
