Glad it worked Mark!

> And it looks like you don't have to do a hive import to use it.

That sounds like a bug to me :)

Arvind

On Fri, Oct 21, 2011 at 9:41 AM, Mark Roddy <[email protected]> wrote:
> Thanks for the help Arvind. The hive-drop-import-delims worked. And
> it looks like you don't have to do a hive import to use it.
>
> -Mark
>
>
> On Fri, Oct 21, 2011 at 11:43 AM, Arvind Prabhakar <[email protected]> wrote:
> > One workaround worth trying is to use the "--hive-drop-import-delims"
> > option and do a hive import. With this option set, Sqoop will remove
> > any newlines or ^A characters, which are the default delimiters used
> > for Hive. After the import is done, you could copy the file out of
> > Hive directly and use it in your application.
> >
> > Arvind
> >
> > On Fri, Oct 21, 2011 at 7:05 AM, Mark Roddy <[email protected]> wrote:
> >> I used "--escaped-by \\" due to bash, so that "\" would be the escape
> >> character used. That works fine, I end up with \n and \t characters
> >> escaped by '\'.
> >>
> >>
> >> To put the problem more concretely, I have a single record from the db
> >> with a field containing the following value:
> >> "foo
> >> bar baz
> >> biz"
> >>
> >> Sqoop will spit out:
> >> "foo\
> >> bar baz\
> >> biz"
> >>
> >>
> >> Now if I run a map reduce job on this with the TextInputFormat, the
> >> record will be terminated after "foo", not after "biz". I did a little
> >> digging and TextInputFormat uses LineRecordReader, which uses
> >> LineReader, which, looking at the source, clearly does not honor the
> >> escape char. Is there a tool/input format/etc. that will read from
> >> HDFS and honor this? It does not seem that M/R can do it out of the
> >> box. I can't find a way to get Pig to do it either. I assume there
> >> must be something that will honor the escape, but cannot find anything.
> >>
> >>
> >>
> >> On Fri, Oct 21, 2011 at 5:26 AM, Alexander C.H. Lorenz
> >> <[email protected]> wrote:
> >>> Hi Mark,
> >>> --escaped-by \/ (backslash - slash) tells bash to escape the next character.
> >>> (if I understood you right)
> >>> - Alex
> >>> On Fri, Oct 21, 2011 at 12:12 AM, Mark Roddy <[email protected]> wrote:
> >>>>
> >>>> I'm moving free form data out of a RDBMS that has a lot of \n, \r\n,
> >>>> and \t characters.
> >>>>
> >>>> I used "--escaped-by \\" (extra \ because of bash), but I'm a little
> >>>> confused about what to do with this data now. I can't seem to find
> >>>> any tools that will honor the '\' escape char. TextInputFormat does
> >>>> not seem to.
> >>>>
> >>>> I'm working on replacing an existing in-house tool w/ Sqoop that
> >>>> replaces newlines with the literal string '\n'. I'd be happy to do
> >>>> the same, but I don't see any way of doing so.
> >>>>
> >>>> I'm sure I'm not the first person to run into this so I appreciate any
> >>>> suggestions.
> >>>>
> >>>> -Mark
> >>>
> >>> --
> >>> Alexander Lorenz
> >>> http://mapredit.blogspot.com
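
For anyone who would rather keep the "--escaped-by" output than drop the delimiters, here is a minimal sketch of the post-processing Mark was asking about: rejoining physical lines into logical records when the newline is preceded by the escape character, then rewriting each escaped newline as the literal string '\n'. It is plain Python, not part of Sqoop or Hadoop, and the names logical_records and flatten_newlines are made up for illustration. It assumes a single-character escape and that a literal escape character in the data is itself doubled, so an odd run of trailing backslashes means the newline is escaped.

```python
import io


def logical_records(lines, escape="\\"):
    """Yield records, treating a newline that follows an unescaped
    escape character as data rather than as the record terminator."""
    buffered = []
    for line in lines:
        body = line.rstrip("\n")
        # Length of the run of escape characters at the end of the line.
        trailing = len(body) - len(body.rstrip(escape))
        buffered.append(body)
        if line.endswith("\n") and trailing % 2 == 1:
            # Odd run: the last escape character escapes the newline itself,
            # so the record continues on the next physical line.
            continue
        yield "\n".join(buffered)
        buffered = []
    if buffered:
        # Input ended in the middle of an escaped record; emit what we have.
        yield "\n".join(buffered)


def flatten_newlines(record, escape="\\"):
    """Replace each escaped newline with the two-character string '\\n'."""
    return record.replace(escape + "\n", "\\n")


# Hypothetical sample shaped like the "foo / bar baz / biz" record above.
sample = io.StringIO('1,"foo\\\nbar baz\\\nbiz"\n2,"plain"\n')
for rec in logical_records(sample):
    print(flatten_newlines(rec))
```

The same odd/even check is what a custom record reader would need to do in place of LineReader's plain newline split; this version only shows the logic on a local stream and does nothing about records that straddle an HDFS split boundary.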
