Hey there,

Could you please export a few of these lines to a file and run 'hexdump' on that file, if possible? It would be interesting to see exactly what those characters are.
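If 'hexdump' is not available on the machine, a few lines of Python give a similar view. This is only a minimal sketch: the `hexdump` helper is written here for illustration, and the sample bytes are hypothetical, assuming the invisible character is byte 0x01 (which is what needs to be confirmed against the real exported data).

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return a hexdump-style view: offset, hex bytes, printable ASCII."""
    pad = width * 3 - 1  # width of the hex column, including spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  |{text_part}|")
    return "\n".join(lines)


# Hypothetical sample; replace with bytes read from the exported file,
# e.g. open("export.txt", "rb").read() (the file name is an assumption).
sample = b"MEIKI\x01 COMPANY,LTD"
print(hexdump(sample))
```

Any byte that shows up as `01`, `0a`, or `0d` in the hex column is one of the characters the Sqoop option is documented to replace.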
-Abe

On Mon, Sep 22, 2014 at 11:27 AM, Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco) <[email protected]> wrote:
> Hi All,
>
> We are using the string '<EOL>' (--hive-delims-replacement '<EOL>') to
> replace newline characters in Oracle fields while importing data into
> Hive using Sqoop.
>
> According to the Sqoop documentation
> (http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_large_objects),
> this parameter should replace only \n, \r, or \01 (^A) characters
> with '<EOL>'.
>
> But we are seeing that some special characters are also being replaced
> with '<EOL>'.
>
> Our scenario:
>
> Oracle Field       | Hive Field              | Notepad++                | Word
> MEIKI COMPANY,LTD  | MEIKI<EOL> COMPANY,LTD  | [image: Screen capture]  | MEIKI__COMPANY,LTD
> AVENTIS@PHARMA     | AVENTIS<EOL>@PHARMA     | [image: Screen capture]  | AVENTIS_@PHARMA
>
> Some character in the samples above, which is NOT visible in Oracle,
> shows up as 'SOH' in Notepad++ and as '_' in Word, and Sqoop converts
> it to <EOL>.
>
> Please help us understand this behavior.
>
> What do these characters mean to Sqoop/Hive?
>
> Is Sqoop expected to replace characters that are not \n, \r,
> or \01 (^A)?
>
> Vikash Talanki
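For reference, the documented replacement behavior quoted above can be imitated in a few lines. This is not Sqoop's own code, just a Python stand-in under stated assumptions: the `replace_hive_delims` name is made up for illustration, and the sample values assume the hidden character is byte 0x01. Note that SOH ("Start of Heading") is the ASCII name for byte 0x01, the same character written as \01 or ^A.

```python
import re

# The three characters the Sqoop user guide lists for
# --hive-delims-replacement: \n (0x0A), \r (0x0D) and \01 (0x01, ^A / SOH).
HIVE_DELIMS = re.compile(r"[\n\r\x01]")


def replace_hive_delims(value: str, replacement: str) -> str:
    """Replace each documented delimiter character with `replacement`."""
    # A function is used as the replacement so that backslashes in
    # `replacement` are taken literally rather than as regex escapes.
    return HIVE_DELIMS.sub(lambda _match: replacement, value)


# Hypothetical inputs: assumes the invisible character is 0x01.
print(replace_hive_delims("MEIKI\x01 COMPANY,LTD", "<EOL>"))
print(replace_hive_delims("AVENTIS\x01@PHARMA", "<EOL>"))
```

If the hexdump of the real data does show 01 at those positions, the values in the Hive Field column would be consistent with the documented behavior; if it shows some other byte, that would point to something else going on.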
