I meant to say that the older system is running WITHOUT the extra mapping arguments.
On Wed, Mar 11, 2015 at 10:20 AM, Glen Hein <[email protected]> wrote: > I've narrowed down the problem to CLOB fields. I did figure out that if I > add the option "--map-column-java BIO=String > " (BIO is the name of the CLOB field), then newlines are filtered as > expected. > > We do have an older system running Sqoop 1.3.0 that seems to filter the > newlines for the CLOB fields with adding the --map-column-java. Do you know > if 1.4.3 differs from 1.3.0 on how CLOB are handled? > > -Glen > > On Wed, Mar 11, 2015 at 10:03 AM, Abraham Elmahrek <[email protected]> > wrote: > >> Hey man, >> >> Maybe regexp_replace + free-form query? >> >> >> http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports >> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm >> >> -Abe >> >> On Tue, Mar 10, 2015 at 9:34 AM, Glen Hein <[email protected]> wrote: >> >>> >>> I have an oracle db were some of the records contain \n and \r >>> characters. I >>> m trying to use --hive-delims-replacement to convert them to spaces. >>> I've had mixed results. It seems that when the \n and \r appear at the >>> beginning of the fields data, then they are converted as expected. But when >>> they are embedded in the middle of the field data, then they are not being >>> replaced. Here's a hexdump showing the unconverted \n and \r characters: >>> >>> 000000c0 6e 75 6c 6c 01 48 65 6c 6c 6f 2c 0d 0a 20 20 20 >>> |null.Hello,.. | >>> 000000d0 20 20 20 4d 79 20 6e 61 6d 65 20 69 73 20 4d 61 | My name >>> is Ma| >>> 000000e0 72 79 20 48 75 6e 74 2c 20 49 20 61 6d 20 31 39 |ry Hunt, I >>> am 19| >>> 000000f0 20 79 65 61 72 73 20 6f 6c 64 20 61 6e 64 20 6c | years old >>> and l| >>> 00000100 69 76 65 20 69 6e 20 43 61 6c 69 66 6f 72 6e 69 |ive in >>> Californi| >>> 00000110 61 2e 20 49 20 67 72 61 64 75 61 74 65 64 20 68 |a. I >>> graduated h| >>> 00000120 69 67 68 20 73 63 68 6f 6f 6c 20 6f 6e 20 32 30 |igh school >>> on 20| >>> 00000130 31 31 20 61 6e 64 20 64 65 63 69 64 65 64 20 74 |11 and >>> decided t| >>> 00000140 6f 20 74 61 6b 65 20 61 20 79 65 61 72 20 6f 66 |o take a >>> year of| >>> 00000150 66 20 74 6f 20 73 65 65 20 69 66 20 77 68 61 74 |f to see if >>> what| >>> 00000160 20 49 20 63 68 6f 73 65 20 61 73 20 61 20 66 75 | I chose as >>> a fu| >>> 00000170 74 75 72 65 20 63 61 72 65 65 72 20 69 73 20 77 |ture career >>> is w| >>> 00000180 68 61 74 20 49 20 72 65 61 6c 79 20 77 61 6e 74 |hat I realy >>> want| >>> 00000190 65 64 20 74 6f 20 64 6f 2e 20 4d 79 20 79 65 61 |ed to do. >>> My yea| >>> 000001a0 72 20 69 73 20 75 70 20 61 6e 64 20 49 20 73 74 |r is up and >>> I st| >>> 000001b0 69 6c 6c 20 77 61 6e 74 20 74 6f 20 62 65 20 61 |ill want to >>> be a| >>> 000001c0 20 74 65 61 63 68 65 72 2e 20 49 20 6c 6f 6f 6b | teacher. I >>> look| >>> 000001d0 20 66 6f 77 61 72 64 20 74 6f 20 6c 65 61 72 6e | foward to >>> learn| >>> 000001e0 69 6e 67 20 61 6c 6c 20 74 68 61 74 20 49 20 63 |ing all >>> that I c| >>> 000001f0 61 6e 20 61 6e 64 20 62 65 69 6e 67 20 61 6e 20 |an and >>> being an | >>> 00000200 65 6c 65 6d 65 6e 74 72 79 20 73 63 68 6f 6f 6c |elementry >>> school| >>> 00000210 20 74 65 61 63 68 65 72 2e 01 6e 75 6c 6c 01 31 | >>> teacher..null.1| >>> >>> You can see in the first row of data there is a "01" that starts the >>> field, and then a few characters later the 0d and 0a. I'm passing the >>> following option to sqoop: >>> >>> <arg>--hive-delims-replacement</arg> >>> <arg>ABCDEFG</arg> >>> >>> I've tried several different values for the actual replacement string, >>> but the results are always the same. >>> >>> Is there a better way to replace the unwanted characters? >>> >>> Thanks, >>> Glen >>> >>> >> >
