Hey Abe, thanks for the response. I figured out what the issue is and will
post it here in case someone else bumps into this or has a better solution.

The data I was reading was badly formatted (invalid) UTF-8, but
PostgreSQL 9.3 wasn't handling this correctly and was returning stripped or
altered characters, depending on your pg client's encoding. In PostgreSQL 9.4
they fixed the issue, and now when the client requests non-UTF-8 data but the
client encoding is set to UTF8, it throws an error. To handle this, I am
casting the bad UTF-8 to a byte array and then converting it to base64 during
the sqoop import:
encode(name::bytea, 'base64')
Since Hive has native support for base64, I can use unbase64() and get my
original non-UTF-8 string back. It would be nice to skip this encoding yoga
and dump those non-UTF-8 strings directly into Hive.
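
For anyone who wants to see why the round trip is lossless, here is a rough
Python sketch (not part of the sqoop job itself). It uses the byte string
quoted in this message. The helper name is_valid_utf8 is made up for
illustration, and reading the stray 0xAE byte as Latin-1 is only my assumption
about where the data originally came from.

```python
import base64

# Bytes copied from the hex dump quoted in this message.
raw = bytes.fromhex("695465787420ae2c2061204a4156412d504446206c696272617279")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The lone 0xAE byte is a UTF-8 continuation byte with no lead byte,
# so the value as a whole is not valid UTF-8.
print(is_valid_utf8(raw))

# Base64 output is plain ASCII, so it survives a UTF8 client encoding.
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)

# Decoding on the other side gives back exactly the original bytes.
restored = base64.b64decode(encoded)
print(restored == raw)

# Assumption: the source was Latin-1, where 0xAE is the registered sign.
print(restored.decode("latin-1"))
```

The base64 text only ever contains ASCII characters, which is why it gets
through the UTF8 client encoding untouched, and decoding it returns the
original bytes unchanged.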

I am on pgsql 9.4
sqoop 1.4.5.2.2.0.0-2041
byte representation of bad UTF8:
\x695465787420ae2c2061204a4156412d504446206c696272617279
(iText ®, a JAVA-PDF library)
command: sqoop import --verbose --username xxxxx --password xxxxx --connect
"jdbc:postgresql://xxxxx:5432/db_name" --split-by id --target-dir
/tmp/oozietemp --num-mappers 20 --direct --hive-table=metadata
--hive-import --hive-overwrite --query "select name, id from table where
\$CONDITIONS"

some stack trace:
15/07/21 17:43:44 INFO mapreduce.Job: Task Id :
attempt_1433455729125_6011_m_000011_2, Status : FAILED
Error: java.io.IOException: SQLException in nextKeyValue
        at
org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
        at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
        at
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.postgresql.util.PSQLException: ERROR: invalid regular
expression: invalid escape \ sequence
        at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270)
        at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998)
        at
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:570)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:420)
        at
org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:305)
        at
org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
        at
org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
        ... 12 more

On Tue, Jul 21, 2015 at 4:49 PM, Abraham Elmahrek <[email protected]> wrote:

> Hey man,
>
> Could you send over the following:
>
>    - Command that you're running
>    - Version of Sqoop you're using
>    - Run the command with --verbose and send the full output?
>
> -Abe
>
> On Tue, Jul 21, 2015 at 12:43 PM, Nikolay Christov <[email protected]>
> wrote:
>
>> co
