Thanks for getting back to the mailing list and figuring this out! -Abe
On Tue, Jul 21, 2015 at 7:32 PM, Nikolay Christov <[email protected]> wrote: > Hey Abe, thanks for the response I figured what the issue is and will post > here just in case someone else bumps into this or has better solution. > > The data I was reading was badly formatted UTF-8 (non valid UTF8) but > postgresql 9.3 wasn't handling this correctly and was returning stripped or > altered characters depend on your pg client's encoding. in postgresql 9.4 > they fixed the issue and now when the client requests non UTF8 but client > encoding is set to UTF8 then it throws an error. To handle this I am > converting the bad UTF8 into byte array and then convert to base64 during > the sqoop: > encode(name::bytea, 'base64') > since hive has native support for base64, I can use unbase64() and get my > original non UTF8 string. It will be nice to skip this encoding yoga and > directly dump those non UTF8 strings into hive. > > I am on pgsql 9.4 > sqoop 1.4.5.2.2.0.0-2041 > byte representation of bad UTF8: > \x695465787420ae2c2061204a4156412d504446206c696272617279 > (iText ®, a JAVA-PDF library) > command: sqoop import --verbose --username xxxxx --password xxxxx > --connect "jdbc:postgresql://xxxxx:5432/db_name" --split-by id --target-dir > /tmp/oozietemp --num-mappers 20 --direct --hive-table=metadata > --hive-import --hive-overwrite --query "select name, id from table where > \$CONDITIONS" > > some stack trace: > 15/07/21 17:43:44 INFO mapreduce.Job: Task Id : > attempt_1433455729125_6011_m_000011_2, Status : FAILED > Error: java.io.IOException: SQLException in nextKeyValue > at > org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553) > at > org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) > at > org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: org.postgresql.util.PSQLException: ERROR: invalid regular > expression: invalid escape \ sequence > at > org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270) > at > org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998) > at > org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:570) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:420) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:305) > at > org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111) > at > org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235) > ... 12 more > > On Tue, Jul 21, 2015 at 4:49 PM, Abraham Elmahrek <[email protected]> > wrote: > >> Hey man, >> >> Could you send over the following: >> >> - Command that you're running >> - Version of Sqoop you're using >> - Run the command with --verbose and send the full output? >> >> -Abe >> >> On Tue, Jul 21, 2015 at 12:43 PM, Nikolay Christov <[email protected]> >> wrote: >> >>> co >> >> >> >> >
