Hi, We have found several bugs while doing sqoop export from s3 to sql server , not sure if there were any patches released for these.
Sqoop command used to export the data : sqoop export \ --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \ --connect “<connect string>" \ --username “<username>" \ --password “<password>" \ --update-key audience_member_id \ --table “schema_name.tablename" \ --columns audience_member_id,addr_1,addr_2,addr_3,email,email_alt,first_name,last_name,middle_name,phone_alt,phone_day,phone_fax \ --export-dir “<s3 path>" \ --escaped-by '\\' --enclosed-by '\"' --fields-terminated-by ‘,' Here are the datatypes of the both target table and also source table ( hive ) : *APPLICATION_LOGICAL_NAME* *SQLSRV_DATABASE_NAME* *SCHEMA_NAME* *TABLE_NAME* APR BRK abc audience *COLUMN_NAME* *COLUMN_POSITION* *DATA_TYPE_HIVE* *DATA_TYPE_TARGET* *DATA_MAX_SIZE* *NOTES* ADDR_1 1 string nvarchar 75 ADDR_2 2 string nvarchar 75 ADDR_3 3 string nvarchar 75 EMAIL 4 string nvarchar 255 EMAIL_ALT 5 string nvarchar 255 FIRST_NAME 6 string nvarchar 75 LAST_NAME 7 string nvarchar 75 MIDDLE_NAME 8 string nvarchar 75 PHONE_ALT 9 string nvarchar 20 PHONE_DAY 10 string nvarchar 20 PHONE_FAX 11 string nvarchar 20 We ran Sql trace to find out what exactly was going on with sqoop export , here are the analysis we did : · *High level explanation of the captured SQL statements generated by the sqoop export* o The SQL statement uses bind variables, see details in http://msdn.microsoft.com/en-us/expression/ff848812(v=sql.90).aspx <https://connect.autodesk.com/owa/redir.aspx?C=J0oWQ6CjKkGswapMV0tR9kZEtNp949EI7iZxp3uGI6xOuSN_W7L3GZV00ant-378DMMUR9s6U90.&URL=http%3a%2f%2fmsdn.microsoft.com%2fen-us%2fexpression%2fff848812(v%3dsql.90).aspx> o The values to be assigned to the bind variables are not assigned by reference but by position o The position is decided based on the data dictionary in the current environment by column ordinal position o The column ordinal position does/may not match with the one in the Hive tables o The generated statement is incomplete and does not follow the “best” usage described in the Microsoft documentation · *Detailed explanation of the captured SQL statements generated by the sqoop export* o Below the generated statement, beautified for human readability and on the side the corrected version/comments *original* *fix* *comments* declare @p1 int set @p1=1 exec sp_prepexec @p1 output, N' @P0 nvarchar(4000) ,@P1 nvarchar(4000) ,@P2 nvarchar(4000) ,@P3 nvarchar(4000) ,@P4 nvarchar(4000) ,@P5 nvarchar(4000) ,@P6 nvarchar(4000) ,@P7 nvarchar(4000) ,@P8 nvarchar(4000) ,@P9 nvarchar(4000) ,@P10 nvarchar(4000) ,@P11 int', N' UPDATE schema_name.table_name SET addr_1=@P0 ,addr_2=@P1 ,addr_3=@P2 ,email=@P3 ,email_alt=@P4 ,first_name=@P5 ,last_name=@P6 ,middle_name=@P7 ,phone_alt=@P8 ,phone_day=@P9 ,phone_fax=@P10 WHERE audience_member_id=@P11', NULL ,N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH' ,N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH' ,N'n' ,N'n' ,N'[email protected]' ,N'KFCMHFECDFPHLPDHPIEEFBMOBMBDNDOM' ,N'n' ,N'[email protected]' ,NULL ,N'NEBFMKNHCDLIMHGLFNIAFELDGGFJFIKB' ,1314617397 select @p1 as Handle exec sp_unprepare 1 declare @p1 int; --set @p1=1; exec sp_prepexec @p1 output, N' @P0 nvarchar(4000) ,@P1 nvarchar(4000) ,@P2 nvarchar(4000) ,@P3 nvarchar(4000) ,@P4 nvarchar(4000) ,@P5 nvarchar(4000) ,@P6 nvarchar(4000) ,@P7 nvarchar(4000) ,@P8 nvarchar(4000) ,@P9 nvarchar(4000) ,@P10 nvarchar(4000) ,@P11 int', N' UPDATE schema_name.table_name SET addr_1=@P0 ,addr_2=@P1 ,addr_3=@P2 ,email=@P3 ,email_alt=@P4 ,first_name=@P5 ,last_name=@P6 ,middle_name=@P7 ,phone_alt=@P8 ,phone_day=@P9 ,phone_fax=@P10 WHERE audience_member_id=@P11', @P1=NULL ,@P6=N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH' ,@P7=N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH' ,@P8=N'n' ,@P9=N'n' ,@P3=N'[email protected]' ,@P0=N'KFCMHFECDFPHLPDHPIEEFBMOBMBDNDOM' ,@P10=N'n' ,@P4=N'[email protected]' ,@P5=NULL ,@P2=N'NEBFMKNHCDLIMHGLFNIAFELDGGFJFIKB' ,@P11=1314617397; select @p1 as Handle; exec sp_unprepare @p1; if set @p1=1is used, we get this error: Could not find prepared statement with handle 1. Added ; to separate the statements (coding best practice when generating dynamic sql) Assigned the correct bind variable to its value This will allow us to be dynamic and specific. Thesp_unprepare1 makes no sense since the @p1 is the correct handle value So what you can see here is the columns are scrambled when trying to export from sqoop to sql server. Can somebody let me know if this is a known issue or am i missing something to debug this. Thanks, Sai
