Does anyone have any insight into my previous post? Greg
On Wed, Dec 17, 2014 at 11:25 AM, Krumm, Gregory <[email protected]> wrote: > Hello, > > I am working on utilizing Sqoop to export data from HDFS to SQL Server > 2012 but cannot seem to get very good performance. > > Below are some details about what I’m trying to do: > > - Export 100 million, 3-column records per day (totaling about 4GB) > - Records are stored in CSV format in part files of about 20MB > - Utilize our current cluster running Sqoop 1.4.2 and Hadoop 0.20 > - Eventually utilize our new cluster running Sqoop 1.4.5 and Hadoop > 2.4 (Currently doesn’t have YARN, so we have had to run > HADOOP_MAPREDUCE_MODE as Classic) > > While running Sqoop 1.4.2 using the sqljdbc4.jar, Sqoop only transfers > about 1 million records per hour (50MB). > Running Sqoop 1.4.5 using sqljdbc41.jar, Sqoop only transfers about 2 > million records per hour (100MB). > > Below is the Sqoop command I’m executing for this job. Based on many > executions, the records.per.statement, and statements.per.transaction, > never seem to change performance. > > *sqoop export -D sqoop.export.records.per.statement=100 -D > sqoop.export.statements.per.transaction=100 --connect > 'jdbc:sqlserver://[SERVER_NAME];username=[USERNAME];password=[PASSWORD];database=[DB > NAME]' —table=[TABLE_NAME] --export-dir /[DATA DIR] > --input-fields-terminated-by ',' --input-lines-terminated-by '\n' —verbose* > > Adding the batch or direct flags have only ever made performance worse > so at the moment, they are not included. > > > > When running the above statement, we have recorded network captures of > the Sqoop traffic to SQL server, as well as recorded SQL Server activity > via the Profiler. Both of these captures show only single records being > inserted at a time into SQL Server. > > Our SQL Server activity ends up looking like this: > > RPC:Completed exec sp_execute > 116097,N'ffdbb68a84686edd029bde9766192b15',N'north_america',37 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdbdc2bb770a13a3b4eb86836341ab3',N'south_america',1 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdbfefc63e77861d708f3f5d82086fa',N'eastern_asia',1 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdc59a3a42aa2ef0fb60fd3f604a96c',N'southeast_asia',1 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdcbd667479a412923d03c6f305eb11',N'eastern_asia',12 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdd1cc279a54921c97cef463c6773fb',N'western_europe',37 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdd437610d1dc1cb8aa77f04bf21581',N'south_america',10 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdd8204335231b2138dab577bc8d906',N'northern_europe',1 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffde22f7f6a9d2913336f44d9db269e4',N'north_america',1 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffde2e3ae2a90ad5796e1d6f5fb3e1f2',N'eastern_asia',13 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffde56a01846c01225145ed043b2f43b',N'north_america',2 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffde90f6f14a00533327cc39562f086f',N'south_america',1 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdea65ab1b1091150924b5d89d441c2',N'eastern_asia',1 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdeef3353f992fc74d5cfc54533aa49',N'north_america',1 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdefae31deecb4ff80e76bea129e4dd',N'western_europe',1 Microsoft > JDBC Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdf11270af0e9ee7841355e966d99b7',N'south_asia',1 Microsoft JDBC > Driver for SQL Server RPC:Completed exec sp_execute > 116097,N'ffdf71938015bb64e371aaedb9700ab1',N'eastern_europe',1 Microsoft > JDBC Driver for SQL Server > > > Now, if I export the same data to MySQL using Sqoop, it exports data at > closer to 15 million records an hour (roughly 7 times faster). > > Is this a known limitation to Sqoop or the Microsoft JDBC driver, or is > there something I can change on the Sqoop or SQL server side to speed up > this transfer? > > Thanks, > > Greg >
