Hi,

I am using Sqoop 1.4.5 to import data from MySQL into Hive.


I have a MySQL DB cluster with 200 GB of data spread across 200 databases;
each database has at least 600 tables (a mixture of big and small/empty tables).


When I import big tables, the performance is quite good.


But when I import small tables (I mean empty tables with 0 records), each
one takes at least 20 seconds.
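For scale, here is a rough lower bound on that fixed overhead, using only the numbers above and assuming the tables are imported one at a time (an assumption; the real schedule may run imports in parallel):

```python
# Lower bound on the per-table overhead, assuming sequential imports.
DATABASES = 200
MIN_TABLES_PER_DB = 600
MIN_SECONDS_PER_TABLE = 20  # observed even for empty tables

total_tables = DATABASES * MIN_TABLES_PER_DB
total_seconds = total_tables * MIN_SECONDS_PER_TABLE

print(f"tables: {total_tables}")
print(f"overhead: {total_seconds} s = {total_seconds / 3600:.1f} h "
      f"= {total_seconds / 86400:.1f} days")
# prints:
# tables: 120000
# overhead: 2400000 s = 666.7 h = 27.8 days
```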


1. How can I reduce this time for small tables?


My Sqoop import command looks like this:


sqoop "import",
    "--connect", uri,
    "--query", sqlText,
    "--map-column-java", "oprtype=Integer",
    "--target-dir", targetDir,
    "--hive-import",
    "--hive-table", hiveTable,
    "--username", userName,
    "--password", password,
    "--split-by", primaryKey,
    "--num-mappers", "2",
    "--boundary-query", boundaryQry,
    "--hive-overwrite",
    "--class-name", tableName,
    "--outdir", "tmp_sqoop/" + tableName


where sqlText (the "--query") is:

    select tableName.*, 0 as oprtype, 0 as modified_time from tableName where $CONDITIONS

primaryKey ("--split-by") is the table's primary key, and boundaryQry
("--boundary-query") is:

    select min(primarykey), max(primarykey) from tableName


This runs fine for big tables, even ones with billions of rows.


But for small tables, every Sqoop import takes roughly the same fixed amount of
time, no matter how few rows there are.


How do I optimize this for small tables or tables with 0 records? I want
to reduce the latency for small tables.
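One variant that might be worth trying, sketched in Python purely for illustration: build the Sqoop argument list per table and, for tables below some row-count threshold, run a single mapper without "--split-by" or "--boundary-query". The Sqoop user guide allows a free-form "--query" to be imported serially with one map task (the $CONDITIONS token is still required). The helper name build_sqoop_args, the SMALL_TABLE_ROWS threshold, and the idea that a row count is fetched beforehand are all hypothetical. This only drops the boundary query and the second mapper; it does not remove the per-job MapReduce startup cost, so it is a guess, not a measured fix.

```python
# Hypothetical threshold; the row count must come from a cheap query
# (e.g. SELECT COUNT(*)) run before calling this helper.
SMALL_TABLE_ROWS = 100_000


def build_sqoop_args(uri, user_name, password, table_name, primary_key,
                     hive_table, target_dir, row_count):
    """Hypothetical helper: Sqoop 'import' arguments for one table."""
    query = (f"select {table_name}.*, 0 as oprtype, 0 as modified_time "
             f"from {table_name} where $CONDITIONS")
    args = [
        "import",
        "--connect", uri,
        "--query", query,
        "--map-column-java", "oprtype=Integer",
        "--target-dir", target_dir,
        "--hive-import",
        "--hive-table", hive_table,
        "--hive-overwrite",
        "--username", user_name,
        "--password", password,
        "--class-name", table_name,
        "--outdir", "tmp_sqoop/" + table_name,
    ]
    if row_count < SMALL_TABLE_ROWS:
        # Serial import with one map task: no split column or boundary query.
        args += ["--num-mappers", "1"]
    else:
        # Parallel import, as in the original command.
        args += [
            "--split-by", primary_key,
            "--num-mappers", "2",
            "--boundary-query",
            f"select min({primary_key}), max({primary_key}) from {table_name}",
        ]
    return args
```

For example, build_sqoop_args("jdbc:mysql://host/db", "user", "secret", "orders", "id", "db.orders", "/tmp/orders", 0) yields a one-mapper argument list with no "--split-by".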




Any suggestions in this area would be appreciated.




Cheers!





