Hi,
I am using Sqoop 1.4.5 to import from MySQL into Hive.
I have a MySQL DB cluster holding about 200 GB of data across 200 databases,
and each database has at least 600 tables (a mixture of big and small/empty tables).
When I import big tables, the performance is quite good.
But when I import small tables (for example, empty tables with 0 records),
each table takes at least 20 seconds.
1. How can I reduce this time for small tables?
My Sqoop import invocation looks like this:
sqoop "import",
"--connect", uri,
"--query", sqlText,
"--map-column-java",
"oprtype=Integer",
"--target-dir", targetDir,
"--hive-import",
"--hive-table", hiveTable,
"--username", userName,
"--password", password,
"--split-by", primaryKey,
"--num-mappers", "2",
"--boundary-query", boundaryQry,
"--hive-overwrite",
"--class-name", tableName,
"--outdir", "tmp_sqoop/"+tableName
where "--query" is "select tableName.*, 0 as oprtype, 0 as modified_time from
tableName where $CONDITIONS",
"--split-by" is the primary key, and
"--boundary-query" is "select min(primarykey), max(primarykey) from tableName".
This runs fine for big tables, even ones with billions of rows.
But for small tables, I am seeing a constant amount of time per Sqoop import.
How do I optimize things for small tables or tables with 0 records? I want
to reduce the latency for small tables.
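To make the setup concrete, here is a minimal Python sketch of how the argument list described above is assembled. The helper name `build_sqoop_args` and its parameter names are illustrative only, and the SQL strings assume the query is meant to add two constant columns, `oprtype` and `modified_time`, to every row.

```python
def build_sqoop_args(uri, table_name, hive_table, target_dir,
                     user_name, password, primary_key, num_mappers=2):
    """Hypothetical helper: builds the Sqoop import arguments listed above."""
    # Free-form query; Sqoop substitutes $CONDITIONS with each mapper's range.
    # Assumption: the two extra columns are constant zeros.
    sql_text = (
        "select {t}.*, 0 as oprtype, 0 as modified_time "
        "from {t} where $CONDITIONS"
    ).format(t=table_name)
    # Range of the split column, used to divide work between mappers.
    boundary_qry = "select min({pk}), max({pk}) from {t}".format(
        pk=primary_key, t=table_name)
    return [
        "import",
        "--connect", uri,
        "--query", sql_text,
        "--map-column-java", "oprtype=Integer",
        "--target-dir", target_dir,
        "--hive-import",
        "--hive-table", hive_table,
        "--username", user_name,
        "--password", password,
        "--split-by", primary_key,
        "--num-mappers", str(num_mappers),
        "--boundary-query", boundary_qry,
        "--hive-overwrite",
        "--class-name", table_name,
        "--outdir", "tmp_sqoop/" + table_name,
    ]
```

The same list is built for every table, big or empty, so each table goes through a full Sqoop import run with 2 mappers.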
Any suggestions in this area would be appreciated.
Cheers!