Thanks for your valuable responses, Abe and Martin. This helped me, and I gave it a try.
Along with this, I need one more suggestion. One database with 600 tables takes 3 hours to Hive-import using the command below. I have 200 databases like this, and if I run all 200 sequentially, one by one, it takes 25 days to complete one DB cluster. That is not the result I am expecting: I want to import the data (all 200 databases) in 2 days at most. How can I make this happen, i.e., finish one DB cluster with 200 or more databases as quickly as possible? I have been working on this for a long time, so suggestions are valuable.

Cheers!!!!

---- On Thu, 12 Mar 2015 23:29:00 +0530 Juan Martin Pampliega <[email protected]> wrote ----

If you are using InnoDB you can use something like:

SELECT table_rows
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'db_name' AND table_name LIKE 'name_of_table';

On Thu, Mar 12, 2015 at 2:53 PM, Abraham Elmahrek <[email protected]> wrote:

Hey Syed,

Sqoop has to boot an MR job in order to do the data transfer. This takes some time. As such, would the following work?

#!/bin/bash
[[ $( mysql test -e "SELECT COUNT(*) FROM test" | tail -1 ) -gt 0 ]] && sqoop import ...

The COUNT statement should be lightning fast if you're using MyISAM as your storage engine.

-Abe

On Thu, Mar 12, 2015 at 5:57 AM, Syed Akram <[email protected]> wrote:

Hi,

I am using Sqoop 1.4.5 and I'm doing imports from MySQL to Hive. I have a MySQL DB cluster with 200 GB of data, containing 200 databases, each with at least 600 tables (a mixture of big and small/empty tables).

When I import big tables, the performance is quite good. But a Sqoop import of a small table (even an empty table with 0 records) takes at least 20 seconds per table.

1. How can I reduce this time for small tables?
My sqoop import invocation looks like this:

sqoop import \
  --connect $uri \
  --query "$sqlText" \
  --map-column-java oprtype=Integer \
  --target-dir $targetDir \
  --hive-import \
  --hive-table $hiveTable \
  --username $userName \
  --password $password \
  --split-by $primaryKey \
  --num-mappers 2 \
  --boundary-query "$boundaryQry" \
  --hive-overwrite \
  --class-name $tableName \
  --outdir tmp_sqoop/$tableName

where:

--query is "select tableName.*, oprtype as 0, modified_time as 0 where $CONDITIONS"
--split-by is the primary key
--boundary-query is "select min(primarykey), max(primarykey) from table"

This runs fine for big tables, even ones with billions of rows. But for small tables I am noticing a constant time overhead on every Sqoop import. How do I optimize things for small tables, or tables with 0 records? I want to reduce the latency for small tables.

Please give me suggestions in this area. Cheers!!!!
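On the 200-databases-in-2-days question above: since each database's import is an independent MR job, one common approach is to run several imports concurrently instead of sequentially, bounded by what the cluster can absorb. A minimal sketch of such a harness follows; `import_db` is a hypothetical stand-in for the real per-database sqoop command (shown in a comment), and the database names are illustrative:

```shell
#!/bin/bash
# Sketch: import many databases with bounded concurrency instead of one by one.
# import_db is a stub standing in for the real sqoop invocation, e.g. something like:
#   sqoop import-all-tables --connect "jdbc:mysql://dbhost/$1" \
#     --username "$SQOOP_USER" --password-file /user/syed/.pw \
#     --hive-import --num-mappers 2
import_db() {
  echo "$1" >> imported.log    # stub side effect so the harness is self-contained
}

MAX_JOBS=4                     # tune to the MR cluster's spare capacity
: > imported.log
for db in db001 db002 db003 db004 db005; do
  # Block while the maximum number of background jobs is already running.
  while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
    sleep 1
  done
  import_db "$db" &            # each database imports in its own background job
done
wait                           # block until every import has finished
```

With 3 hours per database and, say, 8 concurrent jobs, the back-of-envelope time drops from 25 days toward 3 days; whether the cluster and the MySQL source can sustain that concurrency is something you would need to verify.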
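On the small-table latency: the ~20 seconds is mostly MR job startup, so the cheapest win is to not launch a job at all for empty tables, along the lines Abe suggested. A sketch of wiring that check into a per-table loop; `row_count` is a hypothetical stub standing in for the real count query (cheap on MyISAM, or via the INFORMATION_SCHEMA query Martin gave for InnoDB), and the table names are illustrative:

```shell
#!/bin/bash
# Sketch: check the row count first and skip the Sqoop/MR startup cost
# entirely for empty tables.
row_count() {
  # Real version would be something like:
  #   mysql -N db_name -e "SELECT COUNT(*) FROM ${1}"
  case "$1" in
    empty_table) echo 0 ;;       # stub values so the loop is self-contained
    *)           echo 1000 ;;
  esac
}

: > actions.log
for table in big_table empty_table; do
  if [ "$(row_count "$table")" -gt 0 ]; then
    echo "import $table" >> actions.log   # real: sqoop import --query ... as above
  else
    echo "skip $table" >> actions.log     # no MR job launched at all
  fi
done
```

For the non-empty small tables that still must be imported, the per-job startup floor remains; batching them (for example with import-all-tables per database rather than one job per table) is the usual way to amortize it.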
