Thanks for your valuable responses, Abe and Martin. This helped me, and I gave it a try.
Along with this, I need one more suggestion. One database with 600 tables takes 3 hours to Hive-import using the command below. I have 200 databases like this, and if I run all 200 sequentially, one by one, it takes 25 days to complete one DB cluster. That is not the result I am expecting: I want to import the data (all 200 databases) in 2 days at most. How can I make this happen, i.e., finish one DB cluster with 200 or more databases as quickly as possible? I have been working on this for a long time, so suggestions are valuable.

Cheers!!!!

---- On Thu, 12 Mar 2015 23:29:00 +0530 Juan Martin Pampliega <[email protected]> wrote ----

If you are using InnoDB you can use something like:

SELECT table_rows
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'db_name' AND table_name LIKE 'name_of_table';

On Thu, Mar 12, 2015 at 2:53 PM, Abraham Elmahrek <[email protected]> wrote:

Hey Syed,

Sqoop has to boot an MR job in order to do the data transfer. This takes some time. As such, would the following work?

#!/bin/bash
[[ $( mysql test -e "SELECT COUNT(*) FROM test" | tail -1 ) -gt 0 ]] && sqoop import ...

The COUNT statement should be lightning fast if you're using MyISAM as your storage engine.

-Abe

On Thu, Mar 12, 2015 at 5:57 AM, Syed Akram <[email protected]> wrote:

Hi,

I am using Sqoop 1.4.5 and I'm doing imports from MySQL to Hive. I have a MySQL DB cluster with 200 GB of data, containing 200 databases, each with at least 600 tables (a mixture of big and small/empty tables).

When I import big tables, the performance is quite good. But a Sqoop import of a small table (even an empty table with 0 records) takes at least 20 seconds per table.

1. How can I reduce this time for small tables?
My sqoop import invocation looks like this:

sqoop import \
  --connect $uri \
  --query "$sqlText" \
  --map-column-java oprtype=Integer \
  --target-dir $targetDir \
  --hive-import \
  --hive-table $hiveTable \
  --username $userName \
  --password $password \
  --split-by $primaryKey \
  --num-mappers 2 \
  --boundary-query "$boundaryQry" \
  --hive-overwrite \
  --class-name $tableName \
  --outdir tmp_sqoop/$tableName

where:

--query is "select tableName.*, oprtype as 0, modified_time as 0 where $CONDITIONS"
--split-by is the primary key
--boundary-query is "select min(primarykey), max(primarykey) from table"

This runs fine for big tables, even ones with billions of rows. But for small tables I am noticing a constant time overhead on every Sqoop import. How do I optimize things for small tables, or tables with 0 records? I want to reduce the latency for small tables.

Please give me suggestions in this area. Cheers!!!!
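On the 200-databases-in-2-days question above: since each database's import is an independent MR job, one common approach is to run several imports concurrently instead of sequentially, bounded by what the cluster can absorb. A minimal sketch of such a harness follows; `import_db` is a hypothetical stand-in for the real per-database sqoop command (shown in a comment), and the database names are illustrative:

```shell
#!/bin/bash
# Sketch: import many databases with bounded concurrency instead of one by one.
# import_db is a stub standing in for the real sqoop invocation, e.g. something like:
#   sqoop import-all-tables --connect "jdbc:mysql://dbhost/$1" \
#     --username "$SQOOP_USER" --password-file /user/syed/.pw \
#     --hive-import --num-mappers 2
import_db() {
  echo "$1" >> imported.log    # stub side effect so the harness is self-contained
}

MAX_JOBS=4                     # tune to the MR cluster's spare capacity
: > imported.log
for db in db001 db002 db003 db004 db005; do
  # Block while the maximum number of background jobs is already running.
  while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
    sleep 1
  done
  import_db "$db" &            # each database imports in its own background job
done
wait                           # block until every import has finished
```

With 3 hours per database and, say, 8 concurrent jobs, the back-of-envelope time drops from 25 days toward 3 days; whether the cluster and the MySQL source can sustain that concurrency is something you would need to verify.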
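On the small-table latency: the ~20 seconds is mostly MR job startup, so the cheapest win is to not launch a job at all for empty tables, along the lines Abe suggested. A sketch of wiring that check into a per-table loop; `row_count` is a hypothetical stub standing in for the real count query (cheap on MyISAM, or via the INFORMATION_SCHEMA query Martin gave for InnoDB), and the table names are illustrative:

```shell
#!/bin/bash
# Sketch: check the row count first and skip the Sqoop/MR startup cost
# entirely for empty tables.
row_count() {
  # Real version would be something like:
  #   mysql -N db_name -e "SELECT COUNT(*) FROM ${1}"
  case "$1" in
    empty_table) echo 0 ;;       # stub values so the loop is self-contained
    *)           echo 1000 ;;
  esac
}

: > actions.log
for table in big_table empty_table; do
  if [ "$(row_count "$table")" -gt 0 ]; then
    echo "import $table" >> actions.log   # real: sqoop import --query ... as above
  else
    echo "skip $table" >> actions.log     # no MR job launched at all
  fi
done
```

For the non-empty small tables that still must be imported, the per-job startup floor remains; batching them (for example with import-all-tables per database rather than one job per table) is the usual way to amortize it.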
