Syed,

This would depend heavily on your resourcing. How long are the longest Sqoop jobs taking? If the longest Sqoop jobs are taking 2 days, then you'll have to increase the number of mappers sqooping data from your database. You can control the parallelism by adding the "-m" argument. Check out http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_controlling_parallelism for more information.
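As an illustration only (the connection URI, table, and split column below are placeholders, not from this thread), raising the mapper count looks like this:

```shell
#!/bin/bash
# Sketch: controlling Sqoop parallelism with -m / --num-mappers.
# NOTE: dbhost, mydb, big_table, and id are hypothetical names.
# -m only helps if --split-by points at a well-distributed key and the
# MySQL server can sustain that many concurrent readers.
NUM_MAPPERS=8

cmd="sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --table big_table \
  --split-by id \
  -m ${NUM_MAPPERS}"

echo "$cmd"   # printed as a dry run; remove the echo to launch the job
```

Each mapper opens its own connection and reads one slice of the key range, so the practical ceiling is whatever concurrent read load MySQL can absorb.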
-Abe

On Fri, Mar 13, 2015 at 7:40 AM, Syed Akram <[email protected]> wrote:

> Thanks for your valuable response, Abe and Martin. This helped me and I gave it a try.
>
> Along with this, I need one more suggestion.
>
> One db with 600 tables takes 3 hours to do the Hive import using the command below. I have 200 db's like this, and if I run the 200 db's sequentially, one by one, it takes 25 days to complete one db cluster.
>
> But this is not the result I am expecting; I want to import the data (all 200 db's) in 2 days at most. How can I make this happen?
>
> i.e., one DB cluster with 200 or more db's should be done as soon as possible.
>
> Suggestions on this. I've been working on this for a long time; suggestions are valuable.
>
> Cheers!!!!
>
> ---- On Thu, 12 Mar 2015 23:29:00 +0530 Juan Martin Pampliega <[email protected]> wrote ----
>
> If you are using InnoDB you can use something like:
>
> SELECT table_rows FROM INFORMATION_SCHEMA.TABLES
> WHERE table_schema = 'db_name' AND table_name LIKE 'name_of_table';
>
> On Thu, Mar 12, 2015 at 2:53 PM, Abraham Elmahrek <[email protected]> wrote:
>
> Hey Syed,
>
> Sqoop has to boot an MR job in order to do the data transfer. This takes some time. As such, would the following work?
>
> #!/bin/bash
> [[ $( mysql test -e "SELECT COUNT(*) FROM test" | tail -1 ) -gt 0 ]] && sqoop import ...
>
> The COUNT statement should be lightning fast if you're using MyISAM as your storage engine.
>
> -Abe
>
> On Thu, Mar 12, 2015 at 5:57 AM, Syed Akram <[email protected]> wrote:
>
> Hi,
>
> I am using Sqoop 1.4.5 and I'm doing imports from MySQL to Hive.
>
> I have a MySQL DB cluster with 200 GB of data, in which there are 200 db's, and each db has at least 600 tables (a mixture of big and small/empty tables).
>
> When I'm importing big tables, the performance is quite good.
> But when I'm trying to sqoop import small tables (I mean empty tables with 0 records), it takes at least 20 secs for each table.
>
> 1. How can I reduce this time for small tables?
>
> My sqoop import query looks like this:
>
> sqoop "import", "--connect", uri,
>   "--query", sqlText,
>   "--map-column-java", "oprtype=Integer",
>   "--target-dir", targetDir,
>   "--hive-import", "--hive-table", hiveTable,
>   "--username", userName, "--password", password,
>   "--split-by", primaryKey,
>   "--num-mappers", "2",
>   "--boundary-query", boundaryQry,
>   "--hive-overwrite",
>   "--class-name", tableName,
>   "--outdir", "tmp_sqoop/"+tableName
>
> where:
> "--query" is "select tableName.*, oprtype as 0, modified_time as 0 where $CONDITIONS"
> "--split-by" is the primary key
> "--boundary-query" is "select min(primarykey), max(primarykey) from table;"
>
> This runs fine for big tables, even those with billions of rows. But for small tables, I am noticing a constant time taken to do the sqoop import. How do I optimize things for small tables or tables with 0 records? I want to reduce the latency for small tables.
>
> Please suggest me in this area,
>
> Cheers!!!!
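The thread raises two separate costs: a roughly 20-second MR-startup floor per empty table, and 200 databases imported back to back. One sketch that combines Abe's row-count check with coarse-grained parallelism across databases is below. All names in it (get_row_count, some_table, db001...) are hypothetical, and `echo` stands in for the real sqoop invocation:

```shell
#!/bin/bash
# Sketch: skip the MR startup cost for empty tables, and run several
# database imports concurrently instead of sequentially.
# get_row_count is a stub; replace its body with a real query such as:
#   mysql -N "$1" -e "SELECT COUNT(*) FROM $2"
get_row_count() {
  echo 0
}

import_table() {
  local db="$1" table="$2"
  if [ "$(get_row_count "$db" "$table")" -gt 0 ]; then
    echo "sqoop import for $db.$table"   # stand-in for the real sqoop command
  else
    echo "skip $db.$table (empty)"       # no MR job, no ~20 s startup cost
  fi
}
export -f get_row_count import_table

# Fan out across databases: up to 8 imports in flight at once.
printf '%s\n' db001 db002 db003 db004 |
  xargs -P8 -I{} bash -c 'import_table "$1" some_table' _ {}
```

The `-P8` level is a guess to tune, bounded by what the MySQL cluster and the Hadoop queue can absorb at once; the win comes from overlapping many 3-hour database imports rather than summing them.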
