Hi All, I am planning to import around 10-15 tables from Oracle to Hive using Sqoop. After the successful import of all the tables I need to capture the counts of all tables and mail them. The straight forward way is to do select count(*) on each table after it gets loaded. But since I have good number of tables and each count query runs map-reduce to give the table count.... I am looking for an alternative way to get the count of table... say, capture count in a variable that sqoop ouputs.
Here is the sample sqoop output 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: Launched map tasks=1 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: Aggregate execution time of reducers(ms)=0 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: FileSystemCounters 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: MAPRFS_BYTES_READ=87 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: MAPRFS_BYTES_WRITTEN=340491064 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: FILE_BYTES_WRITTEN=66273 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: Map-Reduce Framework 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: Map input records=921343 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: PHYSICAL_MEMORY_BYTES=522985472 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: Spilled Records=0 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: CPU_MILLISECONDS=42560 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: VIRTUAL_MEMORY_BYTES=1808834560 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: Map output records=921343 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: SPLIT_RAW_BYTES=87 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapred.JobClient: GC time elapsed (ms)=594 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 720.6803 seconds (0 bytes/sec) 2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO mapreduce.ImportJobBase: Retrieved 921343 records. Is there a way to capture the count from last line (mapreduce.ImportJobBase: Retrieved 921343 records) or from map out records line(Map output records=921343) into a variable... Please help me in this regard. [http://www.cisco.com/web/europe/images/email/signature/logo05.jpg] Vikash Talanki Engineer - Software [email protected] Phone: +1 (408)838 4078 Cisco Systems Limited SJ-J 3 255 W Tasman Dr San Jose CA - 95134 United States Cisco.com<http://www.cisco.com/> [Think before you print.]Think before you print. This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message. For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/index.html
