Hi All,

I am planning to import around 10-15 tables from Oracle to Hive using Sqoop. 
After the successful import of all the tables I need to capture the counts of 
all tables and mail them.
The straight forward way is to do select count(*) on each table after it gets 
loaded.
But since I have good number of tables and each count query runs map-reduce to 
give the table count.... I am looking for an alternative way to get the count 
of table... say, capture count in a variable that sqoop ouputs.

Here is the sample sqoop output
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     Total time spent by all maps waiting after reserving 
slots (ms)=0
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     Launched map tasks=1
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     Aggregate execution time of reducers(ms)=0
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:   FileSystemCounters
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     MAPRFS_BYTES_READ=87
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     MAPRFS_BYTES_WRITTEN=340491064
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     FILE_BYTES_WRITTEN=66273
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:   Map-Reduce Framework
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     Map input records=921343
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     PHYSICAL_MEMORY_BYTES=522985472
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     Spilled Records=0
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     CPU_MILLISECONDS=42560
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     VIRTUAL_MEMORY_BYTES=1808834560
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     Map output records=921343
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     SPLIT_RAW_BYTES=87
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapred.JobClient:     GC time elapsed (ms)=594
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapreduce.ImportJobBase: Transferred 0 bytes in 720.6803 seconds (0 bytes/sec)
2014/07/17 18:25:59 - Import Sora table - (stderr) 14/07/17 15:25:59 INFO 
mapreduce.ImportJobBase: Retrieved 921343 records.

Is there a way to capture the count from last line (mapreduce.ImportJobBase: 
Retrieved 921343 records) or from map out records line(Map output 
records=921343) into a variable...

Please help me in this regard.

[http://www.cisco.com/web/europe/images/email/signature/logo05.jpg]

Vikash Talanki
Engineer - Software
[email protected]
Phone: +1 (408)838 4078

Cisco Systems Limited
SJ-J 3
255 W Tasman Dr
San Jose
CA - 95134
United States
Cisco.com<http://www.cisco.com/>





[Think before you print.]Think before you print.

This email may contain confidential and privileged material for the sole use of 
the intended recipient. Any review, use, distribution or disclosure by others 
is strictly prohibited. If you are not the intended recipient (or authorized to 
receive for the recipient), please contact the sender by reply email and delete 
all copies of this message.
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html



  • Hive table count f... Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)

Reply via email to