Make sure there are no primary key (row key) clashes. HBase overwrites the row 
if you upload data with the same row key, so duplicate keys are one reason you 
can end up with fewer rows than you uploaded.
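
One way to check this before re-running the import is to count how many distinct values appear in the first column of the CSV, since that is the column mapped to HBASE_ROW_KEY in your command. Below is a minimal sketch, not a definitive tool. It assumes a plain split on the separator with no quote handling, and the function names are made up for illustration.

```python
from collections import Counter


def count_row_keys(lines, separator=","):
    """Count how often each first-field value occurs, skipping empty lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            # Assumption: the row key is everything before the first separator.
            counts[line.split(separator, 1)[0]] += 1
    return counts


def summarize(counts):
    """Return (total lines, distinct keys, lines whose key was already seen)."""
    total = sum(counts.values())
    distinct = len(counts)
    return total, distinct, total - distinct


# Tiny in-memory sample: key "a" appears twice, so one line would be overwritten.
sample = ["a,1,2\n", "b,3,4\n", "a,5,6\n"]
print(summarize(count_row_keys(sample)))  # (3, 2, 1)
```

To run it on real data you would pass an open file object for a local copy of the CSV instead of the sample list. If the distinct key count is close to the row count your scan reports, duplicate keys are the likely explanation.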

> On May 1, 2014, at 3:34 PM, "Kennedy, Sean C." <sean.kenn...@merck.com> wrote:
> 
> I ran the following command to import an excel.csv file into hbase. 
> Everything looked ok however when I ran a scan on the table in hbase I did 
> not see as many rows as were in excel.csv file.
>  
> Any help appreciated….
>  
>  
>  
> /hd/hadoop/bin/hadoop jar /hbase/hbase-0.94.15/hbase-0.94.15.jar importtsv 
> '-Dimporttsv.separator=,' 
> -Dimporttsv.columns=HBASE_ROW_KEY,ROOT,NODE,VALUE,X_PATH,IMG,NODE_URL,LFLAG,SORT_ORDER,SITE
>  V_MES_INPUT_TREE /ma/segwhdfs/hpp/hbase/MES/csv/MES_INPUT_TREE
>  
>  
> The csv file had over 200,000 rows, however my hbase scan returned only 3500 
> or so rows.  
>  
> Output from scan ‘MES_INPUT_TREE’
>  
> 3855 row(s) in 5.6090 seconds
>  
>  
> Output from job:
>  
> 14/05/01 17:58:53 INFO mapred.JobClient: Job complete: job_201405011721_0001
> 14/05/01 17:58:53 INFO mapred.JobClient: Counters: 20
> 14/05/01 17:58:53 INFO mapred.JobClient:   Job Counters
> 14/05/01 17:58:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=1208423
> 14/05/01 17:58:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 14/05/01 17:58:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 14/05/01 17:58:53 INFO mapred.JobClient:     Rack-local map tasks=1
> 14/05/01 17:58:53 INFO mapred.JobClient:     Launched map tasks=4
> 14/05/01 17:58:53 INFO mapred.JobClient:     Data-local map tasks=3
> 14/05/01 17:58:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=1427
> 14/05/01 17:58:53 INFO mapred.JobClient:   ImportTsv
> 14/05/01 17:58:53 INFO mapred.JobClient:     Bad Lines=3
> 14/05/01 17:58:53 INFO mapred.JobClient:   File Output Format Counters
> 14/05/01 17:58:53 INFO mapred.JobClient:     Bytes Written=0
> 14/05/01 17:58:53 INFO mapred.JobClient:   FileSystemCounters
> 14/05/01 17:58:53 INFO mapred.JobClient:     HDFS_BYTES_READ=5243015
> 14/05/01 17:58:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=80374
> 14/05/01 17:58:53 INFO mapred.JobClient:   File Input Format Counters
> 14/05/01 17:58:53 INFO mapred.JobClient:     Bytes Read=5242880
> 14/05/01 17:58:53 INFO mapred.JobClient:   Map-Reduce Framework
> 14/05/01 17:58:53 INFO mapred.JobClient:     Map input records=22494
> 14/05/01 17:58:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=112275456
> 14/05/01 17:58:53 INFO mapred.JobClient:     Spilled Records=0
> 14/05/01 17:58:53 INFO mapred.JobClient:     CPU time spent (ms)=2430
> 14/05/01 17:58:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=145752064
> 14/05/01 17:58:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=769548288
> 14/05/01 17:58:53 INFO mapred.JobClient:     Map output records=22491
> 14/05/01 17:58:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=135
