Hi Anoop,

Actually, I got confused after reading the doc. I thought a simple importtsv command (which also takes the table name as an argument) would suffice. But as you pointed out, completebulkload is required:
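[Editor's note, for readers of the archive: the confusion above comes from importtsv having two modes, depending on whether -Dimporttsv.bulk.output is given. Without it, importtsv writes rows directly into the table via Puts and no second step is needed; with it, the job only writes HFiles to the given directory, and completebulkload must then load them. The sketch below just echoes the two variants as a dry run, since actually running them needs the live cluster; hostnames and paths are the ones from this thread, and the column list is trimmed for readability.]

```shell
# Dry-run sketch: echo the two importtsv variants instead of executing them.
# Single quotes keep the backtick-free strings literal; ${HBASE_HOME} etc. are
# left unexpanded on purpose, as placeholders for the reader's environment.

# Mode 1: direct load. No bulk.output, so rows go straight into the table via
# Puts during the MapReduce job; no second step is needed.
DIRECT='hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,CUSTOMER_INFO:NAME CUSTOMERS hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/customer.txt'

# Mode 2: bulk load. bulk.output makes the job write HFiles only; the table
# stays empty until completebulkload moves those HFiles into its regions.
BULK='hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,CUSTOMER_INFO:NAME -Dimporttsv.bulk.output=hdfs://cldx-1139-1033:9000/hbase/storefileoutput CUSTOMERS hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/customer.txt'

echo "$DIRECT"
echo "$BULK"
```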
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar completebulkload hdfs://cldx-1139-1033:9000/hbase/storefileoutput PRODUCTS

Thanks for the help!

Regards,
Omkar Joshi

-----Original Message-----
From: Anoop Sam John [mailto:anoo...@huawei.com]
Sent: Tuesday, April 16, 2013 12:26 PM
To: user@hbase.apache.org
Subject: RE: Data not loaded in table via ImportTSV

Hi

Have you used the tool LoadIncrementalHFiles after the ImportTSV?

-Anoop-

________________________________________
From: Omkar Joshi [omkar.jo...@lntinfotech.com]
Sent: Tuesday, April 16, 2013 12:01 PM
To: user@hbase.apache.org
Subject: Data not loaded in table via ImportTSV

Hi,

The background thread is this:
http://mail-archives.apache.org/mod_mbox/hbase-user/201304.mbox/%3ce689a42b73c5a545ad77332a4fc75d8c1efbd80...@vshinmsmbx01.vshodc.lntinfotech.com%3E

I'm referring to the HBase doc:
http://hbase.apache.org/book/ops_mgt.html#importtsv

Accordingly, my command is:

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar importtsv '-Dimporttsv.separator=;' -Dimporttsv.columns=HBASE_ROW_KEY,CUSTOMER_INFO:NAME,CUSTOMER_INFO:EMAIL,CUSTOMER_INFO:ADDRESS,CUSTOMER_INFO:MOBILE -Dimporttsv.bulk.output=hdfs://cldx-1139-1033:9000/hbase/storefileoutput CUSTOMERS hdfs://cldx-1139-1033:9000/hbase/copiedFromLocal/customer.txt

..../*classpath echoed here*/
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hduser/hadoop_ecosystem/apache_hadoop/hadoop_installation/hadoop-1.0.4/libexec/../lib/native/Linux-amd64-64
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:os.version=3.2.0-23-generic
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:user.name=hduser
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hduser
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hduser/hadoop_ecosystem/apache_hbase/hbase_installation/hbase-0.94.6.1/bin
13/04/16 17:18:43 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cldx-1140-1034:2181 sessionTimeout=180000 watcher=hconnection
13/04/16 17:18:43 INFO zookeeper.ClientCnxn: Opening socket connection to server cldx-1140-1034/172.25.6.71:2181. Will not attempt to authenticate using SASL (unknown error)
13/04/16 17:18:43 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 5483@cldx-1139-1033
13/04/16 17:18:43 INFO zookeeper.ClientCnxn: Socket connection established to cldx-1140-1034/172.25.6.71:2181, initiating session
13/04/16 17:18:43 INFO zookeeper.ClientCnxn: Session establishment complete on server cldx-1140-1034/172.25.6.71:2181, sessionid = 0x13def2889530023, negotiated timeout = 180000
13/04/16 17:18:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=cldx-1140-1034:2181 sessionTimeout=180000 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@34d03009
13/04/16 17:18:44 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 5483@cldx-1139-1033
13/04/16 17:18:44 INFO zookeeper.ClientCnxn: Opening socket connection to server cldx-1140-1034/172.25.6.71:2181. Will not attempt to authenticate using SASL (unknown error)
13/04/16 17:18:44 INFO zookeeper.ClientCnxn: Socket connection established to cldx-1140-1034/172.25.6.71:2181, initiating session
13/04/16 17:18:44 INFO zookeeper.ClientCnxn: Session establishment complete on server cldx-1140-1034/172.25.6.71:2181, sessionid = 0x13def2889530024, negotiated timeout = 180000
13/04/16 17:18:44 INFO zookeeper.ZooKeeper: Session: 0x13def2889530024 closed
13/04/16 17:18:44 INFO zookeeper.ClientCnxn: EventThread shut down
13/04/16 17:18:44 INFO mapreduce.HFileOutputFormat: Looking up current regions for table org.apache.hadoop.hbase.client.HTable@238cfdf
13/04/16 17:18:44 INFO mapreduce.HFileOutputFormat: Configuring 1 reduce partitions to match current region count
13/04/16 17:18:44 INFO mapreduce.HFileOutputFormat: Writing partition information to hdfs://cldx-1139-1033:9000/user/hduser/partitions_4159cd24-b8ff-4919-854b-a7d1da5069ad
13/04/16 17:18:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/16 17:18:44 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/04/16 17:18:44 INFO compress.CodecPool: Got brand-new compressor
13/04/16 17:18:44 INFO mapreduce.HFileOutputFormat: Incremental table output configured.
13/04/16 17:18:47 INFO input.FileInputFormat: Total input paths to process : 1
13/04/16 17:18:47 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/16 17:18:47 INFO mapred.JobClient: Running job: job_201304091909_0010
13/04/16 17:18:48 INFO mapred.JobClient: map 0% reduce 0%
13/04/16 17:19:07 INFO mapred.JobClient: map 100% reduce 0%
13/04/16 17:19:19 INFO mapred.JobClient: map 100% reduce 100%
13/04/16 17:19:24 INFO mapred.JobClient: Job complete: job_201304091909_0010
13/04/16 17:19:24 INFO mapred.JobClient: Counters: 30
13/04/16 17:19:24 INFO mapred.JobClient:   Job Counters
13/04/16 17:19:24 INFO mapred.JobClient:     Launched reduce tasks=1
13/04/16 17:19:24 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16567
13/04/16 17:19:24 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/16 17:19:24 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/16 17:19:24 INFO mapred.JobClient:     Launched map tasks=1
13/04/16 17:19:24 INFO mapred.JobClient:     Data-local map tasks=1
13/04/16 17:19:24 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10953
13/04/16 17:19:24 INFO mapred.JobClient:   ImportTsv
13/04/16 17:19:24 INFO mapred.JobClient:     Bad Lines=0
13/04/16 17:19:24 INFO mapred.JobClient:   File Output Format Counters
13/04/16 17:19:24 INFO mapred.JobClient:     Bytes Written=1984
13/04/16 17:19:24 INFO mapred.JobClient:   FileSystemCounters
13/04/16 17:19:24 INFO mapred.JobClient:     FILE_BYTES_READ=1753
13/04/16 17:19:24 INFO mapred.JobClient:     HDFS_BYTES_READ=563
13/04/16 17:19:24 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=74351
13/04/16 17:19:24 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1984
13/04/16 17:19:24 INFO mapred.JobClient:   File Input Format Counters
13/04/16 17:19:24 INFO mapred.JobClient:     Bytes Read=433
13/04/16 17:19:24 INFO mapred.JobClient:   Map-Reduce Framework
13/04/16 17:19:24 INFO mapred.JobClient:     Map output materialized bytes=1600
13/04/16 17:19:24 INFO mapred.JobClient:     Map input records=5
13/04/16 17:19:24 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/04/16 17:19:24 INFO mapred.JobClient:     Spilled Records=10
13/04/16 17:19:24 INFO mapred.JobClient:     Map output bytes=1574
13/04/16 17:19:24 INFO mapred.JobClient:     Total committed heap usage (bytes)=212664320
13/04/16 17:19:24 INFO mapred.JobClient:     CPU time spent (ms)=4780
13/04/16 17:19:24 INFO mapred.JobClient:     Combine input records=0
13/04/16 17:19:24 INFO mapred.JobClient:     SPLIT_RAW_BYTES=130
13/04/16 17:19:24 INFO mapred.JobClient:     Reduce input records=5
13/04/16 17:19:24 INFO mapred.JobClient:     Reduce input groups=5
13/04/16 17:19:24 INFO mapred.JobClient:     Combine output records=0
13/04/16 17:19:24 INFO mapred.JobClient:     Physical memory (bytes) snapshot=279982080
13/04/16 17:19:24 INFO mapred.JobClient:     Reduce output records=20
13/04/16 17:19:24 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2010615808
13/04/16 17:19:24 INFO mapred.JobClient:     Map output records=5

As seen, there aren't any bad lines, and the mapper has output 5 records (the source text file has 5 records).

HDFS reflects the following:

hduser@cldx-1139-1033:~/hadoop_ecosystem/apache_hbase/hbase_installation/hbase-0.94.6.1/bin$ hadoop fs -ls /hbase
Warning: $HADOOP_HOME is deprecated.
Found 12 items
drwxr-xr-x - hduser supergroup 0 2013-04-09 19:47 /hbase/-ROOT-
drwxr-xr-x - hduser supergroup 0 2013-04-09 19:47 /hbase/.META.
drwxr-xr-x - hduser supergroup 0 2013-04-16 16:02 /hbase/.archive
drwxr-xr-x - hduser supergroup 0 2013-04-09 19:47 /hbase/.logs
drwxr-xr-x - hduser supergroup 0 2013-04-09 19:47 /hbase/.oldlogs
drwxr-xr-x - hduser supergroup 0 2013-04-16 16:05 /hbase/.tmp
drwxr-xr-x - hduser supergroup 0 2013-04-16 16:05 /hbase/CUSTOMERS
drwxr-xr-x - hduser supergroup 0 2013-04-16 17:14 /hbase/copiedFromLocal
-rw-r--r-- 4 hduser supergroup 38 2013-04-09 19:47 /hbase/hbase.id
-rw-r--r-- 4 hduser supergroup 3 2013-04-09 19:47 /hbase/hbase.version
drwxr-xr-x - hduser supergroup 0 2013-04-16 17:19 /hbase/storefileoutput
drwxr-xr-x - hduser supergroup 0 2013-04-09 22:03 /hbase/users

hduser@cldx-1139-1033:~/hadoop_ecosystem/apache_hbase/hbase_installation/hbase-0.94.6.1/bin$ hadoop fs -ls /hbase/storefileoutput
Warning: $HADOOP_HOME is deprecated.
Found 3 items
drwxr-xr-x - hduser supergroup 0 2013-04-16 17:19 /hbase/storefileoutput/CUSTOMER_INFO
-rw-r--r-- 4 hduser supergroup 0 2013-04-16 17:19 /hbase/storefileoutput/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2013-04-16 17:18 /hbase/storefileoutput/_logs

hduser@cldx-1139-1033:~/hadoop_ecosystem/apache_hbase/hbase_installation/hbase-0.94.6.1/bin$ hadoop fs -ls /hbase/storefileoutput/CUSTOMER_INFO
Warning: $HADOOP_HOME is deprecated.
Found 1 items
-rw-r--r-- 4 hduser supergroup 1984 2013-04-16 17:19 /hbase/storefileoutput/CUSTOMER_INFO/64a822e4ff82456785740925eccd392f

But no rows are inserted in the CUSTOMERS table:

hduser@cldx-1139-1033:~$ $HBASE_HOME/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.6.1, r1464658, Thu Apr 4 10:58:50 PDT 2013

hbase(main):001:0> scan 'CUSTOMERS'
ROW COLUMN+CELL
0 row(s) in 0.8240 seconds

Do I need to execute some additional step (CompleteBulkLoad?) to push the data? I'm not sure if this is required!
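[Editor's note, for readers of the archive: yes. With -Dimporttsv.bulk.output the job only produces HFiles under the output directory (which is exactly what the listing above shows); a second step must hand them to the regionservers. Below is a sketch of that missing step, echoed as a dry run since it needs the live cluster. The HDFS path and table name are the ones from this thread; completebulkload is, as far as I know, the driver shortcut for the LoadIncrementalHFiles tool that Anoop mentions in his reply.]

```shell
# Dry run: echo the missing completebulkload step rather than execute it
# (it needs the live Hadoop/HBase cluster from the thread).
# Single quotes keep the backticks and ${...} placeholders literal.
LOAD='HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.6.1.jar completebulkload hdfs://cldx-1139-1033:9000/hbase/storefileoutput CUSTOMERS'
echo "$LOAD"
```

After this step a `scan 'CUSTOMERS'` in the HBase shell should show the loaded rows.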
Regards,
Omkar Joshi