Hi,

A colleague and I are working on testing a few HBase features, notably bulk import (mentioned in http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/mapreduce/package-summary.html) and running M/R jobs using HBase as input.
We're taking the following steps:

1a. Load HBase with a M/R job using the normal API.
OR
1b. Load HBase with bulk import.

THEN

2a. Using the shell, do a "count" over the table.
OR
2b. Run a M/R job that scans the whole HBase table (and nothing else).

Of the 4 combos, 3 are fine: 1a+2a, 1a+2b, 1b+2a. We're having trouble with 1b+2b. When we run the M/R job, it doesn't seem to read in any records, but there are no explicit errors in either the Hadoop or HBase logs.

This seems odd. It shouldn't matter how we load the table, and the shell's count operator seems to work correctly either way, counting all the records. The M/R job in 2b is the same no matter how we load the table.

Any ideas on what might be wrong with the bulk import to cause this problem? We're thinking maybe something with the region boundaries, although they look OK in the GUI.

Thanks for any suggestions,
Adam
