Hi, I'm seeing strange behavior in MapReduce when using HBase as the input format. I run my MR job multiple times on the same table and the same dataset, with the same FuzzyRowFilter pattern. The Input Records counters are not consistent across runs: the smallest count can be 40% lower than the largest.
More specifically:

- The table is split into 18 regions, distributed across 3 region servers. The TTL is set to 10 days for the records, though the dataset for the MR job only includes records inserted within 7 days.
- The row key is defined as: salt (1 byte) + time_of_hour (4 bytes) + uuid (36 bytes)
- The scan is created as below:

    Scan scan = new Scan();
    scan.setBatch(100);
    scan.setCaching(10000);
    scan.setCacheBlocks(false);
    scan.setMaxVersions(1);

The row filter for the scan is a FuzzyRowFilter that matches only events of a given time_of_hour.

Everything looks fine, but the result is not what I expect. When the same job runs 10 times, the Input Records counters show 6 different numbers, and the final output shows 6 different results.

Has anyone faced this problem before? What could cause this inconsistency in the HBase scan results?

Thanks
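P.S. To make the filter setup concrete, here is a minimal sketch of the key/mask pair for the row-key layout described above. It is an illustration, not the exact job code: the class name FuzzyMaskSketch and its helpers are made up for this post, time_of_hour is assumed to be a big-endian int, and matches() is only a simplified local check, not HBase's own matching logic. The mask follows the FuzzyRowFilter convention where 0 marks a fixed byte and 1 marks a byte that may be anything.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: builds the key/mask pair for a
// salt(1) + time_of_hour(4) + uuid(36) row key.
class FuzzyMaskSketch {
    static final int SALT_LEN = 1;
    static final int TIME_LEN = 4;
    static final int UUID_LEN = 36;
    static final int KEY_LEN = SALT_LEN + TIME_LEN + UUID_LEN; // 41 bytes

    // Key template: only the time_of_hour bytes carry meaning.
    // Assumption: time_of_hour is stored as a big-endian int.
    public static byte[] fuzzyKey(int timeOfHour) {
        byte[] key = new byte[KEY_LEN];
        ByteBuffer.wrap(key, SALT_LEN, TIME_LEN).putInt(timeOfHour);
        return key;
    }

    // Mask: 1 = byte may be anything (salt, uuid), 0 = byte must match (time_of_hour).
    public static byte[] fuzzyMask() {
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);
        Arrays.fill(mask, SALT_LEN, SALT_LEN + TIME_LEN, (byte) 0);
        return mask;
    }

    // Simplified local check of which row keys the pair would select.
    public static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In the job, a pair like this would be wrapped in org.apache.hadoop.hbase.util.Pair, passed to the FuzzyRowFilter constructor as a list, and attached with scan.setFilter(...). Since the mask has to cover the fixed bytes exactly, an off-by-one in the offsets is one thing worth ruling out.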