Emilio Setiadarma created NIFI-12825:
----------------------------------------

             Summary: Implement processor to get row key ranges for HBase regions
                 Key: NIFI-12825
                 URL: https://issues.apache.org/jira/browse/NIFI-12825
             Project: Apache NiFi
          Issue Type: New Feature
            Reporter: Emilio Setiadarma
            Assignee: Emilio Setiadarma


A common way to parallelize scan operations against HBase is to scan by row key
range. HBase splits each table into regions, and each region serves a
contiguous range of row keys. These ranges do not overlap, and together they
cover every row key in the table.
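To make the region model concrete, here is a minimal Java sketch (Java 16+ for
records). It assumes each region owns a half-open range [startKey, endKey) and
that an empty key means "unbounded", as HBase does for the start key of the
first region and the end key of the last one. The class RegionRanges, its
methods, and the split keys "g" and "p" are hypothetical. Row keys are modeled
as ASCII strings for readability; HBase itself compares raw row key bytes,
which gives the same order for ASCII keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a table's regions: each region owns the half-open
// row key range [startKey, endKey). An empty string stands for "unbounded".
class RegionRanges {

    record Range(String startKey, String endKey) {
        boolean contains(String rowKey) {
            boolean afterStart = startKey.isEmpty() || rowKey.compareTo(startKey) >= 0;
            boolean beforeEnd = endKey.isEmpty() || rowKey.compareTo(endKey) < 0;
            return afterStart && beforeEnd;
        }
    }

    // Builds contiguous, non-overlapping ranges from sorted split keys:
    // n split keys yield n + 1 ranges that together cover every row key.
    static List<Range> fromSplitKeys(List<String> splitKeys) {
        List<Range> ranges = new ArrayList<>();
        String start = "";
        for (String split : splitKeys) {
            ranges.add(new Range(start, split));
            start = split;
        }
        ranges.add(new Range(start, ""));
        return ranges;
    }

    // Returns the index of the single range that contains rowKey.
    static int locate(List<Range> ranges, String rowKey) {
        for (int i = 0; i < ranges.size(); i++) {
            if (ranges.get(i).contains(rowKey)) {
                return i;
            }
        }
        throw new IllegalStateException("ranges do not cover " + rowKey);
    }

    public static void main(String[] args) {
        List<Range> ranges = fromSplitKeys(List.of("g", "p"));
        System.out.println(locate(ranges, "apple")); // 0
        System.out.println(locate(ranges, "g"));     // 1 (start keys are inclusive)
        System.out.println(locate(ranges, "zebra")); // 2
    }
}
```

Because every row key falls into exactly one range, each range can be scanned
independently without missing or duplicating rows.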

Currently, the manual way to parallelize HBase scans by row key range is to
open the HBase shell and run the "list_regions" command to obtain the row key
ranges. The main drawback of this approach is that row key ranges are not
static: a region may split into two regions, dividing the original row key
range between them at a point near its middle, so ranges copied by hand can
become outdated.
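The following hypothetical Java sketch (Java 16+) illustrates why hand-copied
ranges go stale. It assumes a split replaces one region [startKey, endKey) with
two adjacent regions [startKey, splitKey) and [splitKey, endKey); here the
caller passes in the split point, whereas HBase chooses it itself. The class
RegionSplit and its methods are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a region split. An empty string means "unbounded".
class RegionSplit {

    record Range(String startKey, String endKey) {}

    // Returns a new list in which regions[index] is replaced by its two daughters.
    static List<Range> split(List<Range> regions, int index, String splitKey) {
        Range parent = regions.get(index);
        boolean insideParent = splitKey.compareTo(parent.startKey()) > 0
                && (parent.endKey().isEmpty() || splitKey.compareTo(parent.endKey()) < 0);
        if (!insideParent) {
            throw new IllegalArgumentException("split key must fall strictly inside the region");
        }
        List<Range> result = new ArrayList<>(regions);
        result.set(index, new Range(parent.startKey(), splitKey));
        result.add(index + 1, new Range(splitKey, parent.endKey()));
        return result;
    }

    public static void main(String[] args) {
        // Ranges captured earlier, e.g. copied by hand from list_regions output.
        List<Range> snapshot = List.of(new Range("", "m"), new Range("m", ""));
        // Later, the second region splits at "t".
        List<Range> current = split(snapshot, 1, "t");
        System.out.println(snapshot.size() + " -> " + current.size()); // 2 -> 3
    }
}
```

The old snapshot still covers every row key, but it no longer has one range per
region, so a flow built from it scans fewer, larger ranges than the table now
has.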

Providing a way for NiFi to obtain the row key range of each HBase region would
make it easier to build flows that parallelize HBase scans by row key range.
Once the ranges are known, they can be fed into a scanning processor (e.g.
ScanHBase).
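As a sketch of the data such a processor might emit, assume the per-region keys
come from the HBase Java client's RegionLocator.getStartEndKeys(), which
returns the start keys and end keys of a table's regions as two parallel arrays
of byte arrays. The hypothetical RegionScanTasks class below (Java 17+ for
HexFormat) converts those arrays into one attribute map per region, for example
one FlowFile per range. The attribute names and the hex encoding are
assumptions for illustration, not part of NiFi or ScanHBase; a downstream step
would need to decode the hex values before using them as scan bounds.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one attribute map per region, built from parallel
// start/end key arrays. Attribute names are illustrative only.
class RegionScanTasks {

    static final String START_ATTR = "hbase.region.start.row.hex"; // hypothetical
    static final String END_ATTR = "hbase.region.end.row.hex";     // hypothetical

    static List<Map<String, String>> toScanTasks(byte[][] startKeys, byte[][] endKeys) {
        if (startKeys.length != endKeys.length) {
            throw new IllegalArgumentException("start and end key arrays must align");
        }
        HexFormat hex = HexFormat.of();
        List<Map<String, String>> tasks = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            Map<String, String> attrs = new LinkedHashMap<>();
            // Hex keeps arbitrary binary row keys intact; an empty value
            // means the range is unbounded on that side.
            attrs.put(START_ATTR, hex.formatHex(startKeys[i]));
            attrs.put(END_ATTR, hex.formatHex(endKeys[i]));
            tasks.add(attrs);
        }
        return tasks;
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        byte[] m = "m".getBytes(StandardCharsets.UTF_8);
        // Two regions: [, "m") and ["m", ).
        List<Map<String, String>> tasks =
                toScanTasks(new byte[][] {empty, m}, new byte[][] {m, empty});
        tasks.forEach(System.out::println);
    }
}
```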

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
