scan.setFilter(List.of(res1, res2));

What is the 'List' here? Do you mean FilterList? How do you combine these two filters, with AND or with OR?
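For reference, HBase's FilterList combines its filters with either Operator.MUST_PASS_ALL (AND) or Operator.MUST_PASS_ONE (OR). The sketch below is a plain-Java stand-in, not the HBase API: FilterListModel, equalTo and lessOrEqual are hypothetical names, and values are assumed to compare as unsigned lexicographic byte arrays. It only illustrates how the same two conditions (EQUAL and LESS_OR_EQUAL) accept different values under each operator.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Stand-in model only: these are NOT the HBase classes. The enum mimics the
// two operators of HBase's FilterList so the AND/OR difference is visible.
final class FilterListModel {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    private final Operator operator;
    private final List<Predicate<byte[]>> filters;

    FilterListModel(Operator operator, List<Predicate<byte[]>> filters) {
        this.operator = operator;
        this.filters = filters;
    }

    // Returns true when the value passes the combined filters.
    boolean accepts(byte[] value) {
        return operator == Operator.MUST_PASS_ALL
                ? filters.stream().allMatch(f -> f.test(value))   // AND
                : filters.stream().anyMatch(f -> f.test(value));  // OR
    }

    // Models an EQUAL comparison: byte-for-byte equality.
    static Predicate<byte[]> equalTo(byte[] expected) {
        return v -> Arrays.equals(v, expected);
    }

    // Models a LESS_OR_EQUAL comparison, assuming unsigned lexicographic order.
    static Predicate<byte[]> lessOrEqual(byte[] bound) {
        return v -> Arrays.compareUnsigned(v, bound) <= 0;
    }
}
```

With the conditions "equal to 5" and "less than or equal to 9", a value of 7 is rejected under MUST_PASS_ALL and accepted under MUST_PASS_ONE, which is why the operator matters for the two filters in this thread.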
We have made a number of fixes to the semantics of FilterList; please see this issue:

https://issues.apache.org/jira/browse/HBASE-18410

It may affect your usage.

Thanks.

Hamado Dene <hamadod...@yahoo.com.invalid> wrote on Sat, Nov 27, 2021 at 9:06 PM:

> Thank you in advance for the information you are giving us. As for the
> filters, in this case we set two filters:
>
> org.apache.hadoop.hbase.filter.SingleColumnValueFilter res1 =
>     new org.apache.hadoop.hbase.filter.SingleColumnValueFilter(family,
>         colQualifier,
>         org.apache.hadoop.hbase.filter.CompareFilter.CompareOp.EQUAL,
>         intValueToBytes);
> res1.setFilterIfMissing(true);
> res1.setLatestVersionOnly(true);
>
> org.apache.hadoop.hbase.filter.SingleColumnValueFilter res2 =
>     new org.apache.hadoop.hbase.filter.SingleColumnValueFilter(family,
>         colQualifier,
>         org.apache.hadoop.hbase.filter.CompareFilter.CompareOp.LESS_OR_EQUAL,
>         longValueToBytes);
> res2.setFilterIfMissing(true);
> res2.setLatestVersionOnly(true);
>
> scan.setFilter(List.of(res1, res2));
>
> What do you think about these filters? We left them unchanged from
> HBase 0.94, so might they have a negative impact on HBase 2?
> As for readType, we can try forcing it to STREAM.
> Thanks,
> Hamado Dene
>
> On Saturday, November 27, 2021, 13:13:55 CET, 张铎(Duo Zhang) <
> palomino...@gmail.com> wrote:
>
> The behavior of filters has changed a lot between 0.94 and 2.x. Would you
> mind providing more information about which filters you use?
>
> For large scans, STREAM can perform better than PREAD. The DEFAULT
> option means starting with PREAD and switching to STREAM once enough
> data has been read.
>
> The responseTooSlow logs are normal if you are doing large scans, as a
> single RPC call can take several seconds. Maybe we should try to
> make the logging smarter...
>
> Thanks.
>
> Hamado Dene <hamadod...@yahoo.com.invalid> wrote on Sat, Nov 27, 2021 at 4:50 PM:
>
> > Hello HBase community,
> > We have recently switched to HBase 2.2.6 and have noticed that scans
> > are very slow. When we scan a very small amount of data (e.g. 100k, 200k)
> > we do not encounter any problems. But when the amount of data reaches
> > 1 million, the scans become very slow.
> >
> > For the scans we basically set startRow and endRow and apply different
> > filters. Several threads always request batches of 1000 rows. To get the
> > 1000 rows, we use a counter while calling next(), and when we reach 1000
> > we close the scan with an InterruptedException.
> > This did not give us any problems in HBase 0.94, and we had good
> > performance.
> >
> > In HBase 2 we saw that there is a setLimit(int) option to tell the
> > region server how many rows the client wants. I also see that it is
> > possible to set a readType, which can be PREAD or STREAM.
> > - Do you think that setting this option can lead to better scan
> >   performance?
> > - What is the difference between PREAD and STREAM?
> > - In which cases does it make sense to use PREAD or STREAM?
> >
> > We have already done some HBase server-side tuning, but we still can't
> > get good scan performance. When we start working with large amounts of
> > data, we start to see a lot of server-side "responseTooSlow" warnings,
> > like:
> >
> > 2021-10-28 16:45:00,854 WARN [RpcServer.default.FPBQ.Fifo.handler = 46,
> > queue = 1, port = 16020] ipc.RpcServer: (responseTooSlow): {"call":
> > "Scan (org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos $
> > ScanRequest) "," starttimems ":" 1635432272849 "," responsesize ":"
> > 221799 "," method ":" Scan "," param ":" scanner_id: 3011016724423115474
> > number_of_rows: 1000 close_scanner: false next_call_seq: 0
> > client_handles_partials: true client_handles_heartbeats: tr \ u003cTRUNCATED
> > \ u003e "," processingtimems ": 28005," client ":"
> > 10.200.86.173:60806","queuetimclass "":0 HRegionServer "," scandetails
> > ":" table: mn1_7491_hinvio region: mn1_7491_hinvio .....}
> >
> > Thanks,
> > Hamado Dene
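
The original message describes counting rows on the client and ending the scan with an exception once 1000 rows have been read. As a hedged illustration of the alternative pattern (stop at a fixed limit, then release the scanner through a normal close), here is a plain-Java stand-in. It is not the HBase ResultScanner: LimitedScanner and its onClose callback are hypothetical names. In HBase 2.x the limit itself would be passed via Scan.setLimit(int), as mentioned in the thread.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Stand-in model only: NOT the HBase ResultScanner. It wraps any row source,
// stops after a fixed number of rows, and releases the source on close().
final class LimitedScanner<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> source;
    private final Runnable onClose;   // hypothetical hook, e.g. to free server resources
    private int remaining;
    private boolean closed;

    LimitedScanner(Iterator<T> source, int limit, Runnable onClose) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be >= 0");
        }
        this.source = source;
        this.remaining = limit;
        this.onClose = onClose;
    }

    @Override
    public boolean hasNext() {
        // No more rows once the limit is reached or the scanner is closed.
        return !closed && remaining > 0 && source.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return source.next();
    }

    @Override
    public void close() {
        // Idempotent: the close hook runs at most once.
        if (!closed) {
            closed = true;
            onClose.run();
        }
    }
}
```

Used in a try-with-resources block, the loop ends when the limit is reached and close() runs exactly once, so no exception is needed to terminate the batch.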