Hi, We have a hbase cluster with 8 region servers with 20G memory We have a table with 1 column family along with a secondary index. Following query took only few milliseconds when we had less data(< 1 million) After adding more data(~30M rows) the performance declined and took about a minute or more(not stable)
select msbo1.PARENTID
from msbo_phoenix_comp_rowkey msbo1
left outer join (
select PARENTID,MILESTONETYPEID
from msbo_phoenix_comp_rowkey
where PARENTREFERENCETIME between 1479964000 and 1480464000
and OWNERORGID = 100
and PARENTTYPE = 'SHIPMENT'
and MILESTONETYPEID = 19661
group by PARENTID,MILESTONETYPEID
) msbo2
on msbo1.PARENTID = msbo2.PARENTID
where msbo1.PARENTTYPE = 'SHIPMENT'
and msbo1.OWNERORGID = 100
and msbo2.MILESTONETYPEID is null
and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
group by msbo1.PARENTID
order by msbo1.PARENTID
The RHS return about a 500K rows ..LHS about 18M rows…final result about 500K
rows
MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
is the index
Query plan:
CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER
MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
[0,'SHIPMENT',100]
SERVER FILTER BY FIRST KEY ONLY AND
(TO_UNSIGNED_LONG("PARENTREFERENCETIME") >= 1477958400 AND
TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
CLIENT MERGE SORT
PARALLEL LEFT-JOIN TABLE 0
CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER
MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME
[0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,464,000]
SERVER FILTER BY FIRST KEY ONLY
SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID",
"MILESTONETYPEID"]
CLIENT MERGE SORT
AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL
Attached the phoenix log.
I see the caching to set as 100..and "maxResultSize”:2097152..is that something
that can be tuned will help?
Is that the client merge sort consuming more time can be improved? Is there any
other tuning possible?
Thanks,
Pradheep
phoenix.log
Description: phoenix.log
