Hi Arvind, The row key is PARENTID, OWNERORGID, MILESTONETYPEID, PARENTTYPE Each parentid will have a list of MILESTONETYPEID (19661, 1, 2 , etc..). So your query will return all the parentids.. I am looking of rparentid that does not have a MILESTONETYPEID
Thanks, Pradheep From: Arvind S <[email protected]<mailto:[email protected]>> Reply-To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Date: Thursday, February 23, 2017 at 1:19 AM To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Subject: Re: Phoenix query performance why cant you reduce your query to select msbo1.PARENTID from msbo_phoenix_comp_rowkey where msbo1.PARENTTYPE = 'SHIPMENT' and msbo1.OWNERORGID = 100 and msbo1.MILESTONETYPEID != 19661 and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000 group by msbo1.PARENTID order by msbo1.PARENTID ?? looks like that's what you get by this left outer. Cheers !! Arvind On 22 February 2017 at 22:58, Pradheep Shanmugam <[email protected]<mailto:[email protected]>> wrote: Hi, We have a hbase cluster with 8 region servers with 20G memory We have a table with 1 column family along with a secondary index. Following query took only few milliseconds when we had less data(< 1 million) After adding more data(~30M rows) the performance declined and took about a minute or more(not stable) select msbo1.PARENTID from msbo_phoenix_comp_rowkey msbo1 left outer join ( select PARENTID,MILESTONETYPEID from msbo_phoenix_comp_rowkey where PARENTREFERENCETIME between 1479964000 and 1480464000<tel:01480%20464%20000> and OWNERORGID = 100 and PARENTTYPE = 'SHIPMENT' and MILESTONETYPEID = 19661 group by PARENTID,MILESTONETYPEID ) msbo2 on msbo1.PARENTID = msbo2.PARENTID where msbo1.PARENTTYPE = 'SHIPMENT' and msbo1.OWNERORGID = 100 and msbo2.MILESTONETYPEID is null and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000<tel:01480%20464%20000> group by msbo1.PARENTID order by msbo1.PARENTID The RHS return about a 500K rows ..LHS about 18M rows…final result about 500K rows MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME is the index Query plan: CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME [0,'SHIPMENT',100] SERVER FILTER BY FIRST KEY ONLY AND (TO_UNSIGNED_LONG("PARENTREFERENCETIME") >= 1477958400 AND TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000<tel:01480%20464%20000>) SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"] CLIENT MERGE SORT PARALLEL LEFT-JOIN TABLE 0 CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,464,000] SERVER FILTER BY FIRST KEY ONLY SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID", "MILESTONETYPEID"] CLIENT MERGE SORT AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL Attached the phoenix log. I see the caching to set as 100..and "maxResultSize”:2097152..is that something that can be tuned will help? Is that the client merge sort consuming more time can be improved? Is there any other tuning possible? Thanks, Pradheep
