[ https://issues.apache.org/jira/browse/PHOENIX-5280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Wong resolved PHOENIX-5280. ---------------------------------- Resolution: Implemented > Provide Improvements to Scan on Composite PK where Leading Edge not fully > Specified but the edge next columns are in most leading keys > -------------------------------------------------------------------------------------------------------------------------------------- > > Key: PHOENIX-5280 > URL: https://issues.apache.org/jira/browse/PHOENIX-5280 > Project: Phoenix > Issue Type: Improvement > Reporter: Daniel Wong > Priority: Minor > > Provide Improvements to Scan on Composite PK where Leading Edge not fully > Specified but the edge next columns are in most leading keys > Recently a user has had an issue where they have a composite pk with 2 > columns say (organizationId varchar, departmentId varchar). They want to > query all their data with a condition where department is fully qualified > department. Example SELECT * FROM TABLE WHERE departmentId='123'. They > also know that 95% of the organization leading edge contains the qualified > trailing edge. However department = '123' is less than 5% of the total data > in the table. > Based on the explain plan today for this we would run a Round Robin Full Scan > with a filter on departmentId='123'. > While one possible approach to not run a full table scan is to build an > index on department. Another approach could be to construct a new version of > a skipscan like filter to control this scan. Essentially we could use 1 > lookup to find the organizationId then additional skipscan for the trailing > key. This could be triggered with a sql syntax hint or in the future data > driven. > For a given region assume the data looks like this. > ||organizationId||departmentId|| > |org1|100| > |org4|100| > |org4|101| > |org4|123| > |org5|100| > |org5|123| > First query the initial row in the region. We get 'org1','100'. From this > we can construct the next rows of ['org1','123' - 'org1','123\x0'). After > processing that block (in our case 0 rows) we would run to the row at or > greater than nextKey(current orgnaziationId),'123'. This would give us > org4,101. We would then run to the row of 'org4','123'. Essentially 1 step > to find the orgId and then a scan of all the departments for that value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)