[ 
https://issues.apache.org/jira/browse/PHOENIX-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406950#comment-16406950
 ] 

James Taylor edited comment on PHOENIX-4594 at 3/20/18 8:20 PM:
----------------------------------------------------------------

A little more info on this one:
- Changes will be isolated to {{BaseResultIterators.getParallelScans(byte[] 
startKey, byte[] stopKey)}} method
- We currently walk through the guideposts by decoding them using 
PrefixByteCodec.decode(decoder, input) which prefix encodes the byte[] of the 
guideposts (since there will be a lot of overlap of the bytes from gp to gp).
- I'm pretty sure the issue with the slowness is due to our linear search 
(since the time is increasing as the guidepost width decreases), but you might 
want to confirm that through profiling first.
- Assuming this is the case, we'll need to make a pass through all guideposts 
and put them into a List in which we can perform a binary search. That'll use 
more memory (unless you know of a way to binary search while keeping the data 
encoded), but only during the execution of this method, so perhaps it's ok.
- Take note of the conditions we're search for as we linearly traverse the gps 
as those will become binary searches. You should be able to prune the search 
space as you go, since we traverse from smallest to biggest gp.
- We need to determine if there's a guidepost in each region as we use that 
information to drive the timestamp value we return (essentially the "last 
updated" information).
- The ExplainPlanWithStatsEnabledIT has a pretty good set of tests that need to 
keep passing after this change.




was (Author: jamestaylor):
A little more info on this one:
- Changes will be isolated to {{BaseResultIterators.getParallelScans(byte[] 
startKey, byte[] stopKey)}} method
- I'm pretty sure the issue with the slowness is due to our linear search 
(since the time is increasing as the guidepost width decreases), but you might 
want to confirm that through profiling first.
- We currently walk through the guideposts by decoding them using 
PrefixByteCodec.decode(decoder, input) which prefix encodes the byte[] of the 
guideposts (since there will be a lot of overlap of the bytes from gp to gp).
- Instead, we'll need to make a pass through all guideposts and put them into a 
List in which we can perform a binary search. That'll use more memory (unless 
you know of a way to binary search while keeping the data encoded), but only 
during the execution of this method, so perhaps it's ok.
- Take note of the conditions we're search for as we linearly traverse the gps 
as those will become binary searches. You should be able to prune the search 
space as you go, since we traverse from smallest to biggest gp.
- We need to determine if there's a guidepost in each region as we use that 
information to drive the timestamp value we return (essentially the "last 
updated" information).
- The ExplainPlanWithStatsEnabledIT has a pretty good set of tests that need to 
keep passing after this change.



> Perform binary search on guideposts during query compilation
> ------------------------------------------------------------
>
>                 Key: PHOENIX-4594
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4594
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Abhishek Singh Chouhan
>            Priority: Major
>
> If there are many guideposts, performance will suffer during query 
> compilation because we do a linear search of the guideposts to find the 
> intersection with the scan ranges. Instead, in 
> BaseResultIterators.getParallelScans() we should populate an array of 
> guideposts and perform a binary search. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to