Chen Luo created ASTERIXDB-2541:
-----------------------------------

             Summary: Introduce GreedyScheduler
                 Key: ASTERIXDB-2541
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2541
             Project: Apache AsterixDB
          Issue Type: Improvement
          Components: STO - Storage
            Reporter: Chen Luo
            Assignee: Chen Luo


Our current AsynchronousScheduler tries to schedule all merge operations at 
the same time without any control. This is not optimal in terms of minimizing 
the number of disk components, which directly impacts query performance.

Here we introduce GreedyScheduler to minimize the number of disk components 
over time. It keeps track of all merge operations of an LSM index, and only 
activates the merge operation with the smallest number of remaining I/Os. It 
can be proven that if the number of components is the same for all merge 
operations, then this GreedyScheduler is strictly optimal. Otherwise, this will 
still be a good heuristic.
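
The selection rule described above can be sketched roughly as follows. This 
is only an illustration, not the actual patch: MergeOpSketch and 
GreedySelectorSketch are hypothetical names, and "remaining I/Os" is 
approximated here as total pages minus scanned pages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a merge operation; not the AsterixDB API.
final class MergeOpSketch {
    final String name;
    final long totalPages;   // pages the merge has to read in total (assumed known)
    long scannedPages;       // pages read so far
    boolean active;          // whether the scheduler currently lets it run

    MergeOpSketch(String name, long totalPages) {
        this.name = name;
        this.totalPages = totalPages;
    }

    long remainingPages() {
        return totalPages - scannedPages;
    }
}

// Tracks the merge operations of one LSM index and keeps exactly one active:
// the one with the fewest remaining pages.
final class GreedySelectorSketch {
    private final List<MergeOpSketch> ops = new ArrayList<>();

    void add(MergeOpSketch op) {
        ops.add(op);
        reschedule();
    }

    void complete(MergeOpSketch op) {
        ops.remove(op);
        op.active = false;
        reschedule();
    }

    // Re-evaluates which merge should run; returns it, or null if none is left.
    MergeOpSketch reschedule() {
        MergeOpSketch best = null;
        for (MergeOpSketch op : ops) {
            if (best == null || op.remainingPages() < best.remainingPages()) {
                best = op;
            }
        }
        for (MergeOpSketch op : ops) {
            op.active = (op == best);
        }
        return best;
    }
}
```

In this sketch the choice is re-evaluated whenever a merge is added or 
completes, so a newly scheduled small merge can preempt a large one that still 
has more pages left to read.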

In order for GreedyScheduler to work, we need the following two changes:
* Keep track of the number of scanned pages of index cursors so that we will 
know how many pages are left;
* Introduce a mechanism to activate/deactivate merge operations.
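
One possible shape for these two changes is sketched below. All names 
(PageCountingCursorSketch, MergeGateSketch) are hypothetical and chosen for 
illustration; the sketch assumes the total page count is known up front and 
that a merge thread checks the gate before each page read.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical cursor wrapper that counts the pages it has scanned so far.
final class PageCountingCursorSketch {
    private final long totalPages;
    private long scannedPages;

    PageCountingCursorSketch(long totalPages) {
        this.totalPages = totalPages;
    }

    // Advances by one page; returns false once every page has been scanned.
    boolean nextPage() {
        if (scannedPages >= totalPages) {
            return false;
        }
        scannedPages++;
        return true;
    }

    long remainingPages() {
        return totalPages - scannedPages;
    }
}

// Hypothetical gate: the scheduler flips it, the merge thread waits on it.
final class MergeGateSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition activated = lock.newCondition();
    private boolean active = true;

    void activate() {
        lock.lock();
        try {
            active = true;
            activated.signalAll();
        } finally {
            lock.unlock();
        }
    }

    void deactivate() {
        lock.lock();
        try {
            active = false;
        } finally {
            lock.unlock();
        }
    }

    boolean isActive() {
        lock.lock();
        try {
            return active;
        } finally {
            lock.unlock();
        }
    }

    // Blocks the calling merge thread while the merge is deactivated.
    void awaitActive() {
        lock.lock();
        try {
            while (!active) {
                activated.awaitUninterruptibly();
            }
        } finally {
            lock.unlock();
        }
    }
}
```

A merge loop under these assumptions would call awaitActive() before each 
nextPage(), so a deactivated merge pauses at a page boundary and the scheduler 
can read remainingPages() to compare merges.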



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
