Hubert Zhang created HAWQ-210:
---------------------------------

             Summary: Improve data locality when table size is small
                 Key: HAWQ-210
                 URL: https://issues.apache.org/jira/browse/HAWQ-210
             Project: Apache HAWQ
          Issue Type: Improvement
          Components: Core
            Reporter: Hubert Zhang
            Assignee: Lei Chang


Currently, data locality is based on a heuristic greedy algorithm: first consider 
contiguous blocks for a vseg, then non-contiguous blocks, and finally non-local 
blocks.
However, when a file contains several contiguous blocks but each vseg can only 
process one block because of the average size limit, the contiguous blocks are 
assigned to different vsegs one by one and end up being treated as non-contiguous 
blocks.
In this improvement, we add contiguity information to help choose the right vseg 
during the non-contiguous block allocation stage. The main idea is to go through 
the blocks of a file and find the host that holds the largest number of that 
file's blocks. We call this host the INSERT HOST. When assigning non-contiguous 
blocks, we prefer the INSERT HOST over other hosts as long as they all provide 
local reads.
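Below is a minimal sketch of the idea in Python. It is purely illustrative; the 
function names and data structures (find_insert_host, assign_non_contiguous, the 
list-of-host-sets representation) are assumptions for this sketch, not HAWQ's 
actual resource-manager code.

    from collections import Counter

    def find_insert_host(block_locations):
        """block_locations: one set of replica hosts per block of the file.
        Returns the host holding the largest number of the file's blocks."""
        counts = Counter()
        for hosts in block_locations:
            counts.update(hosts)
        return counts.most_common(1)[0][0] if counts else None

    def assign_non_contiguous(block_hosts, candidate_vsegs, insert_host):
        """Pick a vseg for one non-contiguous block.

        block_hosts: hosts holding a replica of this block.
        candidate_vsegs: (vseg_id, host) pairs that still have capacity.
        """
        # Vsegs whose host already has a replica, i.e. local reads.
        local = [(v, h) for v, h in candidate_vsegs if h in block_hosts]
        if not local:
            # No local read possible; fall back to any vseg with capacity.
            return candidate_vsegs[0]
        # Among equally local candidates, prefer the INSERT HOST.
        for v, h in local:
            if h == insert_host:
                return (v, h)
        return local[0]

The only change from the current heuristic is the tie-break inside the local 
candidates: when several hosts would all give a local read, the one that holds 
the most blocks of the file wins, so blocks of a small file tend to cluster on 
one host instead of being scattered one per vseg.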




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
