[jira] [Created] (KYLIN-1323) Improve performance of converting data to hfile

Yerui Sun (JIRA) Fri, 15 Jan 2016 02:27:13 -0800

Yerui Sun created KYLIN-1323:
--------------------------------

             Summary: Improve performance of converting data to hfile
                 Key: KYLIN-1323
                 URL: https://issues.apache.org/jira/browse/KYLIN-1323
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
    Affects Versions: v1.2
            Reporter: Yerui Sun
            Assignee: Yerui Sun
             Fix For: v2.0, v1.3



Suppose we have 100GB of data after cuboid building, with a setting of 10GB 
per region. Currently, 10 split keys are calculated, 10 regions are created, 
and 10 reducers are used in the ‘convert to hfile’ step. 

With this optimization, we could calculate 100 (or more) split keys and use 
all of them in the ‘convert to hfile’ step, but sample only 10 of them to 
create regions. The result is still 10 regions, but 100 reducers are used in 
the ‘convert to hfile’ step. Of course, 100 hfiles are created as well, and 
10 files are loaded per region. That should be fine, and it should not affect 
query performance dramatically.
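
The sampling idea can be sketched roughly as below. This is only an 
illustration, not Kylin code: the function names plan_splits and 
sample_region_keys, the hfiles_per_region parameter, and the "keyNNN" 
placeholder keys are all hypothetical. The sketch also treats a split key as 
a boundary between two partitions, so N partitions need N - 1 keys, whereas 
the description above counts them loosely as N.

```python
import math


def plan_splits(total_gb, region_gb, hfiles_per_region):
    """Return (region_count, reducer_count) for the given sizes.

    hfiles_per_region is a hypothetical knob: how many reducers (and
    therefore hfiles) should feed each region.
    """
    regions = max(1, math.ceil(total_gb / region_gb))
    reducers = regions * hfiles_per_region
    return regions, reducers


def sample_region_keys(reducer_keys, hfiles_per_region):
    """Pick every hfiles_per_region-th reducer boundary as a region boundary.

    reducer_keys must be the sorted boundaries between reducer partitions.
    Because region boundaries are a subset of reducer boundaries, no hfile
    ever spans two regions.
    """
    return reducer_keys[hfiles_per_region - 1::hfiles_per_region]


# Numbers from the description: 100GB of data, 10GB per region.
regions, reducers = plan_splits(100, 10, 10)

# Placeholder boundaries between the 100 reducer partitions (99 of them).
reducer_keys = [f"key{i:03d}" for i in range(1, reducers)]

# Boundaries used to create the regions (9 of them, giving 10 regions).
region_keys = sample_region_keys(reducer_keys, 10)
```

Sampling the region keys from the reducer keys, rather than computing them 
separately, is what keeps each of the 100 hfiles inside a single region, so 
each region loads its 10 files without any hfile needing to be split.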



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
