[ 
https://issues.apache.org/jira/browse/KYLIN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yerui Sun updated KYLIN-1379:
-----------------------------
    Description: 
After KYLIN-1186, we gained the ability to count distinct Int type columns 
precisely.
However, the implementation in KYLIN-1186 is not stable, especially in the 
2.x-staging branch.
The reason is that the 2.x version uses the measure's maxlength to allocate 
memory, and the BitmapMeasure is hardcoded to 8MB in KYLIN-1186, which causes 
OOM during cube building.
To resolve this problem, we have introduced a precision on the bitmap measure, 
such as bitmap(100), bitmap(10000), or bitmap(1000000), meaning the measure can 
accept a cardinality of at most 100/10000/1M. This solution should be fine in 
practice: if the count exceeds 1000000, the hyperloglog measure, which 
produces an approximate result, should be acceptable.

  was:
After KYLIN-1186, we gained the ability to count distinct int type columns.
However, the implementation in KYLIN-1186 is not stable, especially in the 
2.x-staging branch.
The reason is that the 2.x version uses the measure's maxlength to allocate 
memory, and the BitmapMeasure is hardcoded to 8MB in KYLIN-1186, which causes 
OOM during cube building.
To resolve this problem, we have introduced a precision on the bitmap measure, 
such as bitmap(100), bitmap(10000), or bitmap(1000000), meaning the measure can 
accept a cardinality of at most 100/10000/1M. This solution should be fine in 
practice: if the count exceeds 1000000, the hyperloglog measure, which 
produces an approximate result, should be acceptable.


> More stable precise count distinct implements after KYLIN-1186
> --------------------------------------------------------------
>
>                 Key: KYLIN-1379
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1379
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v2.1, v1.3
>            Reporter: Yerui Sun
>            Assignee: Yerui Sun
>
> After KYLIN-1186, we gained the ability to count distinct Int type columns 
> precisely.
> However, the implementation in KYLIN-1186 is not stable, especially in the 
> 2.x-staging branch.
> The reason is that the 2.x version uses the measure's maxlength to allocate 
> memory, and the BitmapMeasure is hardcoded to 8MB in KYLIN-1186, which causes 
> OOM during cube building.
> To resolve this problem, we have introduced a precision on the bitmap 
> measure, such as bitmap(100), bitmap(10000), or bitmap(1000000), meaning the 
> measure can accept a cardinality of at most 100/10000/1M. This solution 
> should be fine in practice: if the count exceeds 1000000, the hyperloglog 
> measure, which produces an approximate result, should be acceptable.
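
To illustrate the idea in the description, here is a minimal sketch of a 
precision-bounded distinct counter. It is not Kylin's BitmapMeasure: the class 
name, the method names, the parsing of the bitmap(N) string, and the size 
formula (4 bytes per int plus a 4-byte count header) are all assumptions made 
for this example, and a plain hash set stands in for the bitmap. The only 
point it tries to show is that the worst-case buffer can be derived from the 
declared precision instead of a fixed 8MB.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a precision-bounded exact distinct counter. */
final class BoundedDistinctCounter {
    // Assumed textual form of the measure type, e.g. "bitmap(10000)".
    private static final Pattern TYPE = Pattern.compile("bitmap\\((\\d+)\\)");

    private final int maxCardinality;
    private final Set<Integer> values = new HashSet<>();

    BoundedDistinctCounter(int maxCardinality) {
        if (maxCardinality <= 0) {
            throw new IllegalArgumentException("precision must be positive");
        }
        this.maxCardinality = maxCardinality;
    }

    /** Builds a counter from a type string such as "bitmap(100)". */
    static BoundedDistinctCounter fromType(String type) {
        Matcher m = TYPE.matcher(type.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a bitmap type: " + type);
        }
        return new BoundedDistinctCounter(Integer.parseInt(m.group(1)));
    }

    /** Adds a value; fails once the declared precision would be exceeded. */
    void add(int value) {
        if (values.contains(value)) {
            return; // duplicates never grow the counter
        }
        if (values.size() >= maxCardinality) {
            throw new IllegalStateException(
                "cardinality exceeds bitmap(" + maxCardinality + ")");
        }
        values.add(value);
    }

    /** Exact number of distinct values seen so far. */
    int count() {
        return values.size();
    }

    /**
     * Worst-case serialized size under this sketch's assumed layout:
     * a 4-byte count header followed by 4 bytes per stored int.
     */
    int maxSerializedBytes() {
        return 4 + 4 * maxCardinality;
    }
}
```

Under these assumptions, BoundedDistinctCounter.fromType("bitmap(100)") 
reports a worst case of 404 bytes and bitmap(1000000) about 4MB, so the memory 
reserved per measure scales with the precision the cube designer declares. A 
column whose cardinality exceeds the bound would be a candidate for the 
hyperloglog measure instead.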



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)