[ https://issues.apache.org/jira/browse/KYLIN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293057#comment-15293057 ]

Yerui Sun commented on KYLIN-1379:
----------------------------------

I've worked on this issue for several weeks, and it's now ready to release.
The new bitmap implementation will fully support all data types, based on the
cube-level append-able dictionary introduced by KYLIN-1705.
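A minimal sketch of why an append-able dictionary lets a bitmap count any data type precisely, assuming each distinct value is mapped to a stable integer ID and each cell's distinct set is stored as a bitmap of those IDs. The `AppendOnlyDictionary` class and `toBitmap` helper are hypothetical, and `java.util.BitSet` stands in for whatever bitmap structure Kylin actually uses.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DictBitmapSketch {

    // Hypothetical append-only dictionary: each new value gets the next
    // integer ID and existing IDs never change, so bitmaps built in
    // different segments stay comparable.
    static final class AppendOnlyDictionary<T> {
        private final Map<T, Integer> ids = new HashMap<>();

        int idOf(T value) {
            // The mapping function only reads the size, so IDs run 0, 1, 2, ...
            return ids.computeIfAbsent(value, v -> ids.size());
        }
    }

    // Builds the distinct-value bitmap for one cell (e.g. one segment's rows).
    static <T> BitSet toBitmap(AppendOnlyDictionary<T> dict, List<T> values) {
        BitSet bitmap = new BitSet();
        for (T v : values) {
            bitmap.set(dict.idOf(v));
        }
        return bitmap;
    }

    public static void main(String[] args) {
        AppendOnlyDictionary<String> dict = new AppendOnlyDictionary<>();
        BitSet seg1 = toBitmap(dict, List.of("alice", "bob", "alice"));
        BitSet seg2 = toBitmap(dict, List.of("bob", "carol"));

        // Merging segments is a bitwise OR; no values are re-encoded.
        BitSet merged = (BitSet) seg1.clone();
        merged.or(seg2);
        System.out.println(merged.cardinality()); // exact distinct count: 3
    }
}
```

Because IDs never change once assigned, merging is a plain OR and the distinct count is the merged bitmap's cardinality. With separate per-segment dictionaries the same value could get different IDs in different segments, which is presumably why the dictionary is cube-level.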
In our scenario, the bitmap precision is hard to decide in advance. That's why we
didn't introduce the precision concept in the new version; instead, bitmaps of
any size are supported. This can overflow the fixed-size byte buffer, which we
have resolved in KYLIN-1718.
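Supporting bitmaps of any size means a serialized bitmap can outgrow a buffer allocated up front. The sketch below shows one way to handle that, assuming a hypothetical retry policy that doubles the buffer whenever `java.nio.BufferOverflowException` is thrown. It illustrates the problem and is not necessarily how KYLIN-1718 fixes it; the length-prefixed word layout is also an assumption.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.util.BitSet;

class GrowableSerializeSketch {

    // Assumed layout: word count, then the bitmap's 64-bit words.
    static void serialize(BitSet bitmap, ByteBuffer out) {
        long[] words = bitmap.toLongArray();
        out.putInt(words.length);
        for (long w : words) {
            out.putLong(w);
        }
    }

    // Hypothetical retry policy: start from a fixed-size buffer and double
    // its capacity whenever serialization overflows, instead of failing.
    static ByteBuffer serializeGrowing(BitSet bitmap, int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        int capacity = initialCapacity;
        while (true) {
            ByteBuffer buf = ByteBuffer.allocate(capacity);
            try {
                serialize(bitmap, buf);
                buf.flip(); // make the written bytes readable
                return buf;
            } catch (BufferOverflowException e) {
                capacity *= 2; // discard the partial write, retry larger
            }
        }
    }

    public static void main(String[] args) {
        BitSet bitmap = new BitSet();
        bitmap.set(0, 100_000); // 100,000 distinct IDs
        ByteBuffer buf = serializeGrowing(bitmap, 1024);
        System.out.println(buf.remaining()); // 4 + 1563 * 8 = 12508
    }
}
```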
We also found that queries are much slower when the distinct count exceeds 10M,
and that compression in the endpoint is expensive. KYLIN-1719 improves this by
disabling compression.
All of the above issues have been pushed to one branch called
KYLIN-1379-1705-1718-1719. Thanks [~liyang.g...@gmail.com] for your help and
review; any comments are welcome.

> More stable and functional precise count distinct implements after KYLIN-1186
> -----------------------------------------------------------------------------
>
>                 Key: KYLIN-1379
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1379
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v1.5.0, v1.3.0
>            Reporter: Yerui Sun
>            Assignee: Yerui Sun
>
> After KYLIN-1186, we've gained the ability to count distinct values of Int type
> columns precisely.
> However, the implementation of KYLIN-1186 is not stable, especially in the
> 2.x-staging branch.
> The reason is that in the 2.x version the measure's maxlength is used to
> allocate memory, and the BitmapMeasure is hardcoded to 8MB in KYLIN-1186,
> causing OOM during cube building.
> To resolve this problem, we have introduced precision on the bitmap measure,
> such as bitmap(100), bitmap(10000), bitmap(1000000), meaning the measure
> can accept a cardinality of at most 100/10000/1M. This solution should be
> fine in practice: if the count exceeds 1000000, the hyperloglog measure,
> which produces an approximate result, should be acceptable.
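For comparison with the comment above, the precision proposed in this description could be modeled as a bitmap that accepts at most N distinct IDs and reports overflow past that point. `BoundedBitmap` is a hypothetical name, and signaling overflow with a `false` return (rather than, say, failing the build) is an assumption about how it would surface.

```java
import java.util.BitSet;

class BoundedBitmapSketch {

    // Hypothetical measure mirroring bitmap(N): it accepts at most
    // `precision` distinct IDs and reports overflow instead of growing.
    static final class BoundedBitmap {
        private final int precision;
        private final BitSet bits = new BitSet();
        private int count = 0;

        BoundedBitmap(int precision) {
            this.precision = precision;
        }

        // Returns false if adding a new ID would exceed the precision.
        boolean add(int id) {
            if (bits.get(id)) {
                return true; // already counted
            }
            if (count >= precision) {
                return false; // over the declared cardinality
            }
            bits.set(id);
            count++;
            return true;
        }

        int count() {
            return count;
        }
    }

    public static void main(String[] args) {
        BoundedBitmap m = new BoundedBitmap(100); // like bitmap(100)
        int rejected = 0;
        for (int id = 0; id < 150; id++) {
            if (!m.add(id)) {
                rejected++;
            }
        }
        // A column this wide needs a larger precision or an
        // approximate (HyperLogLog) measure.
        System.out.println(m.count() + " " + rejected); // 100 50
    }
}
```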



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
