kylin git commit: add blog for intersect count

sunyerui Mon, 28 Nov 2016 02:09:47 -0800

Repository: kylin
Updated Branches:
  refs/heads/document abf7b49a9 -> 06ea8a4db



add blog for intersect count


Project: http://git-wip-us.apache.org/repos/asf/kylin/repo
Commit: http://git-wip-us.apache.org/repos/asf/kylin/commit/06ea8a4d
Tree: http://git-wip-us.apache.org/repos/asf/kylin/tree/06ea8a4d
Diff: http://git-wip-us.apache.org/repos/asf/kylin/diff/06ea8a4d

Branch: refs/heads/document
Commit: 06ea8a4db8494996868ebc82ef4094cfb5562803
Parents: abf7b49
Author: sunyerui <sunye...@gmail.com>
Authored: Mon Nov 28 18:09:01 2016 +0800
Committer: sunyerui <sunye...@gmail.com>
Committed: Mon Nov 28 18:09:01 2016 +0800

----------------------------------------------------------------------
 .../_posts/blog/2016-11-28-intersect-count.md   | 58 ++++++++++++++++++++
 1 file changed, 58 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kylin/blob/06ea8a4d/website/_posts/blog/2016-11-28-intersect-count.md
----------------------------------------------------------------------
diff --git a/website/_posts/blog/2016-11-28-intersect-count.md b/website/_posts/blog/2016-11-28-intersect-count.md
new file mode 100644
index 0000000..091ece3
--- /dev/null
+++ b/website/_posts/blog/2016-11-28-intersect-count.md
@@ -0,0 +1,58 @@
+---
+layout: post-blog
+title:  Retention or Conversion Rate Analysis in Apache Kylin
+date:   2016-11-28 13:30:00
+author: Yerui Sun 
+categories: blog
+---
+
+Available since v1.6.0
+
+## Background
+Retention and conversion rates are important metrics in data analysis. In general, such a value can be calculated from the intersection of two data sets (of uuids, for example) that share the same values for some dimensions (city, category, etc.) and differ in one varying dimension (date, etc.).
+Apache Kylin supports retention calculation based on Bitmap and the UDAF intersect_count. This article introduces how to use this feature.
+
+## Usage
+To use retention calculation in Apache Kylin, the following requirements must be met:
+* Only one dimension can vary
+* The column to be counted must already have a precise count distinct measure defined on it
+
+The intersect_count usage is described below:
+
+```
+intersect_count(columnToCount, columnToFilter, filterValueList)
+`columnToCount` the column to calculate the distinct count on
+`columnToFilter` the varying dimension
+`filterValueList` the values of the varying dimension; must be an array
+```
+
+Here are some examples:
+
+```
+intersect_count(uuid, dt, array['20161014', '20161015'])
+The precise distinct count of uuids that show up on both 20161014 and 20161015
+
+intersect_count(uuid, dt, array['20161014', '20161015', '20161016'])
+The precise distinct count of uuids that show up on all of 20161014, 20161015 and 20161016
+
+intersect_count(uuid, dt, array['20161014'])
+The precise distinct count of uuids that show up on 20161014, equivalent to `count(distinct uuid)` for that date
+```
+
+A complete SQL statement example:
+
+```
+select city, version,
+intersect_count(uuid, dt, array['20161014']) as first_day,
+intersect_count(uuid, dt, array['20161015']) as second_day,
+intersect_count(uuid, dt, array['20161016']) as third_day,
+intersect_count(uuid, dt, array['20161014', '20161015']) as retention_oneday,
+intersect_count(uuid, dt, array['20161014', '20161015', '20161016']) as retention_twoday
+from visit_log
+where dt in ('20161014', '20161015', '20161016')
+group by city, version
+```
+
+## Conclusions
+With Bitmap and the UDAF intersect_count, retention analysis on Apache Kylin is fast and convenient. Compared with the traditional approach, the SQL in Apache Kylin is much simpler and clearer, and runs more efficiently.
+
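
As a rough illustration of the semantics described in the post (not part of the commit, and not Kylin's implementation), the sketch below mimics intersect_count with plain Python sets over a toy list of (uuid, dt) rows. The function name mirrors the UDAF, but the `rows` data, the helper's signature and the final retention-rate division are assumptions made for illustration; Kylin itself evaluates this on bitmaps from the precise count distinct measure.

```python
from collections import defaultdict


def intersect_count(rows, filter_values):
    """Count distinct uuids that appear under every value in filter_values.

    rows is an iterable of (uuid, dt) pairs, a hypothetical stand-in for
    the visit_log table within a single (city, version) group.
    """
    ids_by_dt = defaultdict(set)
    for uuid, dt in rows:
        ids_by_dt[dt].add(uuid)
    # One id set per requested date; a date with no rows yields an empty set.
    sets = [ids_by_dt.get(value, set()) for value in filter_values]
    return len(set.intersection(*sets)) if sets else 0


# Toy data (assumed): four visitors on day one, some of whom return.
rows = [
    ("u1", "20161014"), ("u2", "20161014"), ("u3", "20161014"), ("u4", "20161014"),
    ("u1", "20161015"), ("u2", "20161015"), ("u5", "20161015"),
    ("u1", "20161016"), ("u5", "20161016"),
]

first_day = intersect_count(rows, ["20161014"])
retention_oneday = intersect_count(rows, ["20161014", "20161015"])
retention_twoday = intersect_count(rows, ["20161014", "20161015", "20161016"])

# Retention rate: returning visitors divided by the first-day visitors.
oneday_rate = retention_oneday / first_day
```

With a single filter value the helper reduces to a distinct count for that date, which matches the third example in the post; the rate in the last line is the kind of figure the `first_day` and `retention_oneday` columns of the SQL query would be combined into.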
