Hello All,
I would like to start a discussion about the time consumed by the cube
build steps.
Our team is testing Kylin's performance to check whether Kylin suits
our requirements. Our environment is as follows:
Hadoop 2.7.2 (file replication is 2)
Hive 1.2
HBase 1.1.3
Kylin 1.3-HBase 1.1.3
OS CentOS 6.7
We tested Kylin in 2 different ways, separately:
1, dimensions from 4 to 10 (increased by 2 each time)
2, cluster nodes from 3 to 5
We have some interesting results to discuss:
1, After adding nodes (without rebalancing data), build time is
clearly reduced at 10 and 12 dims, but changes little at 4/6/8
dims.
2, After adding nodes (with data rebalancing done), build time is
mostly the same as without rebalancing, and sometimes even longer when the
number of dims is larger (e.g. 12 dims).
3, Is our test method the right approach?
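One possible factor behind result 1 (an assumption on my side, not something
verified in the source): if no aggregation groups, mandatory dimensions or
hierarchies are configured, a full cube with N dimensions has 2^N cuboids, so
the build work grows quickly with the dimension count. A minimal Python sketch
of that arithmetic; cuboid_count is a hypothetical helper, not a Kylin API:

```python
def cuboid_count(num_dimensions: int) -> int:
    """Number of cuboids in a full cube, assuming no pruning
    (no aggregation groups, mandatory or hierarchy dimensions)."""
    return 2 ** num_dimensions


# Dimension counts discussed in this test.
for dims in (4, 6, 8, 10, 12):
    print(f"{dims} dims: {cuboid_count(dims)} cuboids")
```

If this assumption holds, the 4/6/8-dim cubes have only 16 to 256 cuboids, so
fixed job overhead may dominate there, which could be why extra nodes help
little at those sizes.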
To understand these results, we want to analyze them from the source code.
Because I have little experience reading source code, and the source code
has few comments, I am raising the discussion here.
Starting from the source code engine-mr-steps......
By the way, what's the purpose of the invertedindex in Kylin?