RE: Data Duplication

roberto.tardio Sun, 02 Sep 2018 04:31:25 -0700

Hello Ravion,


Indeed, Kylin generates a MOLAP cube from data source tables (Hive tables, or 
also other systems such as Kafka queues or JDBC sources like MySQL and 
Oracle). In a Kylin project, the data sources are defined in the "Data 
Sources" section, and then a "Data Model" has to be created. The model 
specifies the relationships between the source tables (the joins of the star 
or snowflake schema), as well as which columns of each table will be used as 
dimensions and which will be used as measures. After this, the last metadata 
layer, the "Cube", is defined; it governs how the MOLAP cube is generated and 
stored in HBase. After the first build, the generated MOLAP cube is stored in 
HBase. 

 

The size of the generated MOLAP cube therefore depends on the definition of 
the "Cube", where the level of pre-aggregation of the data stored in the MOLAP 
cube is determined through different concepts (e.g. Normal or Derived 
dimensions). For example, I have 2 Kylin cubes built on a Data Model that is a 
DW in Hive. The DW fact table is 1 GB (ORC format with Snappy compression). 
One of the generated Kylin cubes is 1 GB, that is, almost the same size as the 
Hive source (1 GB in Hive + 1 GB cube in HBase). However, the other Kylin 
cube, with a different cube definition over the same Data Model, is 10 GB. 
This larger size is because I defined more dimensions as Normal type in that 
cube definition, in order to achieve better query times.
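To see why declaring more dimensions as Normal inflates the cube, recall that 
a full cube with N Normal dimensions pre-aggregates every combination of them, 
i.e. 2^N cuboids. The Python sketch below only counts cuboids under that 
simplified model; it assumes no aggregation groups, mandatory, hierarchy or 
joint dimensions, and ignores cardinality and HBase encoding, all of which 
affect the real size. The dimension names are hypothetical.

```python
from itertools import combinations


def cuboids(normal_dims):
    """Enumerate every cuboid (subset of Normal dimensions) of a full cube.

    Simplified model (assumption): no aggregation groups or
    mandatory/hierarchy/joint dimensions, so every subset is materialized.
    """
    result = []
    for k in range(len(normal_dims) + 1):
        result.extend(combinations(normal_dims, k))
    return result


def cuboid_count(num_normal_dims):
    # Each Normal dimension doubles the number of cuboids: 2**N in total.
    return 2 ** num_normal_dims


# Hypothetical model: the lookup-table attributes are Derived, so only the
# fact-table keys are Normal dimensions and end up in the cube.
normal = ["order_date", "customer_id", "product_id"]
derived = ["customer_name", "product_category"]

print(len(cuboids(normal)))            # 8 cuboids
# Declaring the derived columns as Normal instead:
print(len(cuboids(normal + derived)))  # 32 cuboids
```

Derived dimensions are resolved at query time through the lookup table's key, 
so they add no cuboids; that is the trade-off between a smaller cube and 
faster queries described above.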

 

I hope this clears up your doubts.

 

Best Regards,

 

Roberto Tardío Olmos

Head of Big Data Analytics

Avenida de Brasil, 17, Planta 16, 28020 Madrid

Landline: 91.788.34.10




 

http://bigdata.stratebi.com/ 

 

http://www.stratebi.com/

 

From: ☼ R Nair [mailto:[email protected]] 
Sent: Saturday, 1 September 2018 19:50
To: [email protected]
Subject: Data Duplication

 

Hi all,

 

I am new to Kylin, so here is a fundamental question: when I create a cube, 
since it's MOLAP, I believe that, irrespective of the data already existing in 
HBase, Kylin will create a copy of the data in a cube/multidimensional format 
(separate from the underlying base data) to help slice/dice faster. Any idea 
of the size of the duplicate copy created? Thanks

 

Best,

Ravion
