Austin,

Those are some great questions, asked simply, in your email.

The data warehouse and the Hadoop ecosystem go hand in hand. I don't think you 
need to move all the data from your warehouse to Hive and HBase. This is the 
key :) you need to understand where you should use Hive and where you can 
utilize HBase for your existing data warehouse. There is no reason to just 
move everything from the current data warehouse to Hadoop.

Anyway, I can share some of the lessons learned here. If you have data that is 
batch loaded and has huge volume (more than 1 TB), and you can flatten the 
structure into a table, then use Hive; it will give you the flexibility to 
slice and dice the data across Hive tables. Remember, it should be batch 
updated or never updated from the source (dimensional data, basically).
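
Just to illustrate (the table, columns, and path below are made-up names, not 
something from your setup), a flattened, batch-loaded Hive table could look 
like this:

  -- Hypothetical flattened dimension table: batch loaded, never updated.
  -- Partitioning by load date keeps each batch in its own directory.
  CREATE TABLE customer_dim_flat (
    customer_id   BIGINT,
    customer_name STRING,
    city          STRING,
    country       STRING
  )
  PARTITIONED BY (load_date STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE;

  -- Each batch exported from the warehouse lands in a new partition.
  LOAD DATA INPATH '/staging/customer_dim/2012-10-22'
  INTO TABLE customer_dim_flat
  PARTITION (load_date = '2012-10-22');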
However, all the frequent transactions can be routed through HBase. HBase 
gives you a better update mechanism than Hive.
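
If you also want that frequently updated data queryable from Hive, Hive ships 
an HBase storage handler that maps a Hive table onto an HBase table. A rough 
sketch (the table name and column mapping are just assumptions):

  -- Hive table backed by HBase: rows can be updated in place through the
  -- HBase API, and Hive queries always see the latest version.
  CREATE TABLE orders_hbase (order_id STRING, status STRING, amount DOUBLE)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:status,f:amount")
  TBLPROPERTIES ("hbase.table.name" = "orders");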
If you're working on web data (crawler output or logs), then Pig can give you 
a quick way to bring that data into analysis or to create reporting datasets 
for further analysis.

Remember, the way you design the data structure or model in the data 
warehouse world is altogether different from the way you define it in the 
Hadoop ecosystem.
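
For instance (purely illustrative table names), where MySQL would join a fact 
table to its dimensions at query time, in Hive you would often pre-join them 
once into a wide table and run your analysis against that:

  -- Flatten a star-schema join into one wide table at load time.
  CREATE TABLE sales_flat AS
  SELECT f.sale_id, f.sale_amount, f.sale_date,
         d.customer_name, d.city, d.country
  FROM sales_fact f
  JOIN customer_dim d ON (f.customer_id = d.customer_id);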

I would reiterate: first find out what data you need to move to the Hadoop 
ecosystem, and for what reason. Believe me, MySQL can give you a far quicker 
response. Don't use Hadoop to replace your MySQL data warehouse.

Thank You,
Manish.  

Sent from my BlackBerry, pls excuse typos

-----Original Message-----
From: Austin Chungath <austi...@gmail.com>
Date: Mon, 22 Oct 2012 16:47:04 
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Implementing a star schema (facts & dimension model)

Hi,

I am new to data warehousing in Hadoop. This might be a trivial question,
but I was unable to find any answers on the mailing list.
My questions are:
A person has an existing data warehouse that uses a star schema
(implemented in a MySQL database). How do you migrate it to Hadoop?
I can use Sqoop to copy my tables to Hive; that much I know.

But what happens to referential integrity, since there are no primary key /
foreign key concepts?
I have seen that I can use Hive & HBase together. Is there a method for
storing fact and dimension tables in Hadoop using Hive & HBase together?
Does putting dimensions in HBase & facts in Hive make any sense? Or should
it be the other way around?

Assume that de-normalization is not an option.
What is the best practice for porting an existing data warehouse to Hadoop
with minimal changes to the database model?

Please let me know with whatever views you have on this.

Thanks,
Austin
