Four newbie questions:
1. Assuming we are OK with non-SQL access, would HBase work as a store for
a data warehouse?
Basically, why Hive for a warehouse and not HBase? I understand that Hive
offers a SQL interface, but are there other reasons?
2. How is the HBase data model different from Hive's?
The wiki describes Bigtable as a
"sparse, distributed multi-dimensional sorted map".
I could not find a corresponding description for HBase, but I assume it
holds for HBase as well.
2.1 Is the Bigtable description true for HBase as well?
2.2 What is the corresponding description for Hive?
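To make my reading of that description concrete, here is a rough Python
sketch of how I picture a "sparse, multi-dimensional sorted map": a map from
(row, column, timestamp) to a value, where absent cells cost nothing and rows
are scanned in key order. The class and method names are my own invention
for illustration, not any HBase or Bigtable API, and the "distributed" part
is left out entirely.

```python
class SparseSortedTable:
    """Toy single-process model of a (row, column, timestamp) -> value map.

    Hypothetical illustration only; not an HBase or Bigtable API.
    """

    def __init__(self):
        # row key -> {column -> {timestamp -> value}}; missing cells are
        # simply not stored, which is what makes the map sparse.
        self._rows = {}

    def put(self, row, column, timestamp, value):
        self._rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column):
        """Return the most recent value of a cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row, columns) for start_row <= row < stop_row, in key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, self._rows[row]


table = SparseSortedTable()
table.put("user#002", "info:name", 1, "Bob")
table.put("user#001", "info:name", 1, "Alice")
table.put("user#001", "info:name", 2, "Alicia")   # newer version, same cell
table.put("user#001", "stats:logins", 1, "7")     # only this row has it

latest_name = table.get("user#001", "info:name")
missing_cell = table.get("user#002", "stats:logins")
scanned_rows = [row for row, _ in table.scan("user#001", "user#003")]
```

If this picture is wrong for HBase, that is exactly what I would like to
learn from 2.1.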
3. ETL in Hive
One typical pattern in traditional ETL is:
-- for each dimension element in the fact stream, look up the dimension to
   see whether the dimension value exists
   if it exists, get the dimension key
   if not, insert the new dimension value and use its new key for
   the current record
3.1 Can this be achieved in Hive?
3.2 Can it be done in Hive SQL?
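In case the pattern in question 3 is unclear, here is a minimal Python sketch
of the lookup-or-insert step as I would do it outside Hive. The field names
("customer", "amount"), the function name, and the in-memory dict standing in
for the dimension table are all assumptions made up for this example.

```python
from itertools import count


def assign_dimension_keys(fact_stream, dimension, next_key):
    """Swap each fact's dimension value for its surrogate key.

    dimension maps dimension value -> key and is updated in place when a
    value is seen for the first time. next_key is an iterator of unused keys.
    All names here are hypothetical.
    """
    for fact in fact_stream:
        value = fact["customer"]
        key = dimension.get(value)          # look up the dimension
        if key is None:                     # not there: insert a new row
            key = next(next_key)
            dimension[value] = key
        yield {"customer_key": key, "amount": fact["amount"]}


dimension = {"acme": 1, "globex": 2}        # existing dimension rows
facts = [
    {"customer": "acme", "amount": 10},     # known value
    {"customer": "initech", "amount": 25},  # new value, gets key 3
    {"customer": "initech", "amount": 5},   # now known, reuses key 3
]
loaded = list(assign_dimension_keys(facts, dimension, count(start=3)))
```

The row-at-a-time lookup and insert is the part I do not know how to express
in Hive.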
4. More ETL
I often find myself updating tables to add more context from
later-arriving data. This takes the form of updating columns in dimension
tables, updating an aggregate table, and so on.
4.1 Can this be achieved in Hive?
4.2 Can it be done in Hive SQL?
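As an illustration of the updates in question 4, here is a small Python
sketch of the two cases: filling in a dimension column and adjusting an
aggregate once late data shows up. The table layouts, keys, and function
names are made up for the example; plain dicts stand in for the tables.

```python
def update_dimension(dimension, late_attributes):
    """Fill in or overwrite columns on existing dimension rows, in place."""
    for key, attrs in late_attributes.items():
        dimension[key].update(attrs)


def update_aggregate(daily_totals, late_facts):
    """Add late-arriving fact amounts to a per-day aggregate, in place."""
    for fact in late_facts:
        day = fact["day"]
        daily_totals[day] = daily_totals.get(day, 0) + fact["amount"]


# Hypothetical starting state: one dimension row with an unknown region,
# and one aggregate row that is already populated.
customers = {3: {"name": "initech", "region": None}}
daily_totals = {"day-1": 40}

update_dimension(customers, {3: {"region": "EMEA"}})
update_aggregate(
    daily_totals,
    [{"day": "day-1", "amount": 5}, {"day": "day-2", "amount": 12}],
)
```

Both functions modify existing rows in place, which is the behaviour I am
asking about.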
Thank you,
Shiv