Thank you sir, very helpful. Could you also briefly describe, from your 
experience, the major differences between traditional ETL in a data warehouse 
and ELT in Hive? Why is the emphasis on taking data from traditional 
transactional databases into Hive tables in the same format and doing the 
transform in Hive afterwards? Is it because Hive is meant to be efficient at 
data transformation?
Regards

    On Wednesday, 30 December 2015, 18:00, Alan Gates <alanfga...@gmail.com> wrote:
Traditionally, data in Hive was write once (insert), read many. You could 
append to tables and partitions, add new partitions, etc. You could remove 
data by dropping tables or partitions. But there were no updates of data or 
deletes of particular rows. This is what was meant by immutable. Hive was 
originally built this way because it was based on MapReduce and HDFS, and 
these were the natural semantics given those underlying systems.
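
To make that concrete, here is a minimal sketch of the classic write-once 
workflow (table and column names are made up for illustration):

  -- Partitioned, append-only table; no row-level changes possible.
  CREATE TABLE page_views (
    user_id BIGINT,
    url     STRING
  )
  PARTITIONED BY (dt STRING)
  STORED AS ORC;

  -- Append data by writing whole new partitions.
  INSERT INTO TABLE page_views PARTITION (dt = '2015-12-29')
  SELECT user_id, url FROM staging_page_views;

  -- The only way to remove data: drop a partition (or the table).
  ALTER TABLE page_views DROP PARTITION (dt = '2015-12-29');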

For many use cases (e.g. ETL) this is sufficient, and the vast majority of 
people still run Hive this way.

We added transactions, updates, and deletes to Hive because some use cases 
require these features. Hive is being used more and more as a data warehouse, 
and while updates and deletes are less common there, they are still required 
(slowly changing dimensions, fixing wrong data, deleting records for 
compliance, etc.). Also, streaming data into warehouses from transactional 
systems is a common use case.
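
As a rough illustration (names invented, and assuming Hive 0.14+ with the 
transaction manager enabled on the client and the compactor running in the 
metastore), an ACID table that supports those row-level changes looks 
something like this:

  -- Client-side settings for transactional DML.
  SET hive.support.concurrency = true;
  SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

  -- ACID tables must be bucketed and stored as ORC.
  CREATE TABLE customer_dim (
    customer_id BIGINT,
    email       STRING,
    segment     STRING
  )
  CLUSTERED BY (customer_id) INTO 8 BUCKETS
  STORED AS ORC
  TBLPROPERTIES ('transactional' = 'true');

  -- Row-level operations an immutable table cannot express:
  UPDATE customer_dim SET segment = 'gold' WHERE customer_id = 42;  -- fix wrong data
  DELETE FROM customer_dim WHERE customer_id = 99;                  -- compliance delete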

Alan.


    Ashok Kumar  December 29, 2015 at 14:59
Hi,
Can someone please clarify what "immutable data" in Hive means?
I have been told that data in Hive is/should be immutable, but in that case 
why do we need transactional tables in Hive that allow updates to data?
thanks and greetings
