Hive has the EXPORT and IMPORT commands; alternatively, you could use Falcon with Oozie.
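
As a rough illustration of the EXPORT/IMPORT route, here is a minimal Python sketch that only builds the HiveQL statement strings. The helper names, the table name and the paths are hypothetical, and actually running the statements (via beeline, a JDBC client, etc.) is left out. EXPORT writes both the data and a metadata file to the target directory; IMPORT accepts an optional LOCATION clause, which is one way to avoid carrying the source cluster's absolute paths across.

```python
def export_statement(table, export_dir, partition=None):
    """Build a Hive EXPORT statement for a table or a single partition.

    `partition` is an optional dict of partition column -> value.
    """
    part = ""
    if partition:
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        part = f" PARTITION ({spec})"
    return f"EXPORT TABLE {table}{part} TO '{export_dir}'"


def import_statement(table, export_dir, location=None):
    """Build a Hive IMPORT statement, optionally overriding the data location."""
    loc = f" LOCATION '{location}'" if location else ""
    return f"IMPORT TABLE {table} FROM '{export_dir}'{loc}"


# Hypothetical table and paths, for illustration only.
print(export_statement("sales", "/tmp/export/sales", {"ds": "2015-12-17"}))
print(import_statement("sales", "/tmp/export/sales", "s3a://cloud-bucket/warehouse/sales"))
```

The exported directory still has to be copied to the target cluster (for example with distcp) before the IMPORT is run there.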

> On 17 Dec 2015, at 17:21, Elliot West <tea...@gmail.com> wrote:
> 
> Hello,
> 
> I'm thinking about the steps required to repeatedly push Hive datasets out 
> from a traditional Hadoop cluster into a parallel cloud-based cluster. This 
> is not a one-off; it needs to be a constantly running sync process. As new 
> tables and partitions are added in one cluster, they need to be synced to the 
> cloud cluster. Assuming for a moment that I have the HDFS data syncing 
> working, I'm wondering what steps I need to take to reliably ship the 
> HCatalog metadata across. I use HCatalog as the point of truth as to 
> when data is available and where it is located, and so I think that metadata 
> is a critical element to replicate in the cloud-based cluster.
> 
> Does anyone have any recommendations on how to achieve this in practice? One 
> issue (of many, I suspect) is that Hive appears to store table/partition 
> locations internally with absolute, fully qualified URLs; therefore, unless 
> the target cloud cluster is similarly named and configured, some path 
> transformation step will be needed as part of the synchronisation process.
> 
> I'd appreciate any suggestions, thoughts, or experiences related to this.
> 
> Cheers - Elliot.
> 
> 
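
On the path transformation step mentioned in the quoted message: if you end up copying the metadata yourself, the rewrite itself can be fairly small. Below is a minimal Python sketch, assuming the locations are ordinary fully qualified URIs and that only the scheme and authority differ between the two clusters. The function name, host names and bucket name are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_location(location, target_scheme, target_authority):
    """Swap the scheme and authority of a table/partition location.

    The path component is kept unchanged, so this only works when both
    clusters use the same directory layout (an assumption).
    """
    parts = urlsplit(location)
    return urlunsplit(
        (target_scheme, target_authority, parts.path, parts.query, parts.fragment)
    )


# Hypothetical source location and target bucket.
source = "hdfs://onprem-nn:8020/warehouse/db.db/tbl/ds=2015-12-17"
print(rewrite_location(source, "s3a", "cloud-bucket"))
```

Each rewritten location would then be applied on the target metastore, for example with ALTER TABLE ... SET LOCATION or as the LOCATION clause of an IMPORT.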
