Hive has the EXPORT/IMPORT commands, which carry a table's or partition's metadata along with its data; alternatively, Falcon + Oozie.
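
As a rough illustration of the EXPORT/IMPORT route, here is a minimal Python sketch that builds the two HiveQL statements and rewrites a fully qualified table location for the target cluster. The host name, bucket name, staging directory, and table name are all made up for the example, and the helper functions are hypothetical, not part of Hive. It assumes the exported staging directory is copied to the cloud cluster by whatever already syncs your HDFS data.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_location(location, target_scheme, target_authority):
    """Swap the scheme and authority of a fully qualified URI, keeping the path."""
    parts = urlsplit(location)
    return urlunsplit((target_scheme, target_authority, parts.path, "", ""))


def export_statement(table, staging_dir):
    """HiveQL to run on the source cluster: writes data plus metadata to staging_dir."""
    return f"EXPORT TABLE {table} TO '{staging_dir}'"


def import_statement(table, staging_dir, location):
    """HiveQL to run on the target cluster, placing the table at an explicit location."""
    return f"IMPORT TABLE {table} FROM '{staging_dir}' LOCATION '{location}'"


# Hypothetical values, for illustration only.
source_location = "hdfs://onprem-nn:8020/warehouse/sales.db/orders"
target_location = rewrite_location(source_location, "s3a", "example-bucket")

print(export_statement("sales.orders", "/staging/sales/orders"))
print(import_statement("sales.orders", "/staging/sales/orders", target_location))
```

Passing LOCATION explicitly on the import side is what sidesteps the absolute-URL problem: the target metastore records the rewritten path rather than the source cluster's one.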

> On 17 Dec 2015, at 17:21, Elliot West <tea...@gmail.com> wrote:
>
> Hello,
>
> I'm thinking about the steps required to repeatedly push Hive datasets out
> from a traditional Hadoop cluster into a parallel cloud based cluster. This
> is not a one off, it needs to be a constantly running sync process. As new
> tables and partitions are added in one cluster, they need to be synced to the
> cloud cluster. Assuming for a moment that I have the HDFS data syncing
> working, I'm wondering what steps I need to take to reliably ship the
> HCatalog metadata across. I use HCatalog as the point of truth as to
> when data is available and where it is located and so I think that metadata
> is a critical element to replicate in the cloud based cluster.
>
> Does anyone have any recommendations on how to achieve this in practice? One
> issue (of many I suspect) is that Hive appears to store table/partition
> locations internally with absolute, fully qualified URLs, therefore unless
> the target cloud cluster is similarly named and configured some path
> transformation step will be needed as part of the synchronisation process.
>
> I'd appreciate any suggestions, thoughts, or experiences related to this.
>
> Cheers - Elliot.