Eugene/Sushanth,

Thank you for pointing me in the direction of these features. I'll investigate them further to see if I can put them to good use.

Cheers - Elliot.

On 17 December 2015 at 20:03, Sushanth Sowmyan <khorg...@gmail.com> wrote:

> Also, while I have not wiki-ized the documentation for the above, I
> have uploaded slides from talks that I've given in hive user group
> meetup on the subject, and also a doc that describes the replication
> protocol followed for the EXIM replication that are attached over at
> https://issues.apache.org/jira/browse/HIVE-10264
>
> On Thu, Dec 17, 2015 at 11:59 AM, Sushanth Sowmyan <khorg...@gmail.com>
> wrote:
> > Hi,
> >
> > I think that the replication work added with
> > https://issues.apache.org/jira/browse/HIVE-7973 is exactly up this
> > alley.
> >
> > Per Eugene's suggestion of MetaStoreEventListener, this replication
> > system plugs into that and gets you a stream of notification events
> > from HCatClient for the exact purpose you mention.
> >
> > There's some work still outstanding on this task, most notably
> > documentation (sorry!) but please have a look at
> > HCatClient.getReplicationTasks(...) and
> > org.apache.hive.hcatalog.api.repl.ReplicationTask. You can plug in
> > your implementation of ReplicationTask.Factory to inject your own
> > logic for how to handle the replication according to your needs.
> > (currently there exists an implementation that uses Hive EXPORT/IMPORT
> > to perform replication - you can look at the code for this, and the
> > tests for these classes to see how that is achieved. Falcon already
> > uses this to perform cross-hive-warehouse replication)
> >
> >
> > Thanks,
> >
> > -Sushanth
> >
> > On Thu, Dec 17, 2015 at 11:22 AM, Eugene Koifman
> > <ekoif...@hortonworks.com> wrote:
> >> Metastore supports MetaStoreEventListener and MetaStorePreEventListener
> >> which may be useful here
> >>
> >> Eugene
> >>
> >> From: Elliot West <tea...@gmail.com>
> >> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> >> Date: Thursday, December 17, 2015 at 8:21 AM
> >> To: "user@hive.apache.org" <user@hive.apache.org>
> >> Subject: Synchronizing Hive metastores across clusters
> >>
> >> Hello,
> >>
> >> I'm thinking about the steps required to repeatedly push Hive datasets out
> >> from a traditional Hadoop cluster into a parallel cloud based cluster. This
> >> is not a one off, it needs to be a constantly running sync process. As new
> >> tables and partitions are added in one cluster, they need to be synced to
> >> the cloud cluster. Assuming for a moment that I have the HDFS data syncing
> >> working, I'm wondering what steps I need to take to reliably ship the
> >> HCatalog metadata across. I use HCatalog as the point of truth as to
> >> when data is available and where it is located and so I think that metadata
> >> is a critical element to replicate in the cloud based cluster.
> >>
> >> Does anyone have any recommendations on how to achieve this in practice? One
> >> issue (of many I suspect) is that Hive appears to store table/partition
> >> locations internally with absolute, fully qualified URLs, therefore unless
> >> the target cloud cluster is similarly named and configured some path
> >> transformation step will be needed as part of the synchronisation process.
> >>
> >> I'd appreciate any suggestions, thoughts, or experiences related to this.
> >>
> >> Cheers - Elliot.
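
For the path transformation step raised in the original question, here is a minimal sketch in plain Java of how a fully qualified table or partition location might be rewritten from one cluster's root to another's. It uses only java.net.URI and does not touch any Hive or HCatalog API. The class name LocationRewriter, the method rewrite, and the host and bucket names in the usage note are all hypothetical and chosen for illustration; how the rewritten location is then applied to the target metastore is left open.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

/**
 * Hypothetical helper (not part of Hive): rewrites an absolute location
 * from a source cluster root to a target cluster root.
 */
final class LocationRewriter {
    private LocationRewriter() {}

    /**
     * Returns the location re-rooted under targetRoot if it lives under
     * sourceRoot; otherwise returns the location unchanged.
     */
    static String rewrite(String location, String sourceRoot, String targetRoot) {
        URI loc = URI.create(location);
        URI src = URI.create(sourceRoot);
        URI dst = URI.create(targetRoot);

        // Only touch locations on the same file system (scheme + authority) as the source root.
        boolean sameScheme = loc.getScheme() != null
                && loc.getScheme().equalsIgnoreCase(src.getScheme());
        if (!sameScheme || !Objects.equals(loc.getAuthority(), src.getAuthority())) {
            return location;
        }

        // Only touch locations whose path is the source root path or sits below it.
        String srcPath = stripTrailingSlash(src.getPath());
        String path = loc.getPath();
        if (!path.equals(srcPath) && !path.startsWith(srcPath + "/")) {
            return location;
        }

        // Keep the part of the path below the source root and append it to the target root.
        String relative = path.substring(srcPath.length());
        String dstPath = stripTrailingSlash(dst.getPath()) + relative;
        try {
            return new URI(dst.getScheme(), dst.getAuthority(), dstPath, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build target location for " + location, e);
        }
    }

    private static String stripTrailingSlash(String p) {
        return p.endsWith("/") ? p.substring(0, p.length() - 1) : p;
    }
}
```

With the made-up names above, LocationRewriter.rewrite("hdfs://onprem-nn:8020/warehouse/db/t/p=1", "hdfs://onprem-nn:8020", "s3a://cloud-bucket") yields "s3a://cloud-bucket/warehouse/db/t/p=1", while a location on any other file system is passed through untouched. Matching on scheme, authority and path prefix rather than doing a plain string replace avoids accidentally rewriting locations that merely contain the source root as a substring.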