Eugene/Sushanth,

Thank you for pointing me in the direction of these features. I'll
investigate them further to see if I can put them to good use.

Cheers - Elliot.

On 17 December 2015 at 20:03, Sushanth Sowmyan <khorg...@gmail.com> wrote:

> Also, while I have not wiki-ized the documentation for the above, I
> have uploaded slides from talks that I've given at the Hive user group
> meetup on the subject, along with a doc that describes the replication
> protocol followed for EXIM replication. Both are attached at
> https://issues.apache.org/jira/browse/HIVE-10264
>
> On Thu, Dec 17, 2015 at 11:59 AM, Sushanth Sowmyan <khorg...@gmail.com>
> wrote:
> > Hi,
> >
> > I think that the replication work added with
> > https://issues.apache.org/jira/browse/HIVE-7973 is right up this
> > alley.
> >
> > Per Eugene's suggestion of MetaStoreEventListener, this replication
> > system plugs into that and gets you a stream of notification events
> > from HCatClient for the exact purpose you mention.
> >
> > There's some work still outstanding on this task, most notably
> > documentation (sorry!), but please have a look at
> > HCatClient.getReplicationTasks(...) and
> > org.apache.hive.hcatalog.api.repl.ReplicationTask. You can plug in
> > your implementation of ReplicationTask.Factory to inject your own
> > logic for how to handle the replication according to your needs.
> > (Currently there is an implementation that uses Hive EXPORT/IMPORT
> > to perform replication. You can look at the code for this, and the
> > tests for these classes, to see how that is achieved. Falcon already
> > uses this to perform cross-hive-warehouse replication.)
> >
> >
> > Thanks,
> >
> > -Sushanth
> >
> > On Thu, Dec 17, 2015 at 11:22 AM, Eugene Koifman
> > <ekoif...@hortonworks.com> wrote:
> >> Metastore supports MetaStoreEventListener and MetaStorePreEventListener
> >> which may be useful here
> >>
> >> Eugene
> >>
> >> From: Elliot West <tea...@gmail.com>
> >> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> >> Date: Thursday, December 17, 2015 at 8:21 AM
> >> To: "user@hive.apache.org" <user@hive.apache.org>
> >> Subject: Synchronizing Hive metastores across clusters
> >>
> >> Hello,
> >>
> >> I'm thinking about the steps required to repeatedly push Hive datasets out
> >> from a traditional Hadoop cluster into a parallel cloud-based cluster. This
> >> is not a one-off; it needs to be a constantly running sync process. As new
> >> tables and partitions are added in one cluster, they need to be synced to
> >> the cloud cluster. Assuming for a moment that I have the HDFS data syncing
> >> working, I'm wondering what steps I need to take to reliably ship the
> >> HCatalog metadata across. I use HCatalog as the point of truth as to when
> >> data is available and where it is located, and so I think that metadata
> >> is a critical element to replicate in the cloud-based cluster.
> >>
> >> Does anyone have any recommendations on how to achieve this in practice? One
> >> issue (of many, I suspect) is that Hive appears to store table/partition
> >> locations internally with absolute, fully qualified URLs; therefore, unless
> >> the target cloud cluster is similarly named and configured, some path
> >> transformation step will be needed as part of the synchronisation process.
> >>
> >> I'd appreciate any suggestions, thoughts, or experiences related to this.
> >>
> >> Cheers - Elliot.
> >>
> >>
>
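The path transformation step raised in the original question could look roughly like the sketch below. It is only an illustration in plain Python: the function name, the namenode address, and the bucket name are all hypothetical, and nothing here is part of Hive or HCatalog. It assumes every table/partition location sits under a single warehouse root on the source cluster and should be re-rooted under a single root on the target.

```python
from urllib.parse import urlsplit


def rewrite_location(location, source_root, target_root):
    """Re-root a fully qualified table/partition location onto another cluster.

    All names and example values are hypothetical, not Hive settings.
    """
    loc = urlsplit(location)
    src = urlsplit(source_root)

    # Refuse locations that live on a different filesystem than the source root.
    if (loc.scheme, loc.netloc) != (src.scheme, src.netloc):
        raise ValueError(f"{location} is not on {src.scheme}://{src.netloc}")

    # Match on whole path components so /warehouse2 is not taken for /warehouse.
    src_path = src.path.rstrip("/")
    if loc.path != src_path and not loc.path.startswith(src_path + "/"):
        raise ValueError(f"{location} is not under {source_root}")

    # Keep the part below the source root and append it to the target root.
    relative = loc.path[len(src_path):]
    return target_root.rstrip("/") + relative


print(rewrite_location(
    "hdfs://onprem-nn:8020/apps/hive/warehouse/sales.db/orders/dt=2015-12-17",
    "hdfs://onprem-nn:8020/apps/hive/warehouse",
    "s3a://example-bucket/warehouse",
))
```

Failing loudly on locations outside the expected root is a deliberate choice in this sketch: an external table stored elsewhere would otherwise be silently mapped to a wrong path on the target cluster.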
