Hi James,

Thanks for providing the detailed info on replication. Two questions:

1. I am not clear how replication works with respect to views. Is this an open 
issue for replication? 

2. You mention that there is still work required on the combination of 
transactions and replication. Does this work need to be done in HBase or in 
Phoenix? Are there any existing JIRAs for this work? 

Thanks,
Saurabh.
Sent from Bloomberg Professional for iPhone 

----- Original Message -----
From: James Taylor <[email protected]>
To: [email protected]
At: 09-Jun-2016 11:42:46


Hi JM,
Are you looking toward replication to support DR? If so, you can rely on 
HBase-level replication with a few gotchas and some operational hurdles:

- When upgrading Phoenix versions, upgrade the server-side first for both the 
primary and secondary cluster. You can do a rolling upgrade and old clients 
will continue to work with the upgraded server, so no downtime is required (see 
Backward Compatibility[1] for more details).
- Execute Phoenix DDL (i.e. user-level changes to existing Phoenix tables, 
creation of new tables, indexes, sequences) against both the primary and 
secondary cluster with replication suspended (as otherwise you end up with a 
race condition for the replication of the SYSTEM.CATALOG table and any not yet 
existing tables). If you've upgraded Phoenix, then even if there's no DDL, you 
should at a minimum connect a Phoenix client to both the primary and secondary 
cluster to trigger any upgrades to Phoenix system tables. Once the DDL is 
complete, resume replication. 
- Do not replicate the SYSTEM.SEQUENCE table since replication is asynchronous 
and may fall behind which would be a big issue if switching over to the 
secondary cluster as sequence values could start repeating. Instead, 
incorporate a cluster ID into any sequence-based identifiers and concatenate 
this with the sequence value. In that way, the identifiers will continue to be 
unique after a DR event.
- Replicate Phoenix indexes just like data tables as the HBase-level 
replication of the data table will not trigger index updates.
- In theory, you really only need to replicate views from SYSTEM.CATALOG since 
you're executing DDL on both the primary and secondary cluster; however, I don't 
think HBase has that capability (it sure would be nice). FWIW, we're 
thinking of separating views from table definitions into separate Phoenix 
tables, but we need to first make these tables transactional (we're using an 
HBase mechanism that allows all-or-none commits to the SYSTEM.CATALOG, but it 
only works if all updates go to the same RS, which is too limiting).
- It's a good idea to monitor the depth of the replication queue so you know 
if/when replication is falling behind.
- Care has to be taken wrt keeping deleted cells on both clusters if you want 
to support point-in-time backup and restore, as it's possible that compaction 
would remove cells before your backup window has passed (this is orthogonal to 
replication, but I just wanted to bring it up).
- Given the asynchronous nature of HBase replication, there's no good way of 
knowing the transaction ID (i.e. timestamp) at which you have all of the data. 
Also, replication of the state that is kept by the transaction manager in terms 
of inflight and invalid transactions is left as an exercise to the reader. :-) 
In short - there's still some work to do wrt the combination of transactions 
and replication (but it'd be really interesting work if anyone is interested).
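The suspend-DDL-resume sequence and queue monitoring described above might look
roughly like the following inside `hbase shell` (the peer ID '1' is an
assumption; check your own replication peer configuration):

```shell
# Suspend replication to the secondary cluster (peer ID '1' is an assumption)
disable_peer '1'

# Now run the same Phoenix DDL against BOTH clusters, e.g. via sqlline.py:
#   CREATE TABLE my_table (id BIGINT NOT NULL PRIMARY KEY, val VARCHAR);
#   CREATE INDEX my_index ON my_table (val);
# (Even with no DDL, connect a Phoenix client to both clusters after an
# upgrade to trigger any system table upgrades.)

# Resume replication once the DDL is complete on both sides
enable_peer '1'

# Monitor replication lag: watch sizeOfLogQueue and ageOfLastShippedOp
status 'replication'
```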
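The cluster-ID scheme for sequence-based identifiers could be sketched as
follows in Python (the bit widths and cluster IDs here are illustrative
assumptions, not anything Phoenix provides):

```python
# Sketch: make sequence values globally unique across clusters by
# concatenating a small per-cluster ID with the locally generated sequence
# value. Top bits hold the cluster ID, low bits hold the sequence value.

CLUSTER_ID_BITS = 8   # supports up to 256 clusters (an assumption)
SEQUENCE_BITS = 56    # room for the SYSTEM.SEQUENCE-generated value

def make_identifier(cluster_id: int, sequence_value: int) -> int:
    """Combine a cluster ID and a sequence value into one 64-bit identifier."""
    assert 0 <= cluster_id < (1 << CLUSTER_ID_BITS)
    assert 0 <= sequence_value < (1 << SEQUENCE_BITS)
    return (cluster_id << SEQUENCE_BITS) | sequence_value

def split_identifier(identifier: int) -> tuple[int, int]:
    """Recover (cluster_id, sequence_value) from a combined identifier."""
    return identifier >> SEQUENCE_BITS, identifier & ((1 << SEQUENCE_BITS) - 1)

# The same sequence value issued on two different clusters yields distinct
# identifiers, so IDs remain unique after failing over to the secondary.
primary_id = make_identifier(1, 42)    # generated on the primary cluster
secondary_id = make_identifier(2, 42)  # same sequence value on the secondary
```

Since SYSTEM.SEQUENCE is not replicated, each cluster issues its own sequence
values independently; the embedded cluster ID is what keeps the combined
identifiers from colliding after a DR event.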
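For the index and deleted-cell points above, the HBase-level settings might
look like this in `hbase shell` (table, column family, and TTL values are
placeholders for illustration):

```shell
# Replicate an index table just like a data table: enable replication on its
# column family (table and family names are placeholders)
alter 'MY_INDEX', {NAME => '0', REPLICATION_SCOPE => 1}

# Keep deleted cells on BOTH clusters so point-in-time restore works within
# your backup window; a 7-day TTL here is purely illustrative
alter 'MY_TABLE', {NAME => '0', KEEP_DELETED_CELLS => 'TTL',
                   MIN_VERSIONS => 1, TTL => 604800}
```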

HTH. Thanks,

James

[1] https://phoenix.apache.org/upgrading.html

On Thu, Jun 9, 2016 at 7:56 AM, anil gupta <[email protected]> wrote:

Hi Jean,

Phoenix does not support replication at present (it would be super awesome if 
it did). So, if you want to replicate Phoenix tables, you will need to set up 
replication of all the underlying HBase tables for the corresponding Phoenix 
tables.

I think you will need to replicate all the Phoenix system HBase tables, the 
global/local secondary index tables, and then the primary Phoenix table.

I haven't done it yet, but the above is how I would approach it.

Thanks,
Anil Gupta.


On Thu, Jun 9, 2016 at 6:49 AM, Jean-Marc Spaggiari <[email protected]> 
wrote:

Hi,

When Phoenix is used, what is the recommended way to do replication?

Replication acts as a client on the second cluster, so should we simply 
configure Phoenix on both clusters and let it take care of updating the index 
tables, etc. on the destination? Or should all the tables on the destination 
side, including the Phoenix tables, be replicated too? I searched a bit for 
that on the Phoenix site and on Google and did not find anything.

Thanks,

JMS


--
Thanks & Regards,
Anil Gupta

