No, we're not saying to avoid replication: at SFDC, we rely on replication to provide an active/active configuration for failover. Lars H. & co. can explain in more detail, but there are some nuances of which you should be aware. For example, the HBase table metadata needs to exist on both clusters. How is this done in your environment? One way to do this is to run the Phoenix DDL statements on both sides, but this requires some extra processing, as replication won't know about Phoenix DDL.
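
To make the "run the DDL on both sides" idea concrete, here is a minimal sketch, not a description of any actual deployment. It assumes you can get a DB-API style connection to each cluster; the table name, index name, and the apply_ddl helper are made up for illustration, and sqlite3 in-memory databases stand in for the two clusters only so the snippet runs on its own. Every statement uses IF NOT EXISTS, so running it again (or running it after the metadata already arrived some other way) should not fail.

```python
import sqlite3

# Hypothetical schema. Every statement uses IF NOT EXISTS so reruns are harmless.
DDL_STATEMENTS = [
    "CREATE TABLE IF NOT EXISTS metrics (id BIGINT NOT NULL PRIMARY KEY, host VARCHAR)",
    "CREATE INDEX IF NOT EXISTS metrics_host_idx ON metrics (host)",
]


def apply_ddl(connections, statements):
    """Run every DDL statement on every cluster connection.

    `connections` maps a cluster name to a DB-API connection (an assumption of
    this sketch). Returns a dict of cluster name -> number of statements run.
    """
    applied = {}
    for name, conn in connections.items():
        cursor = conn.cursor()
        for statement in statements:
            cursor.execute(statement)
        conn.commit()
        cursor.close()
        applied[name] = len(statements)
    return applied


if __name__ == "__main__":
    # Stand-ins for the primary and secondary clusters.
    clusters = {
        "primary": sqlite3.connect(":memory:"),
        "secondary": sqlite3.connect(":memory:"),
    }
    apply_ddl(clusters, DDL_STATEMENTS)
    apply_ddl(clusters, DDL_STATEMENTS)  # repeating the same DDL is safe
```

The point of the design is only that the same idempotent DDL list is the single source of truth for both sides; how you connect to each cluster is up to your environment.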

Whether or not you replicate indexes depends on 1) how much your use case depends on them (if they're not available, will crucial queries become so slow that it's as if the system is down?) and 2) the size of your data and how long it takes to regenerate the index. Our current thinking is to replicate the indexes just as we replicate tables (an index looks like any other HBase table as far as HBase is concerned), as we want to be able to fail over immediately without performance degradation.

Whether to replicate the SYSTEM.CATALOG table depends on your use case as well. If you're using views (including multi-tenant tables) that are created dynamically/on-the-fly, then you'd likely want to replicate this table, as otherwise this DDL has the potential to be lost. Adding the IF NOT EXISTS that Andrew referred to would prevent an error message when running the DDL on the secondary cluster if the row from the SYSTEM.CATALOG table was already replicated.

For the SYSTEM.SEQUENCE table, as Andrew pointed out, we allocate chunks of sequences and dole them out on the client. You'd want to replicate this table, as otherwise when you switch to the other cluster, you'd start repeating the same sequence values. Once replicated, if the primary cluster goes down, the sequences will pick up at the value after the already allocated chunk (which is fine, as it's fine to have "holes" in the sequence values that get doled out). There is a potential for a race condition if the primary cluster returns a batch of new sequences and then dies before replicating the updated sequence value to the other cluster. This can be mitigated, as Andrew points out, by bumping up the sequence values on a failover event.

HTH. Maybe more information than you wanted? Tell us more about how you're relying on replication when you get a chance.

Thanks,
James

On Tue, Dec 9, 2014 at 5:00 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> Hum. Thanks for all those updates.
>
> So are we saying that master/master HBase replication should be avoided when
> using Phoenix with latest stable version?
>
> 2014-12-09 19:51 GMT-05:00 Andrew Purtell <[email protected]>:
>
>> You also need to replicate the Phoenix system tables. It's still necessary
>> to run DDL operations on both clusters to keep Phoenix schema and HBase
>> tables in sync. Use IF EXISTS or IF NOT EXISTS to avoid DDL statement
>> failures. Phoenix should do the right thing. If not, it's a bug.
>>
>> The sequence table is interesting. The Phoenix client caches a range of
>> sequence values to use when inserting data that include generated sequence
>> values. You'll want to always grab a new cached range of sequence values
>> when failing over from one site to another and back to avoid potential
>> duplication. It's possible upon site failure that the latest updates to the
>> sequence table did not replicate. Or,
>> https://issues.apache.org/jira/browse/PHOENIX-1422 would sidestep this
>> issue if implemented.
>>
>>
>> On Mon, Dec 8, 2014 at 10:22 PM, Jeffrey Zhong <[email protected]>
>> wrote:
>>>
>>>
>>> You need to enable replication on both the data & index tables at the HBase
>>> level using Phoenix 4.2 (Phoenix versions before 4.2 may have issues with
>>> local indexes). There is a test case MutableIndexReplicationIT where you can
>>> see some details. Ideally Phoenix should provide a custom replication sink
>>> so that a user doesn't have to set up replication on the index table.
>>>
>>> From: Jean-Marc Spaggiari <[email protected]>
>>> Reply-To: <[email protected]>
>>> Date: Monday, December 8, 2014 at 9:29 AM
>>> To: user <[email protected]>
>>> Subject: Replication?
>>>
>>> Hi,
>>>
>>> How do we replicate data between 2 clusters when Phoenix is in the
>>> picture?
>>>
>>> Can we simply replicate the table we want from A to B and on cluster B
>>> Phoenix will do the required re-indexing? Or should we also replicate the
>>> Phoenix tables too?
>>>
>>> Thanks,
>>>
>>> JM
>>
>>
>> --
>> Best regards,
>>
>> - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>
>
