Thanks a lot Barry and others,
You are right, some info was missing from my previous mails. Let me add some
detail:
- We have both PARTITIONED and REPLICATE regions (all of them are
PERSISTENT) -> We have both parallel and serial senders with overflow to disk
- We are using off-heap
- We are using PDX serialization
- Our default setup is an active/active setup with some data stickiness (in
normal circumstances the same data is always handled by the same Geode
instance). One instance takes all the traffic in case of a network split
between WAN instances or the failure of a Geode cluster.
- We have custom conflict resolution to minimize consistency issues.
I was checking your proposal and I have several comments / questions, so any
feedback is really appreciated:
- When there are multiple regions, the process should be repeated for each
region, and the senders should be started only when all regions "have finished"
ingesting data and consuming events, right?
- I am not sure this approach scales well when clusters are big. We were
thinking more of an export data / transfer / import data approach. I am not
100% sure which is best. We will do some testing to find the best option.
Your approach has the benefit that the time during which events are duplicated
is much shorter, and I think that could avoid potential consistency issues.
Thanks,
/Evaristo
On Thursday, October 24, 2019, 23:47:03 CEST, Barry Oglesby
<[email protected]> wrote:
You could look into a blue-green-type strategy to re-populate the second WAN
site.
This idea uses a Geode durable client that is connected to both sites. It
connects to site1 using CQ and site2 using a proxy region. It basically takes
initial results and events from site1 and puts them into site2.
If you're in the state where site1 is up and site2 is down, here are the steps:
1. Stop gateway sender in site1 so that no events are queued for site2. You can
use gfsh stop gateway-sender to do this.
2. Restart locators and servers in site2
3. Stop gateway sender in site2 so that events from the durable client are not
sent back to site1. You can use gfsh stop gateway-sender to do this.
At this point, the two sites are not really connected by the WAN.
4. Start durable client (set durable-client-id=migration-client)
This:
- creates a CQ connected to site1
- executes the CQ with initial results
- adds those results to site2 using the proxy region
- sends ready for events which starts the events flowing to the
MigrationListener. Events received by the MigrationListener are added to site2
using the proxy region.
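
To picture the flow above, here is a minimal sketch in plain Java. It is only
a model and does not use the Geode API: the class MigrationFlowSketch, the
Event type, and the maps standing in for site1 results and the site2 proxy
region are all hypothetical names made up for illustration. In the real client
these roles would be played by the CQ's initial results, the MigrationListener,
and the proxy region, and the attached MigrationClient may be structured
differently.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

// Plain-Java model only: no Geode classes are used, and every name here is hypothetical.
class MigrationFlowSketch {

    // Stand-in for one CQ event; a null value models a destroy on site1.
    static final class Event {
        final String key;
        final String value;

        Event(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Copies the initial results into the site2 stand-in, then applies
    // the queued events in arrival order.
    static Map<String, String> migrate(Map<String, String> initialResults, Queue<Event> events) {
        Map<String, String> site2 = new LinkedHashMap<>(initialResults);
        Event event;
        while ((event = events.poll()) != null) {
            if (event.value == null) {
                site2.remove(event.key);           // destroy
            } else {
                site2.put(event.key, event.value); // create or update
            }
        }
        return site2;
    }

    public static void main(String[] args) {
        Map<String, String> initialResults = new LinkedHashMap<>();
        initialResults.put("k1", "v1");
        initialResults.put("k2", "v2");

        Queue<Event> events = new ArrayDeque<>();
        events.add(new Event("k2", "v2-updated")); // update seen after the initial results
        events.add(new Event("k1", null));         // destroy
        events.add(new Event("k3", "v3"));         // create

        System.out.println(migrate(initialResults, events));
    }
}
```

The point of the model is the ordering: the initial results are written first,
and events that arrive afterwards are applied on top of them, so site2 ends up
reflecting the later changes.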
When steady state is achieved (meaning all the initial results are processed
and only the MigrationListener is processing events):
5. Restart gateway sender in site1
6. Stop durable client
After you restart the gateway sender in site1 but before you stop the durable
client, both will be sending events to site2. This will result in duplicate
events in site2, so the shorter the time between these actions, the fewer
duplicate events.
7. After the durable client has been stopped, restart the gateway sender in
site2.
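
The duplicate window between steps 5 and 6 can be pictured with a small
plain-Java sketch. It is a simplified model, not Geode code: the class name,
the timestamps, and the assumption that an event is delivered at the moment it
occurs are all hypothetical. It counts the events that would reach site2 through
both the restarted gateway sender and the still-running durable client.

```java
import java.util.List;

// Simplified model of the overlap between step 5 and step 6; all names are hypothetical.
class DuplicateWindowSketch {

    // Counts events that both paths would deliver to site2: those occurring at or
    // after the site1 sender restart (step 5) and before the durable client stops (step 6).
    static long countDuplicates(List<Long> eventTimesMillis,
                                long senderRestartMillis,
                                long clientStopMillis) {
        return eventTimesMillis.stream()
                .filter(t -> t >= senderRestartMillis && t < clientStopMillis)
                .count();
    }

    public static void main(String[] args) {
        List<Long> events = List.of(100L, 200L, 300L, 400L, 500L);
        // Wide window: the events at 300 and 400 fall inside it.
        System.out.println(countDuplicates(events, 250L, 450L));
        // Narrow window: no event falls inside it.
        System.out.println(countDuplicates(events, 250L, 260L));
    }
}
```

With the same event stream, shrinking the gap between the two actions shrinks
the count, which is the reason to stop the durable client as soon as possible
after the site1 sender is back.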
Notes / Caveats:
I attached the MigrationClient, MigrationListener and configuration files.
If you're using PDX serialization, you might have to work around JIRA
GEODE-6271:
https://issues.apache.org/jira/browse/GEODE-6271
The MigrationClient does this in registerPdxTypesOnAllPools. If you're not
using PDX serialization, you can remove this code.
You don't mention if any of your entities are persistent.
If your PdxTypes are persistent in site2, you won't need to work around JIRA
GEODE-6271
If your senders are persistent, you may need to delete the disk files before
restarting the senders.
Thanks,
Barry Oglesby
On Wed, Oct 23, 2019 at 10:36 PM [email protected]
<[email protected]> wrote:
Thanks a lot. We will try this.
On Wed, Oct 23, 2019 at 23:35, Jason Huynh <[email protected]> wrote:
Hi Evaristo,
I spoke with another committer, Anil, and from what we understand, the process
that is described would work. I am not sure if this is the recommended way to
do a restart, but we believe the steps outlined would get the intended outcome.
To clear a serial gateway, I believe stopping the gateway sender will clear
its queue. However, for a parallel gateway sender I think the parallel queue
gets cleared once the sender is restarted (so a stop and then a start). There
may be other ways, such as destroying the gateway sender, but you'd probably
have to detach it from the region first.
It sounds like a WAN GII (get initial image) feature would be useful and would
help reduce the steps in this use case.
Please chime in if this response is wrong or can be improved.
Thanks,
-Jason
On Tue, Oct 22, 2019 at 1:26 PM [email protected]
<[email protected]> wrote:
Hi there,
We are planning to use an installation with two Geode clusters connected via
WAN, using gateway senders/receivers to keep them updated. The main reason is
resiliency against disasters in a data center.
It is not clear to us how to recover a data center in case of disaster. This is
the use case:
- One of the data centers has a problem (natural catastrophe)
- The other data center keeps running traffic and filling the gateway sender
queues, which need to be stopped at some point to avoid filling up the disk
resources.
At some point in time, the data center is ready to start recovery, which will
require synchronizing the Geode copy. The procedure should be something like:
- Drain the gateway sender queues in the copy that is providing service
- Start gateway senders
- Make a copy
- Transfer the copy to the data center that will be recovered
- Import the copy
- Allow the data center to catch up via replication
- Start the copy again.
Does it make sense? Or is there a better way to do it? In case the answer is
yes, is there any way to drain the gateway sender's queues (both for parallel
and serial GWs)?
Thanks in advance,
/Evaristo