Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-26 Thread Christian Dahlqvist
Hi, A common approach for replicating changes across multiple geographically distributed clusters is to put a message queue in front of Elasticsearch and feed all data modifications through it so that they can be applied to the clusters independently. This allows issues with unreliable connec
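
A minimal sketch of the queue-in-front approach described in the message above, assuming one in-memory queue per target cluster. The `ClusterFeed` class, the `publish` helper, and the region names are hypothetical and used only for illustration; a real deployment would likely use a durable message broker and an actual Elasticsearch client rather than the dictionary that stands in for an index here.

```python
from collections import deque


class ClusterFeed:
    """Hypothetical per-cluster queue plus a dict standing in for the index."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()   # operations not yet applied to this cluster
        self.docs = {}           # stand-in for the cluster's documents
        self.reachable = True    # simulates the WAN link being up or down

    def drain(self):
        """Apply queued operations in order while the cluster is reachable."""
        applied = 0
        while self.pending and self.reachable:
            op, doc_id, body = self.pending[0]
            if op == "index":
                self.docs[doc_id] = body
            elif op == "delete":
                self.docs.pop(doc_id, None)
            # Remove the operation only after it has been applied.
            self.pending.popleft()
            applied += 1
        return applied


def publish(feeds, op, doc_id, body=None):
    """Fan one modification out to every cluster's queue."""
    for feed in feeds:
        feed.pending.append((op, doc_id, body))


def demo():
    us = ClusterFeed("us-east")   # assumed region names
    eu = ClusterFeed("eu-west")
    eu.reachable = False          # simulate an unreliable connection

    publish([us, eu], "index", "1", {"msg": "hello"})
    publish([us, eu], "delete", "1")
    publish([us, eu], "index", "2", {"msg": "world"})

    us.drain()
    eu.drain()                    # applies nothing while unreachable
    lagging = len(eu.pending)

    eu.reachable = True           # link restored, the backlog is replayed
    eu.drain()
    return us.docs, eu.docs, lagging
```

Because each cluster consumes from its own queue, a cluster that is temporarily unreachable only falls behind; once it is reachable again it replays its backlog in order and ends up with the same documents as the others.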

Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-25 Thread joergpra...@gmail.com
The IndexShardSnapshotAndRestoreService is from the snapshot/restore feature. Snapshot pushes snapshots to shared file storage, and restore retrieves snapshots and places them into the current cluster. With snapshot/restore, an "off-line" synchronization utility already exists. The i

Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-23 Thread Todd Nine
Thanks for the suggestion on the tribe nodes. I'll take a more in-depth look at org.elasticsearch.index.snapshots.IndexShardSnapshotAndRestoreService. A reference implementation would be helpful in understanding its usage. Do you happen to know of any projects that use it? From an archi

Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-23 Thread joergpra...@gmail.com
This looks promising. For admin operations, see also the tribe node. A special "replication-aware tribe node" (or maybe more than one tribe node for resiliency) could supervise the cluster-to-cluster replication. For the segment strategy, I think it is hard to go down to the level of the index st

Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-23 Thread Todd Nine
Thanks for the pointers, Jorg. We use RxJava in our current application, so I'm familiar with backpressure and ensuring we don't overwhelm target systems. I've been mulling over the high-level design a bit more. A common approach in all systems that perform multi-region replication is the co

Re: Help creating a near real time streaming plugin to perform replication between clusters

2015-01-15 Thread joergpra...@gmail.com
While it seems quite easy to attach listeners to an ES node to capture operations translog-style and push out index/delete operations at the shard level somehow, there will be more to consider for a reliable solution. The Couchbase developers have added a data replication protocol to their product

Help creating a near real time streaming plugin to perform replication between clusters

2015-01-15 Thread Todd Nine
Hey all, I would like to create a plugin, and I need a hand. Below are the requirements I have. - Our documents are immutable. They are only ever created or deleted; updates do not apply. - We want mirrors of our ES cluster in multiple AWS regions. This way if the WAN between