Re: Possible DR solution

2016-11-12 Thread Mich Talebzadeh
Hi, I meant the way WanDisco does replication: streaming blocks of data one after another. You are correct that temporary directories need not be replicated. One of their points is that one can replicate a cluster from, say, NY to Singapore. I very much doubt that is doable given the volume of data

Re: Possible DR solution

2016-11-12 Thread Mich Talebzadeh
Thanks for the links. The difficulty with building DR for HDFS is the distributed nature of HDFS. If each DataNode had a mirror copy in DR via something similar to SRDF (assuming NameNode and others taken care of), then there would not be an issue. The fail-over would be starting the mirror HDFS

Re: Possible DR solution

2016-11-12 Thread deepak.subhramanian
Original message From: Timur Shenkao <t...@timshenkao.su> Date: 12/11/2016 09:17 (GMT-08:00) To: Mich Talebzadeh <mich.talebza...@gmail.com>, user@spark.apache.org Subject: Re: Possible DR solution Hi guys! 1) Thoug

Re: Possible DR solution

2016-11-12 Thread Timur Shenkao
Hi guys! 1) Though it's quite interesting, I believe that this discussion is not about Spark :) 2) If you are interested, there is a solution by Cloudera https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cm_bdr_replication_intro.html (requires that *source cluster* has Cloudera

Re: Possible DR solution

2016-11-12 Thread Mich Talebzadeh
Thanks Jorn. WanDisco promotes itself as doing block-level replication. As I understand it, you modify core-site.xml and add a couple of network server locations there. They call this tool Fusion. There are at least 2 Fusion servers for high availability. Each one among other things has a

Re: Possible DR solution

2016-11-12 Thread Jörn Franke
What is wrong with good old batch transfer for moving data from one cluster to another? I assume your use case is only business continuity in case of disasters such as data center loss, which are unlikely to happen (which does not mean they never happen) and where you could afford to
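
A minimal sketch of the batch-transfer approach, assuming Hadoop's DistCp tool is used for the copy (the thread does not name a tool). The helper only assembles the command line; the NameNode hosts and the path are hypothetical placeholders, and scheduling (cron, Oozie, etc.) is left out.

```python
def build_distcp_command(source_uri, target_uri, delete_missing=False):
    """Assemble a `hadoop distcp` command for an incremental batch copy.

    -update copies only files that are missing or differ on the target.
    -delete (valid only together with -update or -overwrite) removes files
    on the target that no longer exist on the source.
    """
    cmd = ["hadoop", "distcp", "-update"]
    if delete_missing:
        cmd.append("-delete")
    cmd += [source_uri, target_uri]
    return cmd


# Hypothetical cluster addresses and path, for illustration only.
cmd = build_distcp_command(
    "hdfs://nn-primary:8020/data/warehouse",
    "hdfs://nn-dr:8020/data/warehouse",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a host that has the Hadoop client installed. How often the job runs determines how much data could be lost in a disaster.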

Re: Possible DR solution

2016-11-12 Thread Mich Talebzadeh
Thanks Vince, can you provide more details on this please?

Re: Possible DR solution

2016-11-12 Thread vincent gromakowski
An HDFS tiering policy with good tags should be similar. On 11 Nov 2016 11:19 PM, "Mich Talebzadeh" wrote: > I really don't see why one wants to set up streaming replication unless > for situations where similar functionality to transactional databases is > required

Re: Possible DR solution

2016-11-11 Thread Mich Talebzadeh
I really don't see why one would want to set up streaming replication, unless it is for situations where functionality similar to transactional databases is required in big data.

Re: Possible DR solution

2016-11-11 Thread Mich Talebzadeh
I think it differs as it starts streaming data through its own port as soon as the first block has landed, so the granularity is a block. However, think of it as Oracle GoldenGate replication or SAP replication for databases. The only difference is that if the corruption in the block with HDFS it

Re: Possible DR solution

2016-11-11 Thread Mich Talebzadeh
Reason being?

Re: Possible DR solution

2016-11-11 Thread Deepak Sharma
Reason being you can set up HDFS duplication on your own to some other cluster. On Nov 11, 2016 22:42, "Mich Talebzadeh" wrote: > reason being?

Re: Possible DR solution

2016-11-11 Thread Deepak Sharma
This is a waste of money, I guess. On Nov 11, 2016 22:41, "Mich Talebzadeh" wrote: > starts at $4,000 per node per year all inclusive. > > With discount it can be halved but we are talking a node itself so if you > have 5 nodes in primary and 5 nodes in DR we are talking

Re: Possible DR solution

2016-11-11 Thread Mich Talebzadeh
It starts at $4,000 per node per year, all inclusive. With a discount it can be halved, but the price is per node, so if you have 5 nodes in primary and 5 nodes in DR we are talking about $40K already. HTH
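
The arithmetic behind the $40K figure can be checked with a short sketch. It assumes the quoted list price of $4,000 per node per year applies equally to primary and DR nodes; the function name and the `discount` parameter are illustrative, not taken from any vendor price list.

```python
def annual_license_cost(primary_nodes, dr_nodes, price_per_node=4000, discount=0.0):
    """Yearly licence cost in dollars when every node, primary or DR, is licensed."""
    total_nodes = primary_nodes + dr_nodes
    return total_nodes * price_per_node * (1 - discount)


print(annual_license_cost(5, 5))                # 40000.0 at list price
print(annual_license_cost(5, 5, discount=0.5))  # 20000.0 if the price is halved
```

Because the cost scales with the total node count, a DR cluster of the same size doubles the licence bill whatever discount applies.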

RE: Possible DR solution

2016-11-11 Thread Mudit Kumar
Is it feasible cost-wise? Thanks, Mudit From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Friday, November 11, 2016 2:56 PM To: user @spark Subject: Possible DR solution Hi, Has anyone had experience of using WanDisco block replication to create a fault