Currently there is no relation between weak consistency and Hadoop. I just 
spent more time thinking about the requirement (as outlined below):
     a) Maintain a total of 3 data centers
     b) Maintain 1 copy per data center
     c) If any data center goes down, don't create additional copies.
The above is not a valid model, especially requirement (c), because it would take 
away the "strong consistency" model supported by Hadoop. Hope this explains.
I believe we can give up on requirement (c). I am currently exploring whether there 
is any way to achieve (a) and (b). Requirement (b) can also be relaxed to allow 
more than one copy per data center if needed.
From: rahul.rec....@gmail.com
Date: Wed, 4 Sep 2013 10:04:49 +0530
Subject: Re: Multidata center support
To: user@hadoop.apache.org

Under-replicated blocks are also consistent from a consumer's point of view. Care 
to explain the relation of weak consistency to Hadoop?



Thanks,
Rahul


On Wed, Sep 4, 2013 at 9:56 AM, Rahul Bhattacharjee <rahul.rec....@gmail.com> 
wrote:


Adam's response makes more sense to me: replicate generated data offline from 
one cluster to another across data centers.




Not sure if a configurable block placement policy is supported in Hadoop. If yes, 
then along with rack awareness, you should be able to achieve the same.
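
If I remember correctly, HDFS does expose a property for the replica placement 
chooser (dfs.block.replicator.classname) alongside the usual topology script hook 
(net.topology.script.file.name), so the wiring would look roughly like the sketch 
below. The policy class named here is hypothetical and would still have to be 
written and deployed on the NameNode; normally these settings live in 
hdfs-site.xml / core-site.xml rather than being set in code.

import org.apache.hadoop.conf.Configuration;

public class PlacementConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Topology script mapping each host to a path such as /dc1/rack1
        // (hypothetical script path).
        conf.set("net.topology.script.file.name", "/etc/hadoop/conf/topology.sh");

        // Hypothetical data-center-aware policy; stock HDFS only ships
        // rack-aware placement.
        conf.set("dfs.block.replicator.classname",
                "com.example.DataCenterAwareBlockPlacementPolicy");

        System.out.println("placement policy = "
                + conf.get("dfs.block.replicator.classname"));
    }
}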




I could not follow your question related to weak consistency.


Thanks,
Rahul

On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu 
<baskar.duraika...@outlook.com> wrote:

Rahul,
Are you talking about the rack-awareness script?
I did go through rack awareness. Here are the problems with rack awareness 
w.r.t. my (given) "business requirement":



1. Hadoop, by default, places two copies on the same rack and one copy on some 
other rack. This would work as long as we have two data centers; if the business 
wants to have three data centers, then the data would not be spread across all of 
them. Separately, there is a question around whether this is the right thing to do 
or not. I have been promised by the business that they would buy enough bandwidth 
so that the data centers will be only a few milliseconds apart (in latency).



2. I believe Hadoop automatically re-replicates data if one or more nodes are 
down. Assume one out of two data centers goes down: there will be a massive data 
flow to create additional copies. When I say "data center support", I mean I 
should be able to configure Hadoop to say:
     a) Maintain 1 copy per data center
     b) If any data center goes down, don't create additional copies.
The requirements I am pointing at would essentially move Hadoop from a strongly 
consistent to a weak/eventually consistent model. Since this changes the 
fundamental architecture, it will probably break all sorts of things... It might 
never be possible in Hadoop.
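
To be clear, the replication-factor side of (a) is already expressible with the 
stock HDFS client API; it is the per-data-center placement and the "don't 
re-replicate" behaviour that have no knob. A rough sketch (the path is 
hypothetical, and it assumes a data-center-aware placement policy existed in the 
first place):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Ask for 3 replicas of this path -- one per data center only if a
        // data-center-aware policy placed them; stock HDFS only spreads
        // replicas across racks.
        fs.setReplication(new Path("/data/events"), (short) 3);
    }
}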



Thoughts?
Sadak, is there a way to implement the above requirement via Federation?
Thanks,
Baskar

Date: Sun, 1 Sep 2013 00:20:04 +0530



Subject: Re: Multidata center support
From: visioner.sa...@gmail.com
To: user@hadoop.apache.org




What do you think, friends? I think Hadoop clusters can run on multiple data 
centers using FEDERATION.

On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak <visioner.sa...@gmail.com> 
wrote:




The only problem, I guess, is that Hadoop won't be able to duplicate data from one 
data center to another, but I guess I can identify datanodes or namenodes from 
another data center. Correct me if I am wrong.

On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak <visioner.sa...@gmail.com> 
wrote:





Let's say you have some machines in Europe and some in the US. I think you just 
need the IPs and configure them in your cluster setup, and it will work...



On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du <j...@vmware.com> wrote:

Hi,
    Although you can set a datacenter layer in your network topology, it is never 
enabled in Hadoop, as it lacks replica placement and task scheduling support. 
There is some work to add layers other than rack and node under HADOOP-8848, but 
it may not suit your case. I agree with Adam that a cluster spanning multiple data 
centers does not seem to make sense, even for the DR case. Do you have other cases 
that require such a deployment?

Thanks,
Junping
From: "Adam Muise" <amu...@hortonworks.com>

To: user@hadoop.apache.org
Sent: Friday, August 30, 2013 6:26:54 PM
Subject: Re: Multidata center support


Nothing has changed. DR best practice is still one (or more) clusters per site 
and replication is handled via distributed copy or some variation of it. A 
cluster spanning multiple data centers is a poor idea right now.
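
For the distributed copy piece: that is just DistCp run between the two clusters, 
either from the command line (hadoop distcp hdfs://nn-dc1:8020/data/events 
hdfs://nn-dc2:8020/data/events, with placeholder hosts and paths) or 
programmatically. A minimal sketch against the Hadoop 2.x DistCp Java API, same 
placeholder URIs:

import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;

public class CrossClusterCopySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Copy /data/events from the DC1 cluster to the DC2 cluster.
        DistCpOptions options = new DistCpOptions(
                Collections.singletonList(new Path("hdfs://nn-dc1:8020/data/events")),
                new Path("hdfs://nn-dc2:8020/data/events"));

        // Launches the copy MapReduce job (by default blocks until it completes).
        new DistCp(conf, options).execute();
    }
}

Scheduled regularly (cron, Oozie, etc.), that gives you the offline replication 
between sites.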

On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee <rahul.rec....@gmail.com> 
wrote:

My take on this.





Why does Hadoop have to know about the data center at all? I think it can be 
installed across multiple data centers; however, topology configuration would be 
required to tell which node belongs to which data center and switch, for block 
placement.
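
That topology piece is pluggable: besides the usual net.topology.script.file.name 
script, core-site.xml can point net.topology.node.switch.mapping.impl at your own 
class. A rough sketch of such a mapping with a data-center layer above the rack 
(the host-to-location table is made up, and, as noted elsewhere in the thread, the 
default placement policy still only understands racks):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.net.DNSToSwitchMapping;

// Maps each DataNode host to a /datacenter/rack path. Plugged in via
// net.topology.node.switch.mapping.impl in core-site.xml.
public class DataCenterAwareMapping implements DNSToSwitchMapping {

    private static final Map<String, String> LOCATIONS = new HashMap<String, String>();
    static {
        // Hypothetical hosts and locations.
        LOCATIONS.put("node1.example.com", "/dc1/rack1");
        LOCATIONS.put("node2.example.com", "/dc2/rack1");
        LOCATIONS.put("node3.example.com", "/dc3/rack1");
    }

    public List<String> resolve(List<String> names) {
        List<String> paths = new ArrayList<String>(names.size());
        for (String name : names) {
            String location = LOCATIONS.get(name);
            // Unknown hosts fall back to the default rack.
            paths.add(location != null ? location : "/default-rack");
        }
        return paths;
    }

    public void reloadCachedMappings() {
        // Nothing cached in this sketch.
    }

    public void reloadCachedMappings(List<String> names) {
        // Nothing cached in this sketch.
    }
}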

Thanks,
Rahul


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu 
<baskar.duraika...@outlook.com> wrote:

We have a need to set up Hadoop across data centers. Does Hadoop support a 
multi-data-center configuration? I searched through the archives and found that 
Hadoop did not support a multi-data-center configuration some time back. Just 
wanted to see whether the situation has changed.

Please help.




-- 
Adam Muise
Solution Engineer
Hortonworks
amuise@hortonworks.com
416-417-4037

Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.
Hortonworks Virtual Sandbox
Hadoop: Disruptive Possibilities by Jeff Needham