Re: Multiple data centre in Hadoop

2012-04-19 Thread Edward Capriolo
Hive is beginning to implement Region support, where one metastore will manage multiple filesystems and jobtrackers. When a query creates a table, it will then be copied to one or more datacenters. In addition, the query planner will intelligently attempt to run queries in regions only where all the

Re: Multiple data centre in Hadoop

2012-04-19 Thread Robert Evans
If you want to start an open source project for this, I am sure that there are others with the same problem who might be very willing to help out. :) --Bobby Evans On 4/19/12 4:31 PM, "Michael Segel" wrote: I don't know of any open source solution for doing this... And yeah, it's something one can

Re: Multiple data centre in Hadoop

2012-04-19 Thread Michael Segel
I don't know of any open source solution for doing this... And yeah, it's something one can't talk about ;-) On Apr 19, 2012, at 4:28 PM, Robert Evans wrote: > Where I work we have done some things like this, but none of them are open > source, and I have not really been directly involved w

Re: Multiple data centre in Hadoop

2012-04-19 Thread Robert Evans
Where I work we have done some things like this, but none of them are open source, and I have not really been directly involved with the details of it. I can guess about what it would take, but that is all it would be at this point. --Bobby On 4/17/12 5:46 PM, "Abhishek Pratap Singh" wrote:

Re: Multiple data centre in Hadoop

2012-04-17 Thread Abhishek Pratap Singh
Thanks Bobby, I'm looking for something like this. Now the question is what the best strategy is for Hot/Hot or Hot/Warm. I need to consider the CPU and network bandwidth, and also need to decide at which layer this replication should start. Regards, Abhishek On Mon, Apr 16, 2012 at 7:08 AM,

Re: Multiple data centre in Hadoop

2012-04-16 Thread Robert Evans
Hi Abhishek, Manu is correct about High Availability within a single colo. I realize that in some cases you have to have failover between colos. I am not aware of any turnkey solution for things like that, but generally what you want to do is to run two clusters, one in each colo, either ho

Re: Multiple data centre in Hadoop

2012-04-12 Thread Manu S
Hi Abhishek, 1. Use multiple directories for *dfs.name.dir* & *dfs.data.dir* etc * Recommendation: write to *two local directories on different physical volumes*, and to an *NFS-mounted* directory – Data will be preserved even in the event of a total failure of the NameNode machines * Recommendati
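
As a rough illustration of point 1 in the reply above (not part of the original message): in Hadoop 1.x-style configuration, dfs.name.dir and dfs.data.dir each accept a comma-separated list of directories. The Python sketch below generates an hdfs-site.xml fragment along those lines. The helper name build_hdfs_site and every path, including the NFS mount point, are hypothetical placeholders and should be replaced with whatever your own nodes use.

```python
import xml.etree.ElementTree as ET


def build_hdfs_site(name_dirs, data_dirs):
    """Return hdfs-site.xml text with several directories per property.

    name_dirs: NameNode metadata directories (each one holds a full copy).
    data_dirs: DataNode block directories (blocks are spread across them).
    """
    configuration = ET.Element("configuration")
    for key, dirs in (("dfs.name.dir", name_dirs), ("dfs.data.dir", data_dirs)):
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = key
        # Hadoop expects a comma-separated list with no spaces.
        ET.SubElement(prop, "value").text = ",".join(dirs)
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical layout: two local volumes plus one NFS mount for the
    # NameNode metadata, and two local volumes for DataNode blocks.
    print(build_hdfs_site(
        ["/disk1/hdfs/name", "/disk2/hdfs/name", "/mnt/nfs/hdfs/name"],
        ["/disk1/hdfs/data", "/disk2/hdfs/data"],
    ))
```

Keeping one of the dfs.name.dir entries on NFS is what allows the metadata to survive the loss of the NameNode machine itself; it protects against a host failure, not against losing the whole data centre.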

Re: Multiple data centre in Hadoop

2012-04-11 Thread Abhishek Pratap Singh
Thanks Robert. Is there a best practice or design that can address High Availability to a certain extent? ~Abhishek On Wed, Apr 11, 2012 at 12:32 PM, Robert Evans wrote: > No, it does not. Sorry. > > > On 4/11/12 1:44 PM, "Abhishek Pratap Singh" wrote: > > Hi All, > > Just wanted to know if Hadoop sup

Re: Multiple data centre in Hadoop

2012-04-11 Thread Robert Evans
No, it does not. Sorry. On 4/11/12 1:44 PM, "Abhishek Pratap Singh" wrote: Hi All, Just wanted to know if Hadoop supports more than one data centre. This is basically for DR purposes and High Availability, where if one centre goes down the other can be brought up. Regards, Abhishek