Re: Hadoop over the internet

2010-04-17 Thread Nick Jones
I think the biggest issue would be upstream bandwidth and latency. If the thought was to use a SETI@home-style approach, most users wouldn't have the upstream bandwidth needed to support the DFS. A few local desktop machines would likely significantly outpace a much larger DSL/cabl

Re: Hadoop over the internet

2010-04-17 Thread Eric Sammer
This is likely to fail, yes. You'll almost certainly encounter timeouts in the heartbeats between the data nodes and the name node, and between the task trackers and the job tracker. Also, Hadoop uses pipelined replication between data nodes (client -> DN1 -> DN2 -> ...) which will also en
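
To make the pipeline point concrete, here is a rough back-of-the-envelope sketch, not anything taken from Hadoop itself. It assumes that a block written through a chain like client -> DN1 -> DN2 moves no faster than the slowest hop, plus roughly one round trip per hop to set the pipeline up. The function name, the 64 MB block size, and every bandwidth and latency figure are illustrative assumptions.

```python
def pipeline_write_seconds(block_bytes, hop_bandwidths_bps, hop_rtts_s):
    """Rough lower bound on pushing one block through a replication pipeline.

    Hypothetical helper for illustration only; it is not part of Hadoop.
    """
    # Sustained throughput of a chain is capped by its slowest link.
    bottleneck_bps = min(hop_bandwidths_bps)
    transfer_s = block_bytes * 8 / bottleneck_bps
    # Assume about one round trip per hop to establish the pipeline.
    setup_s = sum(hop_rtts_s)
    return transfer_s + setup_s


BLOCK = 64 * 1024 * 1024  # assumed block size in bytes

# All three hops inside one datacenter (assumed 1 Gbit/s, 0.5 ms RTT).
lan = pipeline_write_seconds(BLOCK, [1e9, 1e9, 1e9], [0.0005] * 3)

# Two of the hops cross the Internet (assumed 100 Mbit/s, 80 ms RTT).
wan = pipeline_write_seconds(BLOCK, [1e9, 100e6, 100e6], [0.0005, 0.08, 0.08])

print(f"LAN pipeline: {lan:.2f} s, WAN pipeline: {wan:.2f} s")
```

Under these assumed numbers a single slow inter-datacenter hop dominates the whole write, however fast the local links are.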

Re: Hadoop over the internet

2010-04-17 Thread Juwei Shi
I think the original assumption behind Google's implementation of map/reduce (and Hadoop's) is in-house clusters. 2010/4/17 > Hello, > > I want to investigate the matter of running hadoop MapReduce jobs over the > Internet. I don't mean in private computers, all of them in different > places, rathe

Hadoop over the internet

2010-04-17 Thread altanis
Hello, I want to investigate running Hadoop MapReduce jobs over the Internet. I don't mean private computers scattered across different places, but rather a collection of datacenters connected to each other over the Internet. Would that fail? If yes, how and why? What issues would ar