Any projects to help with running MapReduce across physically distributed clusters?

2010-11-03 Thread Jason Smith
I am looking into the problem of running jobs that generate statistics across a large data set that would be split geographically across different clusters. Each cluster would hold a unique piece of the overall data set, because the network overhead of colocating the data would be too high. I tried searchin

Re: Any projects to help with running MapReduce across physically distributed clusters?

2010-11-03 Thread Chris K Wensel
You could easily write Cascading apps that pull all the data into a single source and perform the processing. You could also use Cascading to launch jobs on different clusters from a single application: each Flow can be given its own properties, so its MapReduce jobs run on whichever cluster those properties point to.
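A minimal sketch of the second approach, assuming the statistics can be computed per cluster and merged centrally so the raw records never leave their cluster. It does not use Cascading. The cluster names, host names, `job_properties`, `run_on_cluster` and `PartialStats` are hypothetical. The property keys (`fs.default.name`, `mapred.job.tracker`, `mapred.reduce.tasks`) are Hadoop 0.20-era settings of the kind a per-Flow configuration would carry. `run_on_cluster` only summarizes a local list as a stand-in for submitting a real job.

```python
from dataclasses import dataclass
from functools import reduce

# Hypothetical cluster endpoints. The keys are Hadoop 0.20-era property names;
# the host names are made up for illustration.
CLUSTERS = {
    "us-east": {"fs.default.name": "hdfs://nn-east:8020",
                "mapred.job.tracker": "jt-east:8021"},
    "eu-west": {"fs.default.name": "hdfs://nn-west:8020",
                "mapred.job.tracker": "jt-west:8021"},
}


@dataclass(frozen=True)
class PartialStats:
    """Per-cluster summary that can be merged without moving raw records."""
    count: int
    total: float
    minimum: float
    maximum: float

    def merge(self, other: "PartialStats") -> "PartialStats":
        # Count, sum, min and max all combine associatively across clusters.
        return PartialStats(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count


def job_properties(base: dict, cluster: str) -> dict:
    """Copy the shared job settings and point them at one cluster."""
    props = dict(base)
    props.update(CLUSTERS[cluster])
    return props


def run_on_cluster(props: dict, values: list) -> PartialStats:
    """Hypothetical stand-in for a job submitted with `props`.

    A real version would submit a job to props["mapred.job.tracker"];
    here it just summarizes the local list.
    """
    return PartialStats(len(values), float(sum(values)), min(values), max(values))


def global_stats(base: dict, data_by_cluster: dict) -> PartialStats:
    """Run one job per cluster, then merge only the small summaries."""
    partials = [
        run_on_cluster(job_properties(base, name), values)
        for name, values in data_by_cluster.items()
    ]
    return reduce(PartialStats.merge, partials)


if __name__ == "__main__":
    data = {"us-east": [3, 5, 10], "eu-west": [1, 7]}
    stats = global_stats({"mapred.reduce.tasks": "4"}, data)
    print(stats.count, stats.total, stats.minimum, stats.maximum, stats.mean)
```

This only works as written for statistics that decompose into mergeable partial results (counts, sums, minima, maxima, and means derived from them); something like an exact median would need a different merge step or moving more data.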