I would first decide which timeframe you want to cover, since that determines
whether you should focus on Hadoop 2.x (with YARN) or on older releases. 2.x
should scale much better than 1.x.

Keep in mind that 2.x was only "officially" released late last year. 

Marco

> On May 22, 2014, at 5:17 PM, Sylvain Gault <sylvain.ga...@inria.fr> wrote:
> 
> Hello,
> 
> I'm new to this mailing list, so forgive me if I don't do everything
> right.
> 
> I didn't know whether I should ask on this mailing list or on
> mapreduce-dev or on yarn-dev. So I'll just start here. ^^
> 
> Short story: I'm looking for some paper(s) studying the scalability
> of Hadoop MapReduce, and I have found them extremely difficult to find
> on Google Scholar. Do you have something worth citing in a PhD thesis?
> 
> Long story: I'm writing my PhD thesis about MapReduce and when I talk
> about Hadoop I'd like to say "how much it scales". Two years ago I heard
> some people say that "Yahoo! got it to scale up to 4000 nodes and plans
> to try on 6000 nodes" or something like that. I also heard that
> YARN/MRv2 should scale better, but I don't plan to talk much about
> YARN/MRv2. So I'd take anything I could cite as a reference in my
> manuscript. :)
> 
> 
> Best regards,
> Sylvain Gault
