Hello, I'm new to this mailing list, so forgive me if I don't do everything right.
I didn't know whether I should ask on this mailing list, on mapreduce-dev, or on yarn-dev, so I'll just start here. ^^

Short story: I'm looking for papers studying the scalability of Hadoop MapReduce, and I have found them extremely difficult to find on Google Scholar. Do you have something worth citing in a PhD thesis?

Long story: I'm writing my PhD thesis about MapReduce, and when I talk about Hadoop I'd like to say "how much it scales". Two years ago I heard some people say that "Yahoo! got it to scale up to 4000 nodes and plan to try on 6000 nodes" or something like that. I also heard that YARN/MRv2 should scale better, but I don't plan to talk much about YARN/MRv2.

So I'd take anything I could cite as a reference in my manuscript. :)

Best regards,
Sylvain Gault