Hello.
On 14.09.2012 at 08:03, Jason Yang lin.yang.ja...@gmail.com wrote:
I have a question about how the pseudo-distributed Hadoop cluster works:
when many map tasks are submitted to the pseudo-distributed Hadoop cluster,
does Hadoop run each mapper in sequence? Or does it run
Hey, Kai
Thanks for your reply.
I was wondering what the difference is between pseudo-distributed and
fully-distributed Hadoop, apart from the maximum number of map/reduce tasks.
And if an MR program works fine on a pseudo-distributed cluster, will it work
exactly the same way on a fully-distributed cluster?
Hi Jason,
I think you're confusing standalone mode with pseudo-distributed
mode. The former is a limited mode of MR in which no daemons need to be
deployed and the tasks run in a single JVM (via threads).
A pseudo-distributed cluster is one where all the daemons
run on a single node.
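For reference, in the Hadoop 1.x line the difference between the two modes comes down to a few configuration properties. The sketch below records them in a plain `java.util.Properties` rather than Hadoop's own `Configuration` class, so it runs without Hadoop on the classpath. The property names and the `localhost:9000`/`localhost:9001` addresses follow the Hadoop 1.x single-node setup guide; newer releases renamed some of them (for example `fs.defaultFS`), so check them against your version. The class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: records the Hadoop 1.x properties that distinguish the two modes.
class HadoopModes {
    static final String FS_KEY = "fs.default.name";    // Hadoop 2+ name: fs.defaultFS
    static final String JT_KEY = "mapred.job.tracker"; // "local" selects the in-process LocalJobRunner

    // Standalone: local filesystem, no daemons, tasks run inside the client JVM.
    static Properties standalone() {
        Properties p = new Properties();
        p.setProperty(FS_KEY, "file:///");
        p.setProperty(JT_KEY, "local");
        return p;
    }

    // Pseudo-distributed: every daemon runs on this one host, reached over localhost.
    static Properties pseudoDistributed() {
        Properties p = new Properties();
        p.setProperty(FS_KEY, "hdfs://localhost:9000"); // NameNode address
        p.setProperty(JT_KEY, "localhost:9001");        // JobTracker address
        p.setProperty("dfs.replication", "1");          // a single DataNode can hold only one replica
        return p;
    }

    // Mirrors the Hadoop 1.x default: an unset job tracker means "local".
    static boolean runsInSingleJvm(Properties p) {
        return "local".equals(p.getProperty(JT_KEY, "local"));
    }

    public static void main(String[] args) {
        System.out.println("standalone runs in one JVM: " + runsInSingleJvm(standalone()));
        System.out.println("pseudo-distributed runs in one JVM: " + runsInSingleJvm(pseudoDistributed()));
    }
}
```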
The only difference between pseudo-distributed and fully distributed mode would
be scale. You could say that code that runs fine on the former also runs fine
on the latter. But that does not necessarily mean that performance will
scale the same way (i.e. if you keep a list of elements in memory, at
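The point about keeping a list of elements in memory can be illustrated in plain Java (this is not the Hadoop API; the class and method names are hypothetical). Both methods compute the maximum of the values seen for one key, but the first buffers them all in an `ArrayList`, so its memory use grows with the number of values. With the small inputs typical of a pseudo-distributed test this goes unnoticed, while on a full dataset a single key with many values can exhaust a task's heap. The second keeps only a running maximum and uses constant memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.LongStream;

// Hypothetical example; both methods return Long.MIN_VALUE for an empty input.
class ReduceMemory {
    // Buffers every value first: memory grows with the number of values per key.
    static long maxBuffered(Iterator<Long> values) {
        List<Long> all = new ArrayList<>();
        while (values.hasNext()) {
            all.add(values.next());
        }
        long max = Long.MIN_VALUE;
        for (long v : all) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Streams the values: only the running maximum is kept, regardless of input size.
    static long maxStreaming(Iterator<Long> values) {
        long max = Long.MIN_VALUE;
        while (values.hasNext()) {
            max = Math.max(max, values.next());
        }
        return max;
    }

    public static void main(String[] args) {
        Iterator<Long> values = LongStream.rangeClosed(1, 1_000).boxed().iterator();
        System.out.println(maxStreaming(values)); // prints 1000
    }
}
```

Both versions give the same answer on small inputs, which is exactly why the difference tends to show up only at full scale.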
All right, I got it.
Thanks to all of you.
2012/9/14 Bertrand Dechoux decho...@gmail.com
The only difference between pseudo-distributed and fully distributed mode would
be scale. You could say that code that runs fine on the former also runs fine
on the latter. But that does not necessarily mean
One thing to be careful about is the paths of dependent libraries or
executables, such as streaming binaries. In pseudo-distributed mode, since all
processes are running on the same machine, they are likely to find paths that
are really local only to the machine where the job is being launched. On a fully
distributed cluster, the tasks run on other nodes, where those paths may not exist.
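One way to avoid that pitfall is to ship dependencies with the job (in Hadoop 1.x, for example, via the generic `-files` option or streaming's `-file` option) and refer to them by bare file name, relative to the task's working directory, instead of by an absolute path on the submitting machine. The plain-Java sketch below assumes shipped files appear in the working directory, which is the usual behavior for files shipped with `-files` but worth verifying for your version. The class name `DependencyPaths` and its `resolve` helper are hypothetical, not a Hadoop API. It rejects absolute paths and fails fast with a clear message when a dependency is missing, so the mistake surfaces as an obvious error on the first node where it happens.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop.
class DependencyPaths {
    // Resolves a dependency by name against the task's working directory.
    // Absolute paths are rejected because they usually point at the machine that launched the job.
    static Path resolve(String name, Path workDir) {
        Path given = Paths.get(name);
        if (given.isAbsolute()) {
            throw new IllegalArgumentException(
                "absolute path " + name + " may exist only on the machine that launched the job");
        }
        Path resolved = workDir.resolve(given).normalize();
        if (!Files.exists(resolved)) {
            throw new IllegalStateException("dependency not found in task working directory: " + resolved);
        }
        return resolved;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a task working directory containing a shipped streaming script.
        Path workDir = Files.createTempDirectory("task-workdir");
        Files.createFile(workDir.resolve("mapper.py"));
        System.out.println(resolve("mapper.py", workDir));
    }
}
```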