I see the VM approach as great for isolation, for customized Hadoop or tools
required by the jobs, and for ease of IT management.
There is a performance hit on CPU and I/O, but I've never looked at the
numbers.
Has anybody?
Basically, for now, on EC2 for instance, if you need to go faster, just
buy 50 more machines.
The MR jobs I'm performing are not CPU intensive, so I've always assumed
that they're more IO bound. Maybe that's an exceptional situation, but I'm
not really sure.
A good motherboard with a local IO channel per disk, feeding individual
cores, with memory partitioned up between them... and I've
There is a place for virtualization: if you can justify the overhead
for hands-free network management, if your systems really have enough
power to run Hadoop and something else besides, or if you need to run
multiple truly isolated versions of Hadoop (selling some kind of
Hadoop grid services?).
If you
I once asked a wise man in charge of a rather large multi-datacenter
service, "Have you ever considered virtualization?" He replied, "All
the CPUs here are pegged at 100%."
There may be applications for this type of processing. I have thought
about systems like this from time to time. This thinking
Hi Colin,
I think this would work as a lab setup for testing how Hadoop handles
hardware failures, but I was thinking of the opposite on a much larger
scale.
It is thought by many IT folks that it's smart to take one high-end
system and split it into multiple machines, though few people think
of
Hi!
I just wondered what semantics I can rely on for the reduce phase:
-) All key/value pairs with a given key end up in the same reducer.
-) What I now wonder: do all key/value pairs for a given key end up in one
sequence?
So basically, do reducers get something like file-a or file-b?
file-a:
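The grouping behaviour being asked about can be sketched with a small
simulation of Hadoop's shuffle/sort phase (plain Python standing in for the
real framework; the function names here are my own, not a Hadoop API). The
point it illustrates: after the shuffle, each reducer sees every key exactly
once, with all of that key's values presented together as one sequence.

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(mapper_output):
    """Simulate Hadoop's shuffle phase: sort map output by key, then
    group it so each key appears exactly once with all its values."""
    ordered = sorted(mapper_output, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]

# Map output from several mappers, keys interleaved:
pairs = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
print(list(shuffle_and_sort(pairs)))
# → [('a', [2, 4]), ('b', [1, 3])]
```

Note the values within one key are grouped but not themselves in any
guaranteed order; only the keys are sorted.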
I've wondered about this using single or dual quad-core machines with one
spindle per core, and partitioning them out into 2, 4, 8, whatever virtual
machines, possibly marking each physical box as a "rack".
There would be some initial and ongoing sysadmin costs. But could this
increase throughput?
Hello Everyone,
I've been brainstorming recently and it's always been in the back of my
mind: Hadoop offers the functionality of clustering commodity systems
together, but how would one go about virtualising them apart again?
Kind Regards
Brad :)
Well, the message says it clearly :-P
Some basic Unix knowledge that might help:
-) Permissions in unix are (r)ead/(w)rite/e(x)ecute for the user
(owner)/group/world.
-) For historic reasons, permissions are often given as octal numbers, as in
this case. 0644 means rw for owner, r for group, and r for world.
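The octal encoding above can be decoded mechanically: each octal digit is an
rwx bit-triple (r=4, w=2, x=1) for owner, group, and world in turn. A small
sketch (the `describe` helper is my own, not part of any library):

```python
def describe(mode):
    """Decode a numeric permission mode into rwx triples for
    owner/group/world, e.g. 0o644 -> 'rw-/r--/r--'."""
    triples = []
    for shift in (6, 3, 0):  # owner, group, world
        digit = (mode >> shift) & 0o7
        triples.append("".join(
            letter if digit & bit else "-"
            for letter, bit in (("r", 4), ("w", 2), ("x", 1))))
    return "/".join(triples)

print(describe(0o644))  # → rw-/r--/r--
print(describe(0o755))  # → rwx/r-x/r-x
```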