RE: Hadoop Distributed Virtualisation

2008-06-06 Thread jeremy.huylebroeck
I see the VM approach as great for isolation, for a customized hadoop or tools required by the jobs, and for ease of IT management. The performance hit on CPU and IO is there, but I never looked at the numbers. Has anybody? Basically for now, on EC2 for instance, if you need to go faster, just buy 50 more machines

Re: Hadoop Distributed Virtualisation

2008-06-06 Thread Colin Freas
The MR jobs I'm performing are not CPU intensive, so I've always assumed that they're more IO bound. Maybe that's an exceptional situation, but I'm not really sure. A good motherboard with a local IO channel per disk, feeding individual cores, with memory partitioned up between them... and I've

Re: Hadoop Distributed Virtualisation

2008-06-06 Thread Edward Capriolo
There is a place for virtualization: if you can justify the overhead for hands-free network management; if your systems really have enough power that they can run hadoop and something else; if you need to run multiple truly isolated versions of hadoop (selling some type of hadoop grid services?). If you

Re: Hadoop Distributed Virtualisation

2008-06-06 Thread Edward Capriolo
I once asked a wise man in charge of a rather large multi-datacenter service, "Have you ever considered virtualization?" He replied, "All the CPUs here are pegged at 100%." There may be applications for this type of processing. I have thought about systems like this from time to time. This thinkin

Re: Hadoop Distributed Virtualisation

2008-06-06 Thread Brad C
Hi Colin, I think this would work as a lab setup for testing how hadoop handles hardware failures, but I was thinking of the opposite on a much larger scale. It is thought by many IT individuals that it's smart to take one high-end system and split it into multiple machines, though few people think of

key assignment to reducers

2008-06-06 Thread Andreas Kostyrka
Hi! I just wondered what semantics I can rely on concerning reducing: -) All key/value pairs with a given key end up in the same reducer. -) What I now wonder, do all key/value pairs for a given key end up in one sequence? So basically, do reducers get something like file-a or file-b? file-a:
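The first guarantee in the question above (every pair with a given key reaches the same reducer) follows from partitioning on the key alone. Below is a minimal Python sketch of that idea, not Hadoop's actual code: `partition` and `shuffle` are hypothetical names, and CRC32 stands in for the key's hash so the result is stable between runs. It also groups values per key, which is how the second question is usually answered: each reducer sees one key together with all of that key's values.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index from the key alone (hypothetical helper).

    The mask keeps the hash non-negative before taking the modulus.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def shuffle(pairs, num_reducers):
    """Route each (key, value) pair to one reducer and group values by key.

    Returns one list per reducer; each list holds (key, [values...]) tuples
    sorted by key, with values in arrival order.
    """
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        reducers[partition(key, num_reducers)][key].append(value)
    return [sorted(grouped.items()) for grouped in reducers]


if __name__ == "__main__":
    pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    for index, grouped in enumerate(shuffle(pairs, 3)):
        print(index, grouped)
```

Because the reducer index depends only on the key, two pairs with the same key can never land on different reducers, whatever order the map output arrives in.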

Re: Hadoop Distributed Virtualisation

2008-06-06 Thread Colin Freas
I've wondered about this using single or dual quad-core machines with one spindle per core, and partitioning them out into 2, 4, 8, whatever virtual machines, possibly marking each physical box as a "rack". There would be some initial and ongoing sysadmin costs. But could this increase throughput

Hadoop Distributed Virtualisation

2008-06-06 Thread Brad C
Hello Everyone, I've been brainstorming recently and it's always been in the back of my mind: hadoop offers the functionality of clustering commodity systems together, but how would one go about virtualising them apart again? Kind Regards Brad :)

Re: Problem logging into the master node in Amazon EC2

2008-06-06 Thread Andreas Kostyrka
Well, the message says it clearly :-P Some basic Unix knowledge that might help: -) Permissions in Unix are (r)ead/(w)rite/e(x)ecute for the user (owner)/group/world. -) For historical reasons, permissions are often given as octal numbers, as in this case. 0644 means rw for owner, r for group and
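The octal notation described above can be decoded mechanically: each octal digit covers one of owner, group and world, and within a digit the bits 4, 2 and 1 stand for read, write and execute. The following is a small Python sketch of that decoding; `mode_to_string` is a hypothetical helper written for illustration, not part of any tool mentioned in the thread.

```python
def mode_to_string(mode: int) -> str:
    """Render the low nine permission bits as an rwx string.

    Bits 6-8 belong to the owner, bits 3-5 to the group and
    bits 0-2 to everyone else (world).
    """
    parts = []
    for shift in (6, 3, 0):
        bits = (mode >> shift) & 0o7
        parts.append(
            ("r" if bits & 4 else "-")
            + ("w" if bits & 2 else "-")
            + ("x" if bits & 1 else "-")
        )
    return "".join(parts)


if __name__ == "__main__":
    for mode in (0o644, 0o600, 0o755):
        # Show the octal form next to the decoded string.
        print(format(mode, "04o"), mode_to_string(mode))
```

For the mode quoted above, `mode_to_string(0o644)` gives `rw-r--r--`.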