Re: Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread Ted Dunning
I have used 5-node and 3-node ZK ensembles in different clusters.  A moderate
amount of sharing is reasonable, but sharing with less intensive applications
is definitely better.  Sharing with the job tracker, for instance, is likely
fine since it doesn't abuse disk so much.  The namenode is similar, but not
quite as nice.  Sharing with task-only nodes is better than sharing with data
nodes.

If your Hadoop cluster is 10 machines, dedicating hosts to ZK is probably
pretty serious overhead.  If it is 200 machines, it is much less so.
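
To put rough numbers on that, here is a minimal sketch. It assumes the ZK hosts are carved out of the existing cluster rather than added to it; the function name `zk_overhead` is made up for illustration.

```python
def zk_overhead(cluster_size: int, zk_hosts: int) -> float:
    """Fraction of the cluster given up when zk_hosts machines are dedicated to ZK.

    Assumption: the dedicated hosts come out of the existing cluster.
    """
    if cluster_size <= 0 or not 0 <= zk_hosts <= cluster_size:
        raise ValueError("need 0 <= zk_hosts <= cluster_size and cluster_size > 0")
    return zk_hosts / cluster_size


# Compare the two cluster sizes from the message, for 3 and 5 ZK hosts.
for size in (10, 200):
    for zk in (3, 5):
        print(f"{zk} dedicated ZK hosts out of {size}: {zk_overhead(size, zk):.1%}")
```

On 10 machines a 3-node ensemble costs 30% of the cluster; on 200 machines it costs 1.5%.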

If you are running in EC2, then spawning 3 extra small instances is not a
big deal.

For the record, we share our production ZK machines with other tasks, but
not with map-reduce related tasks and not with our production search
engines.

On Mon, Mar 8, 2010 at 11:21 AM, Patrick Hunt wrote:

> Best practice for "on-line production serving" is 5 dedicated hosts with
> "shared nothing", physically distributed throughout the data center (5 hosts
> in a rack might not be the best idea for super reliability). There's a lot of
> leeway though; many people run with 3 and a single point of failure (SPOF)
> on the switch, for example.
>


Re: Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread David Rosenstrauch

On 03/08/2010 02:21 PM, Patrick Hunt wrote:

> See the troubleshooting page; there is some apropos detail there
> (especially regarding virtual environments).
>
> http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
>
> ZK servers are sensitive to IO (disk/network) latency. As long as you
> don't have very sensitive latency requirements it should be fine. If the
> machine were to swap, for example, or the JVM were to go into a
> long-running GC (virtualization in particular kills JVM GC), that would
> be bad.
>
> Best practice for "on-line production serving" is 5 dedicated hosts with
> "shared nothing", physically distributed throughout the data center (5
> hosts in a rack might not be the best idea for super reliability).
> There's a lot of leeway though; many people run with 3 and a single point
> of failure (SPOF) on the switch, for example.
>
> Patrick


Thanks much for the advice, Patrick.  (And Mahadev.)

DR


Re: Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread Patrick Hunt
See the troubleshooting page; there is some apropos detail there
(especially regarding virtual environments).


http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting

ZK servers are sensitive to IO (disk/network) latency. As long as you
don't have very sensitive latency requirements it should be fine. If the
machine were to swap, for example, or the JVM were to go into a
long-running GC (virtualization in particular kills JVM GC), that would
be bad.
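
One rough way to watch for this kind of latency on a shared box is to time ZooKeeper's "ruok" four-letter command, which a healthy server answers with "imok". The following is only a sketch: the helper name `probe_ruok` and the host in the usage comment are made up, 2181 is just the conventional client port, and on newer releases the four-letter commands may need to be enabled in the server configuration before this works.

```python
import socket
import time


def probe_ruok(host: str, port: int = 2181, timeout: float = 2.0):
    """Send 'ruok' to a ZK server; return (healthy, round_trip_seconds).

    The timing covers connect, send, and reading the 4-byte reply.
    Raises OSError (including socket.timeout) if the server is unreachable.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ruok")
        reply = b""
        # The reply is 4 bytes ("imok"); loop in case it arrives in pieces.
        while len(reply) < 4:
            chunk = sock.recv(4 - len(reply))
            if not chunk:
                break  # server closed the connection early
            reply += chunk
    elapsed = time.monotonic() - start
    return reply == b"imok", elapsed


# Usage (hypothetical host name):
#   healthy, rtt = probe_ruok("zk1.example.com")
#   print(healthy, f"{rtt * 1000:.1f} ms")
```

Running this periodically while a heavy job is active gives a crude picture of whether the co-located load is slowing the server down.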


Best practice for "on-line production serving" is 5 dedicated hosts with
"shared nothing", physically distributed throughout the data center (5
hosts in a rack might not be the best idea for super reliability).
There's a lot of leeway though; many people run with 3 and a single point
of failure (SPOF) on the switch, for example.
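
The reasoning behind 5 versus 3, and behind spreading hosts across racks, comes down to ZooKeeper needing a strict majority of the ensemble to stay up. A minimal sketch of that arithmetic follows; the function names and the host/rack labels are invented for illustration, and a shared switch can be modeled the same way as a shared rack.

```python
from collections import Counter
from typing import Dict


def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


def survives_any_single_rack_loss(rack_of: Dict[str, str]) -> bool:
    """True if losing any one rack still leaves a majority of servers up.

    rack_of maps each ZK host to the rack (or switch) it sits behind.
    """
    if not rack_of:
        return False
    total = len(rack_of)
    # The worst case is losing the rack that holds the most servers.
    worst_rack_count = max(Counter(rack_of.values()).values())
    return total - worst_rack_count >= quorum_size(total)


one_rack = {f"zk{i}": "rackA" for i in range(1, 6)}
spread = {"zk1": "rackA", "zk2": "rackA", "zk3": "rackB",
          "zk4": "rackB", "zk5": "rackC"}

print(tolerated_failures(3), tolerated_failures(5))
print(survives_any_single_rack_loss(one_rack))
print(survives_any_single_rack_loss(spread))
```

A 3-node ensemble tolerates one failure and a 5-node ensemble tolerates two, but five hosts behind one rack or switch still go down together.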


Patrick

David Rosenstrauch wrote:
> I'm contemplating an upcoming zookeeper rollout and was wondering what
> the zookeeper brain trust here thought about a network deployment question:
>
> Is it generally considered bad practice to just deploy zookeeper on our
> existing hdfs/MR nodes?  Or is it better to run zookeeper instances on
> their own dedicated nodes?
>
> On the one hand, we're not going to be making heavy-duty use of
> zookeeper, so it might be sufficient for zookeeper nodes to share box
> resources with HDFS & MR.  On the other hand, though, I don't want
> zookeeper to become unavailable if the nodes are running a resource
> intensive job that's hogging CPU or network.
>
> What's generally considered best practice for Zookeeper?
>
> Thanks,
>
> DR


Re: Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread Mahadev Konar
Hi David,
  Sharing the cluster with HDFS and MapReduce might cause significant
problems. MapReduce is very IO intensive and this might cause a lot of
unnecessary hiccups in your cluster. If you really want to share the
nodes, I would suggest providing at least the following:

- a considerable amount of memory, say 400-500MB (depending on
your usage), for the Java heap
- one dedicated disk not used by MR or datanodes, so that ZooKeeper
performance is a little more predictable for you.
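
As a rough sanity check for the dedicated-disk suggestion, one could compare device IDs of the ZooKeeper data directory and the Hadoop directories. This is only a sketch: the helper names are made up, the paths in the usage comment are hypothetical, and `st_dev` identifies a filesystem, so two partitions on the same physical disk would still look separate.

```python
import os
from typing import List


def same_device(path_a: str, path_b: str) -> bool:
    """True if both paths live on the same filesystem device (by st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def shared_with(zk_dir: str, hadoop_dirs: List[str]) -> List[str]:
    """Return the Hadoop directories that share a device with zk_dir."""
    return [d for d in hadoop_dirs if same_device(zk_dir, d)]


# Usage (hypothetical paths):
#   clashes = shared_with("/zk/data", ["/hadoop/dfs/data", "/hadoop/mapred/local"])
#   if clashes:
#       print("ZK data dir shares a device with:", clashes)
```

An empty result does not prove the disk is dedicated, but a non-empty one shows ZooKeeper is competing with HDFS or MR for the same filesystem.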

Thanks
mahadev


On 3/8/10 10:58 AM, "David Rosenstrauch" wrote:

> I'm contemplating an upcoming zookeeper rollout and was wondering what
> the zookeeper brain trust here thought about a network deployment question:
> 
> Is it generally considered bad practice to just deploy zookeeper on our
> existing hdfs/MR nodes?  Or is it better to run zookeeper instances on
> their own dedicated nodes?
> 
> On the one hand, we're not going to be making heavy-duty use of
> zookeeper, so it might be sufficient for zookeeper nodes to share box
> resources with HDFS & MR.  On the other hand, though, I don't want
> zookeeper to become unavailable if the nodes are running a resource
> intensive job that's hogging CPU or network.
> 
> 
> What's generally considered best practice for Zookeeper?
> 
> Thanks,
> 
> DR



Ok to share ZK nodes with Hadoop nodes?

2010-03-08 Thread David Rosenstrauch
I'm contemplating an upcoming zookeeper rollout and was wondering what 
the zookeeper brain trust here thought about a network deployment question:


Is it generally considered bad practice to just deploy zookeeper on our 
existing hdfs/MR nodes?  Or is it better to run zookeeper instances on 
their own dedicated nodes?


On the one hand, we're not going to be making heavy-duty use of 
zookeeper, so it might be sufficient for zookeeper nodes to share box 
resources with HDFS & MR.  On the other hand, though, I don't want 
zookeeper to become unavailable if the nodes are running a resource 
intensive job that's hogging CPU or network.



What's generally considered best practice for Zookeeper?

Thanks,

DR