Re: ZK on EC2
Several of our search engines use pretty large heaps (12-24GB). That means that if they *ever* do a full collection, disaster ensues because it can take so long. That means that we have to use concurrent collectors as much as possible and make sure that the concurrent collectors get all the ephemeral garbage. One server, for instance, uses the following java options:

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution

These options give us lots of detail about what is happening in the collections. Most importantly, we need to know that the tenuring distribution never has any significant tail of objects that might survive into the space that would cause a full collection. This is pretty safe in general because our servers either create objects to respond to a single request or create cached items that survive essentially forever.

-XX:+UseParNewGC -XX:+UseConcMarkSweepGC

Concurrent collectors are critical. We use the hbase recommendations here.

-XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=6

The max tenuring threshold is related to what we saw in the tenuring distribution. We very rarely see any objects last 4 collections, so we set the threshold so that an object would have to last two more collections in order to become tenured. The survivor ratio is related to this and is set based on recommendations for non-stop, low latency servers.

-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly

CMS collections have a couple of ways to be triggered. We limit it to a single way to make the world simpler. Again, this is taken from outside recommendations from the hbase guys and other commenters on the web.

-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC

I doubt that these are important. It is always nice to get more information, and I want to avoid any possibility of some library triggering a huge collection.

-XX:ParallelGCThreads=8

If the parallel GC needs horsepower, I want it to get it.
-Xdebug

Very rarely useful, but a royal pain if not installed. I don't know whether it has a performance impact (I think not).

-Xms8000m -Xmx8000m

Setting the minimum heap equal to the maximum helps avoid full GCs during the early life of the server.

On Tue, Nov 10, 2009 at 11:27 AM, Patrick Hunt wrote:

> Can you elaborate on "gc tuning" - you are using the incremental collector?
>
> Patrick
>
> Ted Dunning wrote:
>
>> The server side is a fairly standard (but old) config:
>>
>> tickTime=2000
>> dataDir=/home/zookeeper/
>> clientPort=2181
>> initLimit=5
>> syncLimit=2
>>
>> Most of our clients now use 5 seconds as the timeout, but I think that we
>> went to longer timeouts in the past. Without digging in to determine the
>> truth of the matter, my guess is that we needed the longer timeouts before
>> we tuned the GC parameters and that after tuning GC, we were able to return
>> to a more reasonable timeout. In retrospect, I think that we blamed EC2 for
>> some of our own GC misconfiguration.
>>
>> I would not use our configuration here as canonical since we didn't apply a
>> whole lot of brainpower to this problem.
>>
>> On Tue, Nov 10, 2009 at 9:29 AM, Patrick Hunt wrote:
>>
>>> Ted, could you provide your configuration information for the cluster (incl
>>> the client timeout you use), if you're willing I'd be happy to put this up
>>> on the wiki for others interested in running in EC2.

--
Ted Dunning, CTO
DeepDyve
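Collected in one place, the options walked through above amount to the following launch line. This is a sketch: the main class name and classpath are placeholders, not something given in the thread.

```shell
# All of the JVM options discussed above, assembled into one variable.
# "server.jar" and "YourServerMain" are placeholders, not from the thread.
JVM_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-XX:+PrintTenuringDistribution \
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
-XX:MaxTenuringThreshold=6 -XX:SurvivorRatio=6 \
-XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly \
-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC \
-XX:ParallelGCThreads=8 \
-Xdebug -Xms8000m -Xmx8000m"

echo "java $JVM_OPTS -cp server.jar YourServerMain"
```

Note these are HotSpot flags from the 1.6-era JVMs in use at the time; CMS and ParNew were removed in later JDK releases.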
Re: ZK on EC2
Can you elaborate on "gc tuning" - you are using the incremental collector?

Patrick

Ted Dunning wrote:

The server side is a fairly standard (but old) config:

tickTime=2000
dataDir=/home/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2

Most of our clients now use 5 seconds as the timeout, but I think that we went to longer timeouts in the past. Without digging in to determine the truth of the matter, my guess is that we needed the longer timeouts before we tuned the GC parameters and that after tuning GC, we were able to return to a more reasonable timeout. In retrospect, I think that we blamed EC2 for some of our own GC misconfiguration.

I would not use our configuration here as canonical since we didn't apply a whole lot of brainpower to this problem.

On Tue, Nov 10, 2009 at 9:29 AM, Patrick Hunt wrote:

Ted, could you provide your configuration information for the cluster (incl the client timeout you use), if you're willing I'd be happy to put this up on the wiki for others interested in running in EC2.
Re: ZK on EC2
The server side is a fairly standard (but old) config:

tickTime=2000
dataDir=/home/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2

Most of our clients now use 5 seconds as the timeout, but I think that we went to longer timeouts in the past. Without digging in to determine the truth of the matter, my guess is that we needed the longer timeouts before we tuned the GC parameters and that after tuning GC, we were able to return to a more reasonable timeout. In retrospect, I think that we blamed EC2 for some of our own GC misconfiguration.

I would not use our configuration here as canonical since we didn't apply a whole lot of brainpower to this problem.

On Tue, Nov 10, 2009 at 9:29 AM, Patrick Hunt wrote:

> Ted, could you provide your configuration information for the cluster (incl
> the client timeout you use), if you're willing I'd be happy to put this up
> on the wiki for others interested in running in EC2.

--
Ted Dunning, CTO
DeepDyve
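For readers new to these settings: initLimit and syncLimit are measured in ticks, so the config above implies the following effective windows (the 5-second figure mentioned is the client session timeout, which is set separately on the client side):

```shell
# Sketch: turning the zoo.cfg values above into effective time windows.
# tickTime is in milliseconds; initLimit and syncLimit are counts of ticks.
tickTime=2000
initLimit=5
syncLimit=2

echo "init window: $((tickTime * initLimit)) ms"  # time allowed for followers to connect and sync
echo "sync window: $((tickTime * syncLimit)) ms"  # max drift allowed between leader and follower
```

With these values the init window is 10000 ms and the sync window is 4000 ms.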
Re: ZK on EC2
Ok, good. Based on the comparison of perf numbers, and Ted's experience with large instances on ec2 running zk, this makes sense to me. A large is about half (very roughly) the horsepower of what I was using for my tests. Ted mentioned that he hasn't seen issues on ec2 running with large instances, and that correlates with these numbers (again, this is all rough back-of-the-envelope type stuff, but good enough imo).

Anyone have a small that they could run the same cpu/disk/network tests on? I'd be interested to see how that stacks up.

Ted, could you provide your configuration information for the cluster (incl the client timeout you use), if you're willing I'd be happy to put this up on the wiki for others interested in running in EC2. Thanks!

Patrick

Ted Dunning wrote:

I only have one large instance live. My impression from previously is that between-host bandwidth is generally about what you saw. We have been able to sustain 20-30MB/s into EC2 to a single node, which should be harder than moving data between nodes. I have heard rumors that others were able to get double what I got for incoming transfer.

On Mon, Nov 9, 2009 at 9:47 PM, Patrick Hunt wrote:

Could you test networking - scping data between hosts? (I was seeing 64.1MB/s for a 512mb file - the one created by dd, random data)
Re: ZK on EC2
I only have one large instance live. My impression from previously is that between-host bandwidth is generally about what you saw. We have been able to sustain 20-30MB/s into EC2 to a single node, which should be harder than moving data between nodes. I have heard rumors that others were able to get double what I got for incoming transfer.

On Mon, Nov 9, 2009 at 9:47 PM, Patrick Hunt wrote:

> Could you test networking - scping data between hosts? (I was seeing
> 64.1MB/s for a 512mb file - the one created by dd, random data)

--
Ted Dunning, CTO
DeepDyve
Re: ZK on EC2
Interesting, so comparing a large (4 cores and "high" i/o performance) ec2 instance (the first number on each line below) vs the host I used in the latency test (the second number on each line):

ebs cache       817 vs 11532  ~ 7% (ec2 7% as performant)
ebs bufread      53 vs    88  ~ 60%
native cache    829 vs 11532  ~ 7%
native bufread   80 vs    88  ~ 90%
dd 512m        106s vs   74s  ~ 43% longer for ec2 large
md5sum 512m   2.13s vs  1.5s  ~ 42% longer

Good thing we don't rely on disk cache. ;-) Raw processing power looks about half.

Could you test networking - scping data between hosts? (I was seeing 64.1MB/s for a 512mb file - the one created by dd, random data)

Small anyone?

Patrick

Ted Dunning wrote:

/dev/sdp is an EBS volume. /dev/sdb is a native volume. This is a large instance.

r...@domu-#:~# hdparm -tT /dev/sdp

/dev/sdp:
 Timing cached reads:   1634 MB in 2.00 seconds = 817.30 MB/sec
 Timing buffered disk reads:  160 MB in 3.00 seconds = 53.27 MB/sec

r...@domu-:~# hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   1658 MB in 2.00 seconds = 829.44 MB/sec
 Timing buffered disk reads:  242 MB in 3.00 seconds = 80.56 MB/sec

r...@domu-:~# time dd if=/dev/urandom bs=512000 of=/tmp/memtest count=1050
1050+0 records in
1050+0 records out
537600000 bytes (538 MB) copied, 106.525 s, 5.0 MB/s

real	1m46.517s
user	0m0.000s
sys	1m46.127s

r...@domu-:~# time md5sum /tmp/memtest; time md5sum /tmp/memtest; time md5sum /tmp/memtest
f79304f68ce04011ca0aebfbd548134a  /tmp/memtest

real	0m2.234s
user	0m1.613s
sys	0m0.590s
f79304f68ce04011ca0aebfbd548134a  /tmp/memtest

real	0m2.136s
user	0m1.560s
sys	0m0.584s
f79304f68ce04011ca0aebfbd548134a  /tmp/memtest

real	0m2.123s
user	0m1.640s
sys	0m0.481s
r...@domu-:~#

On Mon, Nov 9, 2009 at 4:54 PM, Patrick Hunt wrote:

I'm really interested to know how ec2 compares wrt disk and network performance to what I've documented here under the "hardware" section:
http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview#Hardware

Is it possible for someone to compare the network and disk performance (scp, dd, md5sum, etc...) that I document in the wiki page on say, EC2 small/large nodes? I'd do it myself but I've not used ec2. If anyone could try these and report I'd appreciate it.

Patrick

Ted Dunning wrote:

Worked pretty well for me. We did extend all of our timeouts. The biggest worry for us was timeouts on the client side. The ZK server side was no problem in that respect.

On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao wrote:

Has anyone deployed ZK on EC2? What's the experience there? Are there more timeouts, leader re-election, etc? Thanks,

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
jun...@almaden.ibm.com
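The percentages in the comparison come straight from the raw hdparm/dd numbers; a quick sketch of the arithmetic (rounded figures, so the last digits differ slightly from the approximate values quoted):

```shell
# Recomputing the ec2-large vs reference-host ratios from the raw numbers.
awk 'BEGIN {
  printf "ebs cache:       %.0f%%\n", 100 * 817 / 11532          # ~7%
  printf "ebs bufread:     %.0f%%\n", 100 * 53 / 88              # ~60%
  printf "native cache:    %.0f%%\n", 100 * 829 / 11532          # ~7%
  printf "native bufread:  %.0f%%\n", 100 * 80 / 88              # ~90% (rounds to 91)
  printf "dd slowdown:     %.0f%% longer\n", 100 * (106 - 74) / 74     # ~43%
  printf "md5sum slowdown: %.0f%% longer\n", 100 * (2.13 - 1.5) / 1.5  # ~42%
}'
```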
Re: ZK on EC2
/dev/sdp is an EBS volume. /dev/sdb is a native volume. This is a large instance.

r...@domu-#:~# hdparm -tT /dev/sdp

/dev/sdp:
 Timing cached reads:   1634 MB in 2.00 seconds = 817.30 MB/sec
 Timing buffered disk reads:  160 MB in 3.00 seconds = 53.27 MB/sec

r...@domu-:~# hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   1658 MB in 2.00 seconds = 829.44 MB/sec
 Timing buffered disk reads:  242 MB in 3.00 seconds = 80.56 MB/sec

r...@domu-:~# time dd if=/dev/urandom bs=512000 of=/tmp/memtest count=1050
1050+0 records in
1050+0 records out
537600000 bytes (538 MB) copied, 106.525 s, 5.0 MB/s

real	1m46.517s
user	0m0.000s
sys	1m46.127s

r...@domu-:~# time md5sum /tmp/memtest; time md5sum /tmp/memtest; time md5sum /tmp/memtest
f79304f68ce04011ca0aebfbd548134a  /tmp/memtest

real	0m2.234s
user	0m1.613s
sys	0m0.590s
f79304f68ce04011ca0aebfbd548134a  /tmp/memtest

real	0m2.136s
user	0m1.560s
sys	0m0.584s
f79304f68ce04011ca0aebfbd548134a  /tmp/memtest

real	0m2.123s
user	0m1.640s
sys	0m0.481s
r...@domu-:~#

On Mon, Nov 9, 2009 at 4:54 PM, Patrick Hunt wrote:

> I'm really interested to know how ec2 compares wrt disk and network
> performance to what I've documented here under the "hardware" section:
> http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview#Hardware
>
> Is it possible for someone to compare the network and disk performance
> (scp, dd, md5sum, etc...) that I document in the wiki page on say, EC2
> small/large nodes? I'd do it myself but I've not used ec2. If anyone could
> try these and report I'd appreciate it.
>
> Patrick
>
> Ted Dunning wrote:
>
>> Worked pretty well for me. We did extend all of our timeouts. The biggest
>> worry for us was timeouts on the client side. The ZK server side was no
>> problem in that respect.
>>
>> On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao wrote:
>>
>>> Has anyone deployed ZK on EC2? What's the experience there? Are there more
>>> timeouts, leader re-election, etc?
>>> Thanks,
>>>
>>> Jun
>>> IBM Almaden Research Center
>>> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>>> jun...@almaden.ibm.com

--
Ted Dunning, CTO
DeepDyve
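A note for readers checking the dd output: the byte count follows from count times block size, and it also pins down the reported throughput (dominated here by /dev/urandom generation, not disk speed):

```shell
# Verifying the dd run above: 1050 blocks of 512000 bytes each.
bytes=$((1050 * 512000))
echo "$bytes bytes"  # 537600000 bytes; dd rounds this to "538 MB"

# Throughput over the reported 106.525 s elapsed time.
awk -v b="$bytes" 'BEGIN { printf "%.1f MB/s\n", b / 1e6 / 106.525 }'  # ~5.0 MB/s
```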
Re: ZK on EC2
I'm really interested to know how ec2 compares wrt disk and network performance to what I've documented here under the "hardware" section:
http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview#Hardware

Is it possible for someone to compare the network and disk performance (scp, dd, md5sum, etc...) that I document in the wiki page on say, EC2 small/large nodes? I'd do it myself but I've not used ec2. If anyone could try these and report I'd appreciate it.

Patrick

Ted Dunning wrote:

Worked pretty well for me. We did extend all of our timeouts. The biggest worry for us was timeouts on the client side. The ZK server side was no problem in that respect.

On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao wrote:

Has anyone deployed ZK on EC2? What's the experience there? Are there more timeouts, leader re-election, etc? Thanks,

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
jun...@almaden.ibm.com
Re: ZK on EC2
10-30s at different times. Not sure what the final numbers were.

On Mon, Nov 9, 2009 at 4:39 PM, Jun Rao wrote:

> Thanks, Ted.
>
> How long did you set the client timeout?
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
> jun...@almaden.ibm.com
>
> Ted Dunning wrote on 11/09/2009 04:24:16 PM:
>
> > Worked pretty well for me. We did extend all of our timeouts. The biggest
> > worry for us was timeouts on the client side. The ZK server side was no
> > problem in that respect.
> >
> > On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao wrote:
> >
> > > Has anyone deployed ZK on EC2? What's the experience there? Are there more
> > > timeouts, leader re-election, etc? Thanks,
> > >
> > > Jun
> > > IBM Almaden Research Center
> > > K55/B1, 650 Harry Road, San Jose, CA 95120-6099
> > > jun...@almaden.ibm.com
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve

--
Ted Dunning, CTO
DeepDyve
Re: ZK on EC2
Thanks, Ted.

How long did you set the client timeout?

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
jun...@almaden.ibm.com

Ted Dunning wrote on 11/09/2009 04:24:16 PM:

> Worked pretty well for me. We did extend all of our timeouts. The biggest
> worry for us was timeouts on the client side. The ZK server side was no
> problem in that respect.
>
> On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao wrote:
>
> > Has anyone deployed ZK on EC2? What's the experience there? Are there more
> > timeouts, leader re-election, etc? Thanks,
> >
> > Jun
> > IBM Almaden Research Center
> > K55/B1, 650 Harry Road, San Jose, CA 95120-6099
> > jun...@almaden.ibm.com
>
> --
> Ted Dunning, CTO
> DeepDyve
Re: ZK on EC2
Worked pretty well for me. We did extend all of our timeouts. The biggest worry for us was timeouts on the client side. The ZK server side was no problem in that respect.

On Mon, Nov 9, 2009 at 4:20 PM, Jun Rao wrote:

> Has anyone deployed ZK on EC2? What's the experience there? Are there more
> timeouts, leader re-election, etc? Thanks,
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
> jun...@almaden.ibm.com

--
Ted Dunning, CTO
DeepDyve