What does vmstat say? Run it like 'vmstat 2' for a minute or two and paste the results.
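
Something like the following would do (the sample count and output file
are just an example; any way of capturing ~30 samples is fine):

  # 2-second interval, 30 samples (about a minute); keep a copy to paste.
  vmstat 2 30 | tee vmstat.out

  # Columns of interest for this symptom:
  #   r  - runnable processes           (feeds the load average)
  #   b  - uninterruptible sleepers     (also feeds the load average on Linux)
  #   wa - % CPU time spent waiting on I/O
  #   id - % CPU idle
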
With no CPU being consumed by Java, it seems like there must be another
hidden variable here. Some zombie process perhaps, or some kind of heavy
I/O wait, or something else. Since you are running in a hypervisor
environment, I can't really say what is happening to your instance,
although one would think the LA numbers would be unaffected by outside
processes. A couple of quick checks are sketched below the quoted thread.

On Tue, Aug 17, 2010 at 3:49 PM, George P. Stathis <[email protected]> wrote:
> Actually, there is nothing in %wa but a ton sitting in %id. This is
> from the Master:
>
> top - 18:30:24 up 5 days, 20:10, 1 user, load average: 2.55, 1.99, 1.25
> Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.2%st
> Mem: 17920228k total, 2795464k used, 15124764k free, 248428k buffers
> Swap: 0k total, 0k used, 0k free, 1398388k cached
>
> I have atop installed, which is reporting the hadoop/hbase java daemons
> as the most active processes (barely taking any CPU time though, and
> most of the time in sleep mode):
>
> ATOP - domU-12-31-39-18-1   2010/08/17 18:31:46   10 seconds elapsed
> PRC | sys 0.01s | user 0.00s | #proc 89   | #zombie 0   | #exit 0      |
> CPU | sys 0%    | user 0%    | irq 0%     | idle 200%   | wait 0%      |
> cpu | sys 0%    | user 0%    | irq 0%     | idle 100%   | cpu000 w 0%  |
> CPL | avg1 2.55 | avg5 2.12  | avg15 1.35 | csw 2397    | intr 2034    |
> MEM | tot 17.1G | free 14.4G | cache 1.3G | buff 242.6M | slab 193.1M  |
> SWP | tot 0.0M  | free 0.0M  |            | vmcom 1.6G  | vmlim 8.5G   |
> NET | transport | tcpi 330   | tcpo 169   | udpi 566    | udpo 147     |
> NET | network   | ipi 896    | ipo 316    | ipfrw 0     | deliv 896    |
> NET | eth0 ---- | pcki 777   | pcko 197   | si 248 Kbps | so 70 Kbps   |
> NET | lo   ---- | pcki 119   | pcko 119   | si 9 Kbps   | so 9 Kbps    |
>
>   PID CPU COMMAND-LINE                                                1/1
> 17613  0% atop
> 17150  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
> 16527  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
> 16839  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
> 16735  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
> 17083  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
>
> Same with htop:
>
>   PID USER   PRI NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
> 16527 ubuntu  20  0 2352M   98M 10336 S  0.0  0.6  0:42.05 /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/var/log/h
> 16735 ubuntu  20  0 2403M 81544 10236 S  0.0  0.5  0:01.56 /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/var/log/h
> 17083 ubuntu  20  0 4557M 45388 10912 S  0.0  0.3  0:00.65 /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -server -XX:+Heap
>     1 root    20  0 23684  1880  1272 S  0.0  0.0  0:00.23 /sbin/init
>   587 root    20  0  247M  4088  2432 S  0.0  0.0 -596523h-14:-8 /usr/sbin/console-kit-daemon --no-daemon
>  3336 root    20  0 49256  1092   540 S  0.0  0.0  0:00.36 /usr/sbin/sshd
> 16430 nobody  20  0 34408  3704  1060 S  0.0  0.0  0:00.01 gmond
> 17150 ubuntu  20  0 2519M  112M 11312 S  0.0  0.6 -596523h-14:-8 /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -server -XX
>
> So I'm a bit perplexed. Are there any hadoop / hbase specific tricks
> that can reveal what these processes are doing?
>
> -GS
>
>
> On Tue, Aug 17, 2010 at 6:14 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>
>> It's not normal, but then again I don't have access to your machines
>> so I can only speculate.
>>
>> Does "top" show you which process is in %wa? If so and it's a java
>> process, can you figure out what's going on in there?
>>
>> J-D
>>
>> On Tue, Aug 17, 2010 at 11:03 AM, George Stathis <[email protected]> wrote:
>> > Hello,
>> >
>> > We have just set up a new cluster on EC2 using Hadoop 0.20.2 and HBase
>> > 0.20.3. Our small setup as of right now consists of one master and four
>> > slaves with a replication factor of 2:
>> >
>> > Master: xLarge instance with 2 CPUs and 17.5 GB RAM - runs 1 namenode,
>> > 1 secondarynamenode, 1 jobtracker, 1 hbasemaster, 1 zookeeper (uses its
>> > own dedicated EBS drive)
>> > Slaves: xLarge instance with 2 CPUs and 17.5 GB RAM each - run 1 datanode,
>> > 1 tasktracker, 1 regionserver
>> >
>> > We have also installed Ganglia to monitor the cluster stats as we are
>> > about to run some performance tests but, right out of the box, we are
>> > noticing high system loads (especially on the master node) without any
>> > activity happening on the cluster. Of course, the CPUs are not being
>> > utilized at all, but Ganglia is reporting almost all nodes in the red as
>> > the 1, 5 and 15 minute load averages are all above 100% most of the time
>> > (i.e. there are more than two processes at a time competing for the two
>> > CPUs' time).
>> >
>> > Question 1: is this normal?
>> > Question 2: does it matter since each process barely uses any of the
>> > CPU time?
>> >
>> > Thank you in advance and pardon the noob questions.
>> >
>> > -GS
>> >
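
Along the lines of the speculation above, a couple of quick checks. The
pid and file name are only examples pulled from your paste; adjust as
needed, and this assumes the stock procps, sysstat and Sun JDK tools are
on the box:

  # Any zombie (Z) or uninterruptible-sleep (D) processes? On Linux,
  # D-state tasks count toward the load average even though they use no CPU.
  ps -eo pid,stat,wchan:30,cmd | awk '$2 ~ /Z|D/'

  # See what one of the Hadoop/HBase JVMs is actually doing by dumping
  # its thread stacks (16527 is one of the java pids from your listing).
  jstack 16527 > jvm-threads.txt

  # Kernel-side view of whether anything is stuck waiting on the EBS volume.
  iostat -x 2 5

If nothing shows up in D state and jstack shows the daemons just parked
in their usual wait loops, the load numbers may well be an artifact of
the virtualized environment rather than real contention.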
