Actually, there is nothing in %wa but a ton sitting in %id. This is
from the Master:
top - 18:30:24 up 5 days, 20:10, 1 user, load average: 2.55, 1.99, 1.25
Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.2%st
Mem: 17920228k total, 2795464k used, 15124764k free, 248428k buffers
Swap: 0k total, 0k used, 0k free, 1398388k cached
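Since the load average counts tasks that are runnable or in
uninterruptible sleep rather than just ones burning CPU, one quick
sanity check I can run is to list processes in those two states
directly (a rough sketch; the exact ps columns may vary by distro):

  ps -eo state,pid,comm | awk '$1 == "R" || $1 == "D"'
  vmstat 1 5    # "r" = runnable tasks, "b" = blocked in uninterruptible sleep

That should at least tell me whether something is stuck in D state
even though %wa reads zero.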
I have atop installed, and it reports the Hadoop/HBase Java daemons as
the most active processes (though they barely take any CPU time and
spend most of their time sleeping):
ATOP - domU-12-31-39-18-1 2010/08/17 18:31:46 10 seconds elapsed
PRC | sys 0.01s | user 0.00s | #proc 89 | #zombie 0 | #exit 0 |
CPU | sys 0% | user 0% | irq 0% | idle 200% | wait 0% |
cpu | sys 0% | user 0% | irq 0% | idle 100% | cpu000 w 0% |
CPL | avg1 2.55 | avg5 2.12 | avg15 1.35 | csw 2397 | intr 2034 |
MEM | tot 17.1G | free 14.4G | cache 1.3G | buff 242.6M | slab 193.1M |
SWP | tot 0.0M | free 0.0M | | vmcom 1.6G | vmlim 8.5G |
NET | transport | tcpi 330 | tcpo 169 | udpi 566 | udpo 147 |
NET | network | ipi 896 | ipo 316 | ipfrw 0 | deliv 896 |
NET | eth0 ---- | pcki 777 | pcko 197 | si 248 Kbps | so 70 Kbps |
NET | lo ---- | pcki 119 | pcko 119 | si 9 Kbps | so 9 Kbps |
PID CPU COMMAND-LINE 1/1
17613 0% atop
17150 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
16527 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
16839 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
16735 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
17083 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
Same with htop:
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
16527 ubuntu 20 0 2352M 98M 10336 S 0.0 0.6 0:42.05
/usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
-Dhadoop.log.dir=/var/log/h
16735 ubuntu 20 0 2403M 81544 10236 S 0.0 0.5 0:01.56
/usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
-Dhadoop.log.dir=/var/log/h
17083 ubuntu 20 0 4557M 45388 10912 S 0.0 0.3 0:00.65
/usr/lib/jvm/java-6-sun/bin/java -Xmx2048m
-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -server -XX:+Heap
1 root 20 0 23684 1880 1272 S 0.0 0.0 0:00.23 /sbin/init
587 root 20 0 247M 4088 2432 S 0.0 0.0 -596523h-14:-8
/usr/sbin/console-kit-daemon --no-daemon
3336 root 20 0 49256 1092 540 S 0.0 0.0 0:00.36 /usr/sbin/sshd
16430 nobody 20 0 34408 3704 1060 S 0.0 0.0 0:00.01 gmond
17150 ubuntu 20 0 2519M 112M 11312 S 0.0 0.6 -596523h-14:-8
/usr/lib/jvm/java-6-sun/bin/java -Xmx2048m
-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -server -XX
So I'm a bit perplexed. Are there any Hadoop/HBase-specific tricks
that can reveal what these processes are doing?
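In the meantime I'll probably grab thread dumps from the Java daemons
to see where their threads are parked, something along the lines of
(assuming the Sun JDK's jstack is on the path; the PID and output
path below are just examples):

  jstack 16527 > /tmp/hadoop-daemon-threads.txt
  kill -QUIT 16527    # alternative: the JVM writes the dump to the daemon's .out log

But if there is a more Hadoop/HBase-aware way to do this, I'm all ears.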
-GS
On Tue, Aug 17, 2010 at 6:14 PM, Jean-Daniel Cryans <[email protected]> wrote:
>
> It's not normal, but then again I don't have access to your machines
> so I can only speculate.
>
> Does "top" show you which process is in %wa? If so and it's a java
> process, can you figure what's going on in there?
>
> J-D
>
> On Tue, Aug 17, 2010 at 11:03 AM, George Stathis <[email protected]> wrote:
> > Hello,
> >
> > We have just set up a new cluster on EC2 using Hadoop 0.20.2 and HBase
> > 0.20.3. Our small setup as of right now consists of one master and four
> > slaves with a replication factor of 2:
> >
> > Master: xLarge instance with 2 CPUs and 17.5 GB RAM - runs 1 namenode, 1
> > secondarynamenode, 1 jobtracker, 1 hbasemaster, 1 zookeeper (uses its own
> > dedicated EBS drive)
> > Slaves: xLarge instance with 2 CPUs and 17.5 GB RAM each - run 1 datanode, 1
> > tasktracker, 1 regionserver
> >
> > We have also installed Ganglia to monitor the cluster stats as we are about
> > to run some performance tests but, right out of the box, we are noticing
> > high system loads (especially on the master node) without any activity
> > happening on the cluster. Of course, the CPUs are not being utilized at all,
> > but Ganglia is reporting almost all nodes in the red as the 1, 5 and 15
> > minute load averages are all above 100% most of the time (i.e. there are
> > more than two processes at a time competing for the two CPUs' time).
> >
> > Question1: is this normal?
> > Question2: does it matter since each process barely uses any of the CPU
> > time?
> >
> > Thank you in advance and pardon the noob questions.
> >
> > -GS
> >