What does vmstat say? Run it like 'vmstat 2' for a minute or two and paste the results.
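
Something like the following would do (the sample count and output file
are just an example; any way of capturing ~30 samples is fine):

  # 2-second interval, 30 samples (about a minute); keep a copy to paste.
  vmstat 2 30 | tee vmstat.out

  # Columns of interest for this symptom:
  #   r  - runnable processes           (feeds the load average)
  #   b  - uninterruptible sleepers     (also feeds the load average on Linux)
  #   wa - % CPU time spent waiting on I/O
  #   id - % CPU idle
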
With no CPU being consumed by Java, it seems like there must be another
hidden variable here. Some zombie process perhaps, or some kind of heavy
I/O wait, or something else. Since you are running in a hypervisor
environment, I can't really say what is happening to your instance,
although one would think the LA numbers would be unaffected by outside
processes. A couple of quick checks are sketched below the quoted thread.

On Tue, Aug 17, 2010 at 3:49 PM, George P. Stathis <[email protected]> wrote:
> Actually, there is nothing in %wa but a ton sitting in %id. This is
> from the Master:
>
> top - 18:30:24 up 5 days, 20:10, 1 user, load average: 2.55, 1.99, 1.25
> Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.2%st
> Mem: 17920228k total, 2795464k used, 15124764k free, 248428k buffers
> Swap: 0k total, 0k used, 0k free, 1398388k cached
>
> I have atop installed, which is reporting the hadoop/hbase java daemons
> as the most active processes (barely taking any CPU time though, and
> most of the time in sleep mode):
>
> ATOP - domU-12-31-39-18-1   2010/08/17 18:31:46   10 seconds elapsed
> PRC | sys 0.01s | user 0.00s | #proc 89   | #zombie 0   | #exit 0      |
> CPU | sys 0%    | user 0%    | irq 0%     | idle 200%   | wait 0%      |
> cpu | sys 0%    | user 0%    | irq 0%     | idle 100%   | cpu000 w 0%  |
> CPL | avg1 2.55 | avg5 2.12  | avg15 1.35 | csw 2397    | intr 2034    |
> MEM | tot 17.1G | free 14.4G | cache 1.3G | buff 242.6M | slab 193.1M  |
> SWP | tot 0.0M  | free 0.0M  |            | vmcom 1.6G  | vmlim 8.5G   |
> NET | transport | tcpi 330   | tcpo 169   | udpi 566    | udpo 147     |
> NET | network   | ipi 896    | ipo 316    | ipfrw 0     | deliv 896    |
> NET | eth0 ---- | pcki 777   | pcko 197   | si 248 Kbps | so 70 Kbps   |
> NET | lo   ---- | pcki 119   | pcko 119   | si 9 Kbps   | so 9 Kbps    |
>
>   PID CPU COMMAND-LINE                                                1/1
> 17613  0% atop
> 17150  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
> 16527  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
> 16839  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
> 16735  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
> 17083  0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
>
> Same with htop:
>
>   PID USER   PRI NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
> 16527 ubuntu  20  0 2352M   98M 10336 S  0.0  0.6  0:42.05 /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/var/log/h
> 16735 ubuntu  20  0 2403M 81544 10236 S  0.0  0.5  0:01.56 /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/var/log/h
> 17083 ubuntu  20  0 4557M 45388 10912 S  0.0  0.3  0:00.65 /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -server -XX:+Heap
>     1 root    20  0 23684  1880  1272 S  0.0  0.0  0:00.23 /sbin/init
>   587 root    20  0  247M  4088  2432 S  0.0  0.0 -596523h-14:-8 /usr/sbin/console-kit-daemon --no-daemon
>  3336 root    20  0 49256  1092   540 S  0.0  0.0  0:00.36 /usr/sbin/sshd
> 16430 nobody  20  0 34408  3704  1060 S  0.0  0.0  0:00.01 gmond
> 17150 ubuntu  20  0 2519M  112M 11312 S  0.0  0.6 -596523h-14:-8 /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -server -XX
>
> So I'm a bit perplexed. Are there any hadoop / hbase specific tricks
> that can reveal what these processes are doing?
>
> -GS
>
>
> On Tue, Aug 17, 2010 at 6:14 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>
>> It's not normal, but then again I don't have access to your machines
>> so I can only speculate.
>>
>> Does "top" show you which process is in %wa? If so and it's a java
>> process, can you figure out what's going on in there?
>>
>> J-D
>>
>> On Tue, Aug 17, 2010 at 11:03 AM, George Stathis <[email protected]> wrote:
>> > Hello,
>> >
>> > We have just set up a new cluster on EC2 using Hadoop 0.20.2 and HBase
>> > 0.20.3. Our small setup as of right now consists of one master and four
>> > slaves with a replication factor of 2:
>> >
>> > Master: xLarge instance with 2 CPUs and 17.5 GB RAM - runs 1 namenode,
>> > 1 secondarynamenode, 1 jobtracker, 1 hbasemaster, 1 zookeeper (uses its
>> > own dedicated EBS drive)
>> > Slaves: xLarge instance with 2 CPUs and 17.5 GB RAM each - run 1 datanode,
>> > 1 tasktracker, 1 regionserver
>> >
>> > We have also installed Ganglia to monitor the cluster stats as we are
>> > about to run some performance tests but, right out of the box, we are
>> > noticing high system loads (especially on the master node) without any
>> > activity happening on the cluster. Of course, the CPUs are not being
>> > utilized at all, but Ganglia is reporting almost all nodes in the red as
>> > the 1, 5 and 15 minute load averages are all above 100% most of the time
>> > (i.e. there are more than two processes at a time competing for the two
>> > CPUs' time).
>> >
>> > Question 1: is this normal?
>> > Question 2: does it matter since each process barely uses any of the
>> > CPU time?
>> >
>> > Thank you in advance and pardon the noob questions.
>> >
>> > -GS
>> >
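
Along the lines of the speculation above, a couple of quick checks. The
pid and file name are only examples pulled from your paste; adjust as
needed, and this assumes the stock procps, sysstat and Sun JDK tools are
on the box:

  # Any zombie (Z) or uninterruptible-sleep (D) processes? On Linux,
  # D-state tasks count toward the load average even though they use no CPU.
  ps -eo pid,stat,wchan:30,cmd | awk '$2 ~ /Z|D/'

  # See what one of the Hadoop/HBase JVMs is actually doing by dumping
  # its thread stacks (16527 is one of the java pids from your listing).
  jstack 16527 > jvm-threads.txt

  # Kernel-side view of whether anything is stuck waiting on the EBS volume.
  iostat -x 2 5

If nothing shows up in D state and jstack shows the daemons just parked
in their usual wait loops, the load numbers may well be an artifact of
the virtualized environment rather than real contention.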
