Re: Strange machine behavior

Bharath Mundlapudi Tue, 18 Dec 2012 00:42:55 -0800

You may want to find out why System time is high. Check your system call
statistics; they should give you a clue.
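
A minimal sketch of a first check, assuming a Linux host: before looking at individual system calls, confirm how CPU time splits between User and System over a short interval. The function names `cpu_split` and `sample_cpu_split` are made up for this illustration. The field order (user, nice, system, idle, iowait, irq, softirq, steal) is that of the aggregate "cpu" line in /proc/stat.

```shell
# Print the user and system share of CPU time between two samples of the
# aggregate "cpu" line from /proc/stat.
# Hypothetical helper: $1 = earlier "cpu ..." line, $2 = later "cpu ..." line.
cpu_split() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # Remember the counters from the first sample (fields 2..9).
        NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i }
        # Subtract them from the second sample and report the shares.
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total <= 0) { print "no elapsed ticks"; exit 1 }
            printf "user=%.1f%% system=%.1f%%\n", 100 * d[2] / total, 100 * d[4] / total
        }'
}

# Hypothetical wrapper: sample /proc/stat twice, $1 seconds apart (default 5).
sample_cpu_split() {
    before=$(head -n 1 /proc/stat)
    sleep "${1:-5}"
    after=$(head -n 1 /proc/stat)
    cpu_split "$before" "$after"
}
```

Running `sample_cpu_split 5` while the job is slow and again after it recovers gives two figures to compare. If the System share is high, something like `strace -c -f -p <pid>` on a busy task JVM prints a per-system-call count and time summary when you detach. Which process to attach to is an assumption here, not something stated in this thread.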


-Bharath




________________________________
 From: Robert Dyer <rd...@iastate.edu>
To: user@hadoop.apache.org; Bharath Mundlapudi <bharathw...@yahoo.com> 
Sent: Monday, December 10, 2012 7:32 PM
Subject: Re: Strange machine behavior
 

Yes, there is a performance impact.  It should be visible in the graph I
attached.  Basically, the CPU is spending much more time in System and the User
time is lower.

When this happens (if I don't do a drop_caches in time) the MR job winds up 
taking significantly longer than usual.



On Mon, Dec 10, 2012 at 8:06 PM, Bharath Mundlapudi <bharathw...@yahoo.com> 
wrote:

>Are you seeing any performance impact with this cache increase? It is normal for a
>Linux system to use a large amount of memory for cache.
>
>
>
>-Bharath
>
>
>
>________________________________
> From: Andy Isaacson <a...@cloudera.com>
>To: user@hadoop.apache.org 
>Sent: Monday, December 10, 2012 11:23 AM
>Subject: Re: Strange machine behavior
> 
>
>What kernel did you see this on? Was there significant swap traffic
>(si/so in vmstat output) during the high-system-time period?
>
>BTW, you don't need to nor do you want to run sync(1) when
>manipulating drop_caches, it just causes additional noise and
>slowdown. drop_caches doesn't have any impact on correctness; it won't
>cause data loss (by dropping a dirty page or whatever). I've had sync
>calls take 10 minutes to complete, so the unnecessary impact can be
>significant.
>
>-andy
>
>On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer <rd...@iastate.edu> wrote:
>> Has anyone experienced a TaskTracker/DataNode behaving like the attached
>> image?
>>
>> This was during a MR job (which runs often).  Note the extremely high System
>> CPU time.  Upon investigating I saw that out of 64GB ram the system had
>> allocated almost 45GB to cache!
>>
>> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is
>> roughly where the graph goes back to normal (much lower System, much higher
>> User).
>>
>> This has happened a few times.
>>
>> I have tried playing with the sysctl vm.swappiness value (default of 60) by
>> setting it to 30 (which it was at when the graph was collected) and now to
>> 10.  I am not sure that helps.
>>
>> Any ideas?  Anyone else run into this before?
>>
>> 24 cores
>> 64GB ram
>> 4x2TB sata3 hdd
>>
>> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
>> this machine.
>>
>> 24 map slots (1gb heap each), no reducers.
>>
>> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
>
>
>


-- 

Robert Dyer
rd...@iastate.edu
