Re: Strange machine behavior

Andy Isaacson Mon, 10 Dec 2012 11:24:23 -0800

What kernel did you see this on? Was there significant swap traffic
(si/so in vmstat output) during the high-system-time period?
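
If vmstat isn't handy, a rough sketch of the same check reads the
cumulative swap counters straight from /proc/vmstat. The function name
swap_counters and its optional file argument are made up for
illustration; only the pswpin/pswpout counter names come from the kernel.

```shell
#!/bin/sh
# Print the cumulative swap-in and swap-out page counts, in that order.
# Reads /proc/vmstat by default; the optional argument (hypothetical, for
# illustration) lets you point it at a saved copy of that file instead.
swap_counters() {
    awk 'BEGIN { i = 0; o = 0 }
         $1 == "pswpin"  { i = $2 }
         $1 == "pswpout" { o = $2 }
         END { print i, o }' "${1:-/proc/vmstat}"
}
```

Run it twice, a few seconds apart, during the high-system-time period
(swap_counters; sleep 10; swap_counters). If either number grows between
the two samples, the machine is actively swapping.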


BTW, you don't need to (and don't want to) run sync(1) when
manipulating drop_caches; it just adds noise and slows things down.
drop_caches has no impact on correctness: it only frees clean pages, so
it won't cause data loss by dropping a dirty page. I've had sync calls
take 10 minutes to complete, so the unnecessary impact can be
significant.
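
A minimal sketch of dropping the caches without the sync calls, plus a
way to look at the Cached and Dirty figures before and after. The
function names and the optional file arguments are assumptions for
illustration; the paths and the value 3 are the standard kernel ones.

```shell
#!/bin/sh
# Print the Cached and Dirty values (in kB) from /proc/meminfo.
# The optional argument (hypothetical) points at a saved copy instead.
cache_and_dirty_kb() {
    awk '$1 == "Cached:" { c = $2 }
         $1 == "Dirty:"  { d = $2 }
         END { print "cached_kb=" c, "dirty_kb=" d }' "${1:-/proc/meminfo}"
}

# Ask the kernel to free clean page cache plus dentries and inodes (3).
# No sync before or after. Needs root when aimed at the real file.
drop_caches() {
    echo 3 > "${1:-/proc/sys/vm/drop_caches}"
}
```

Calling cache_and_dirty_kb, then drop_caches as root, then
cache_and_dirty_kb again should show Cached shrink while the dirty data
is left alone for normal writeback to handle.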

-andy

On Sat, Dec 8, 2012 at 4:09 PM, Robert Dyer <rd...@iastate.edu> wrote:
> Has anyone experienced a TaskTracker/DataNode behaving like the attached
> image?
>
> This was during a MR job (which runs often).  Note the extremely high System
> CPU time.  Upon investigating I saw that out of 64GB ram the system had
> allocated almost 45GB to cache!
>
> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_caches ; sync" which is
> roughly where the graph goes back to normal (much lower System, much higher
> User).
>
> This has happened a few times.
>
> I have tried playing with the sysctl vm.swappiness value (default of 60) by
> setting it to 30 (which it was at when the graph was collected) and now to
> 10.  I am not sure that helps.
>
> Any ideas?  Anyone else run into this before?
>
> 24 cores
> 64GB ram
> 4x2TB sata3 hdd
>
> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on
> this machine.
>
> 24 map slots (1gb heap each), no reducers.
>
> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
