Re: Strange machine behavior

2012-12-18 Thread Bharath Mundlapudi
> To: user@hadoop.apache.org
> Sent: Monday, December 10, 2012 11:23 AM
> Subject: Re: Strange machine behavior
>
> What kernel did you see this on? Was there significant swap traffic
> (si/so in vmstat output) during the high-system-time period?
>
> BTW, you don't need to n

Re: Strange machine behavior

2012-12-10 Thread Robert Dyer
On Sun, Dec 9, 2012 at 5:45 AM, a...@hsk.hk wrote:
> Hi,
>
> I always set "vm.swappiness = 0" for my hadoop servers (PostgreSQL
> servers too).
>
I have just done this for that machine. So far, I have not seen a re-occurrence of the strange behavior; it appears this might have solved the probl

Re: Strange machine behavior

2012-12-10 Thread Robert Dyer
p.apache.org
> *Sent:* Monday, December 10, 2012 11:23 AM
> *Subject:* Re: Strange machine behavior
>
> What kernel did you see this on? Was there significant swap traffic
> (si/so in vmstat output) during the high-system-time period?
>
> BTW, you don't need to nor do you want

Re: Strange machine behavior

2012-12-10 Thread Robert Dyer
On Mon, Dec 10, 2012 at 1:23 PM, Andy Isaacson wrote:
> What kernel did you see this on? Was there significant swap traffic
> (si/so in vmstat output) during the high-system-time period?
>
It's an older kernel, Fedora 15. Linux X 2.6.43.8-1.fc15.x86_64 #1 SMP Mon Jun 4 20:33:44 UTC 2012 x86

Re: Strange machine behavior

2012-12-10 Thread Bharath Mundlapudi
Are you seeing any performance impact with this cache increase? It is normal for a Linux system to use a large share of memory for cache. -Bharath
From: Andy Isaacson
To: user@hadoop.apache.org
Sent: Monday, December 10, 2012 11:23 AM
Subject: Re: Strange machine behavior

Re: Strange machine behavior

2012-12-10 Thread Andy Isaacson
What kernel did you see this on? Was there significant swap traffic (si/so in vmstat output) during the high-system-time period? BTW, you don't need to, nor do you want to, run sync(1) when manipulating drop_caches; it just causes additional noise and slowdown. drop_caches doesn't have any impact on
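To answer the si/so question from captured vmstat output, the two swap columns can be pulled out with a few lines of Python. This is only a minimal sketch, not something posted in the thread: the function name `swap_traffic` and the sample text are hypothetical, and the sample numbers are made up for illustration.

```python
# Hypothetical vmstat capture; the numbers are invented for illustration.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 123456   7890 456789    0    0     5    10  100  200  5  2 93  0  0
 3  1   8192  20480   1024 4718592  512 2048   900  1500 4000 9000 10 60 20 10  0
"""


def swap_traffic(vmstat_text):
    """Return one (si, so) pair per sample row found in vmstat output."""
    header = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # The column-name row is the one that carries both "si" and "so".
        if "si" in fields and "so" in fields:
            header = fields
            continue
        # Skip banner rows and anything that does not line up with the header.
        if header is None or len(fields) != len(header):
            continue
        if not all(field.isdigit() for field in fields):
            continue
        row = dict(zip(header, map(int, fields)))
        samples.append((row["si"], row["so"]))
    return samples


if __name__ == "__main__":
    for si, so in swap_traffic(SAMPLE_VMSTAT):
        status = "swapping" if (si or so) else "no swap traffic"
        print(f"si={si} so={so} -> {status}")
```

Nonzero si or so values in the samples taken during the high-system-time period would point at swapping; all zeros would suggest looking elsewhere.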

Re: Strange machine behavior

2012-12-09 Thread a...@hsk.hk
Hi, I always set "vm.swappiness = 0" for my hadoop servers (PostgreSQL servers too). The reason is that Linux moves memory pages to swap space if they have not been accessed for a period of time (swapping). The Java virtual machine (JVM) does not behave well under swapping; that will make
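Whether a machine actually has this setting can be checked both at runtime and in its sysctl configuration. The sketch below is a hedged illustration in Python, assuming a Linux host that exposes /proc/sys/vm/swappiness; the helper names `read_swappiness` and `sysctl_setting` are hypothetical, not part of any Hadoop or system tool.

```python
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the running kernel's vm.swappiness value (Linux only)."""
    return int(Path(path).read_text().strip())


def sysctl_setting(conf_text, key):
    """Return the last value assigned to `key` in sysctl.conf-style text.

    Blank lines and lines starting with '#' or ';' are treated as comments.
    Returns None when the key is not set.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        if name.strip() == key:
            value = raw.strip()  # later assignments override earlier ones
    return value


if __name__ == "__main__":
    # Hypothetical configuration text, for illustration only.
    example_conf = "# tuning for hadoop nodes\nvm.swappiness = 0\n"
    print("configured:", sysctl_setting(example_conf, "vm.swappiness"))
    try:
        print("running:", read_swappiness())
    except OSError:
        print("running: not available on this system")
```

Comparing the configured value with the running one catches the case where the file was edited but the setting was never applied.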

Re: Strange machine behavior

2012-12-08 Thread seth
Oracle frequently recommends vm.swappiness = 0 to get well-behaved RAC nodes. Otherwise you start paging out things you don't usually want paged out in favor of a larger filesystem cache. There is also a vm parameter that controls the minimum size of the free chain; you might want to increase that

Re: Strange machine behavior

2012-12-08 Thread Robert Dyer
Yes, but even with an MR job running, it is only 36 GB of heap in total out of 64 GB of RAM. This leaves plenty for the OS and caching. The problem seems to be the OS preferring to cache over giving space to the applications. Once I drop the caches and rerun the MR job again several times, it runs perfectly fine. On

Re: Strange machine behavior

2012-12-08 Thread Marcos Ortiz
Are you sure that 24 map slots is a good number for this machine? Remember that you have three services (DN, TT and HRegionServer) with a 12 GB heap. Try to use a lower number of map slots (12 for example) and launch your MR job again. Can you share your logs in pastebin? On Sat 08 Dec

Strange machine behavior

2012-12-08 Thread Robert Dyer
Has anyone experienced a TaskTracker/DataNode behaving like the attached image? This was during an MR job (which runs often). Note the extremely high System CPU time. Upon investigating I saw that out of 64 GB of RAM the system had allocated almost 45 GB to cache! I did a sudo sh -c "sync ; echo 3 >
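The share of RAM held by the page cache can be read from /proc/meminfo rather than estimated from a graph. The following is a minimal Python sketch under stated assumptions: the helper names `parse_meminfo` and `cached_fraction` are hypothetical, and the sample text uses made-up values chosen to mirror the 45 GB of 64 GB described above.

```python
# Hypothetical /proc/meminfo excerpt; values chosen to mirror the
# "45 GB cached out of 64 GB" situation, not taken from a real machine.
SAMPLE_MEMINFO = """\
MemTotal:       67108864 kB
MemFree:         1048576 kB
Buffers:          524288 kB
Cached:         47185920 kB
SwapTotal:       8388608 kB
SwapFree:        8388608 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])  # values are reported in kB
    return info


def cached_fraction(info):
    """Return the fraction of total memory currently used as page cache."""
    return info["Cached"] / info["MemTotal"]


if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    print(f"page cache share: {cached_fraction(info):.1%}")
```

On a live Linux host the same functions can be fed the contents of /proc/meminfo, before and after a cache drop, to see how much memory was actually released.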