Re: jobtracker page @50030 timeout or take very long time.

2012-11-01 Thread Zizon Qiu
What about the gc logs? On Fri, Nov 2, 2012 at 11:16 AM, Harsh J wrote: > Do you run a TaskTracker on your JobTracker machine? > > On Fri, Nov 2, 2012 at 2:17 AM, Patai Sangbutsarakum > wrote: > > I have a check monitoring the page jobtracker:50030/jobtracker.jsp, > > and the check shows timeou
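A minimal sketch of how GC logging could be enabled for the JobTracker, assuming an editable hadoop-env.sh on the JobTracker host (log path and flag set are illustrative):

    # hadoop-env.sh on the JobTracker host; restart the JobTracker afterwards
    export HADOOP_JOBTRACKER_OPTS="$HADOOP_JOBTRACKER_OPTS \
      -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -Xloggc:/var/log/hadoop/jobtracker-gc.log"

Long stop-the-world pauses in that log around the times the 50030 page stalls would point at heap pressure on the JobTracker rather than the network.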

Re: jobtracker page @50030 timeout or take very long time.

2012-11-01 Thread Harsh J
Do you run a TaskTracker on your JobTracker machine? On Fri, Nov 2, 2012 at 2:17 AM, Patai Sangbutsarakum wrote: > I have a check monitoring the page jobtracker:50030/jobtracker.jsp, > and the check shows timeout (180 sec) pretty often. > Once I jump and browse to the page it actually take me fro

Re: OutputFormat and Reduce Task

2012-11-01 Thread Harsh J
Hi Dhruv, Inline. On Fri, Nov 2, 2012 at 4:15 AM, Dhruv wrote: > I'm trying to optimize the performance of my OutputFormat's implementation. > I'm doing things similar to HBase's TableOutputFormat--sending the reducer's > output to a distributed k-v store. So, the context.write() call basically
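A rough sketch of the pattern under discussion, batching reducer output to an external key-value store. KVClient and its putAll()/close() methods are hypothetical stand-ins, not a real Hadoop or HBase API:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.OutputCommitter;
    import org.apache.hadoop.mapreduce.OutputFormat;
    import org.apache.hadoop.mapreduce.RecordWriter;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    // Sketch only: KVClient is a hypothetical client for the external k-v store.
    public class KVStoreOutputFormat extends OutputFormat<Text, Text> {

      @Override
      public RecordWriter<Text, Text> getRecordWriter(final TaskAttemptContext context)
          throws IOException {
        return new RecordWriter<Text, Text>() {
          private final KVClient client = new KVClient(context.getConfiguration());
          private final List<String[]> buffer = new ArrayList<String[]>();

          @Override
          public void write(Text key, Text value) throws IOException {
            // Batch writes so each context.write() is not a synchronous round trip.
            buffer.add(new String[] { key.toString(), value.toString() });
            if (buffer.size() >= 1000) {
              client.putAll(buffer);
              buffer.clear();
            }
          }

          @Override
          public void close(TaskAttemptContext ctx) throws IOException {
            client.putAll(buffer);   // flush whatever is left
            client.close();
          }
        };
      }

      @Override
      public void checkOutputSpecs(JobContext context) { /* nothing to verify in this sketch */ }

      @Override
      public OutputCommitter getOutputCommitter(TaskAttemptContext context)
          throws IOException, InterruptedException {
        // No files to commit; borrow the no-op committer from NullOutputFormat.
        return new NullOutputFormat<Text, Text>().getOutputCommitter(context);
      }
    }

Batching in write() and flushing the remainder in close() keeps each context.write() cheap, which is usually the first thing to check when an OutputFormat like this is slow.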

Re: SequenceFile syncFs behavior?

2012-11-01 Thread Harsh J
Hi Thanh Do, SequenceFile.Writer.syncFs() in 2.x and 2.x-based releases is deprecated in favor of the new hsync() and hflush() methods, but it internally calls hflush() itself, so its behavior with regard to durability is the same as it was before (new metadata entries are created and the buffer is flushe
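A small sketch of the newer calls Harsh mentions, against the Hadoop 2.x SequenceFile writer API (path and key/value types are arbitrary examples):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SyncDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            SequenceFile.Writer.file(new Path("/tmp/sync-demo.seq")),
            SequenceFile.Writer.keyClass(LongWritable.class),
            SequenceFile.Writer.valueClass(Text.class));
        writer.append(new LongWritable(1L), new Text("record"));
        // syncFs() is deprecated in 2.x; hflush() pushes the buffer out to the
        // datanodes, hsync() additionally asks them to sync to disk.
        writer.hflush();
        writer.hsync();
        writer.close();
      }
    }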

Re: Set the number of maps

2012-11-01 Thread Cogan, Peter (Peter)
Thanks for your answers! From: Marcos Ortiz <mlor...@uci.cu> Date: Thu, 1 Nov 2012 19:03:23 +0100 To: peter cogan <peter.co...@alcatel-lucent.com> Cc: "user@hadoop.apache.org" <user@hadoop.apache.org> Subject: Re: Set the number of maps The

Re: jobtracker page @50030 timeout or take very long time.

2012-11-01 Thread Robert Molina
Hi Patai, Have you looked into verifying if it is network related, maybe see what the ping responses are to that node? On Thu, Nov 1, 2012 at 1:47 PM, Patai Sangbutsarakum < silvianhad...@gmail.com> wrote: > I have a check monitoring the page jobtracker:50030/jobtracker.jsp, > and the check show
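A hedged example of such a check from another host (hostname and timeout values are illustrative):

    ping -c 5 jobtracker.example.com
    time curl --max-time 180 -s -o /dev/null -w "%{http_code} %{time_total}s\n" \
      http://jobtracker.example.com:50030/jobtracker.jsp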

Re: jobtracker page @50030 timeout or take very long time.

2012-11-01 Thread Bryan Beaudreault
Does this happen all of the time? I've seen cases where if I have a job or process that is hosing the CPU of a tasktracker, it can cause the job tracker to pause while trying to contact that tasktracker. Once the CPU load dips down to acceptable levels communication can flow again and the job tra

RE: Map Reduce slot

2012-11-01 Thread Kartashov, Andy
I checked my MR admin page running on :50030/jobtracker.jsp and this is what I am learning. I run a 2-core processor, so the admin page told me that my max map/reduce slot capacity was 14 or so; I assume 7 nodes x 2 slots. I did not touch the property of .map.tasks. It seemed that MR set it nicely t
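For reference, a sketch of the per-TaskTracker settings behind that arithmetic (Hadoop 1.x property names; values are examples only, and cluster capacity is roughly number of TaskTrackers times slots per node):

    <!-- mapred-site.xml on every TaskTracker -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>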

Re: Insight on why distcp becomes slower when adding nodemanager

2012-11-01 Thread Alexandre Fouche
I was using YARN and HDFS on EC2 and EBS, with default memory settings. I have just read in the Hadoop guide a list of Hadoop benchmark jars. I guess I'll use Whirr to create a canned Hadoop cluster on EC2, and run these benchmarks. So I will have a baseline to which I can compare. Then I'll com
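A hedged example of collecting such a baseline with the benchmark jar that ships with Hadoop 2.x (jar path and name vary by version and distribution):

    # HDFS write and read throughput baseline with TestDFSIO
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
      TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-*-tests.jar \
      TestDFSIO -read -nrFiles 10 -fileSize 1000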

RE: cluster set-up / a few quick questions - SOLVED

2012-11-01 Thread Kartashov, Andy
People, while I did not find the start-balancer.sh script on my machine, I successfully used the following command: "$hadoop balancer -threshold 10" and achieved the exact same result. One issue remains: controlling start/stop of the daemons on the slaves through the master. Somehow I don't have dfs
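For reference, a hedged sketch of both pieces; script and file locations differ between tarball and packaged installs:

    # Rebalance until every DataNode is within 10% of average utilization
    hadoop balancer -threshold 10

    # conf/slaves (or etc/hadoop/slaves): one worker hostname per line; the
    # start-dfs.sh / stop-dfs.sh scripts on the master ssh into each of these
    slave1
    slave2
    slave3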

SequenceFile syncFs behavior?

2012-11-01 Thread Thanh Do
Hi all, Could somebody clarify the behavior of SequenceFile.syncFs() for me? From the source, I see this function is deprecated and hsync() or hflush() is recommended. However, it seems like the current stable release of HBase (0.94) or the Cloudera distribution (4.0) uses syncFs for its HLog file. Woul

Re: Set the number of maps

2012-11-01 Thread Marcos Ortiz
Since 0.21 the option was renamed to mapreduce.tasktracker.map.tasks.maximum, and as Harsh said, it is a TaskTracker service-level option. Another thing is that this option is closely tied to mapreduce.child.java.opts, so make sure to constantly monitor the effect of these change
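A sketch of how those two settings interact, with illustrative values (property names differ slightly between the 1.x and 0.21+ lines):

    <!-- mapred-site.xml on each TaskTracker -->
    <property>
      <name>mapreduce.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024m</value>
    </property>
    <!-- with 4 map slots, up to 4 x 1 GB of child heap can be live at once,
         which must still fit in the node's physical memory -->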

Re: Set the number of maps

2012-11-01 Thread Bejoy KS
Hi Peter, 'mapred.tasktracker.map.tasks.maximum' is not for setting an upper cap on the map tasks spawned by a job. This property is used to set the map slots on each TaskTracker. It is a TaskTracker-level property and cannot be overridden on a per-job basis. To control the number of map tasks for
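As a hedged sketch of the per-job control being alluded to: the number of map tasks follows the number of input splits, which can be influenced through the split-size settings (new-API helpers shown; Job.getInstance() is 2.x, older releases use new Job(conf)):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {
      public static Job configure(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf);
        // Larger minimum split size => fewer, bigger splits => fewer map tasks.
        FileInputFormat.setMinInputSplitSize(job, 256L * 1024 * 1024);  // 256 MB
        FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);  // 512 MB
        return job;
      }
    }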

Re: Set the number of maps

2012-11-01 Thread Harsh J
It can't be set from the code this way - the slot property is applied at the TaskTracker service level (as the name goes). Since you're just testing at the moment, try to set these values, restart TTs, and run your jobs again. You do not need to restart JT at any point for tweaking these values.

Re: Set the number of maps

2012-11-01 Thread Ted Dunning
Is the spelling of the option correct? On Thu, Nov 1, 2012 at 6:43 AM, Cogan, Peter (Peter) < peter.co...@alcatel-lucent.com> wrote: > Hi > > I understand that the maximum number of concurrent map tasks is set > by mapred.tasktracker.map.tasks.maximum - however I wish to run with a > smaller num

Re: About connection to NameNode in NameNode-HA

2012-11-01 Thread Todd Lipcon
Hi Yamashita, Yes, the order in which the nodes are tried is indeed the order in which your nodes are listed in the configuration. We currently do write StandbyException into the standby's logs, but I think it would be a good improvement to remove that, since it is an "expected exception" an

About connection to NameNode in NameNode-HA

2012-11-01 Thread Shinichi Yamashita
Hi, I have written hdfs-site.xml on all nodes as follows for NameNode HA. -- ... snip ... dfs.namenode.rpc-address.ns.nn1 nn1:8020 dfs.namenode.rpc-address.ns.nn2 nn2:8020 dfs.client.failover.proxy.provider.nn org.apache.hadoop.hdfs.server.namenode.ha.Configu
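For readability, a hedged reconstruction of the kind of client-side HA settings being described; the nameservice id (ns), hostnames and provider class must match the actual setup:

    <property>
      <name>dfs.nameservices</name>
      <value>ns</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.ns</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns.nn1</name>
      <value>nn1:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns.nn2</name>
      <value>nn2:8020</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.ns</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

With ConfiguredFailoverProxyProvider, clients try the listed NameNodes in order, which is the behaviour Todd confirms in the reply above.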

Re: Map Reduce slot

2012-11-01 Thread Michael Segel
"However in production clusters the JVM size is marked final to prevent abuses that may lead to OOMs." Not necessarily. On Nov 1, 2012, at 6:43 AM, Bejoy Ks wrote: > However in production clusters the JVM size is marked final to prevent abuses > that may lead to OOMs.

Re: Low shuffle transfer speeds

2012-11-01 Thread Harsh J
Hi, The reducer copies map outputs progressively (as and when they complete) unless configured otherwise. It is hence normal for the overall average (that's what is reported currently, unfortunately) to show up lower than the actual value, since there are periods where the reducer is idle, waiting for
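As a hedged illustration, two 1.x-era knobs that influence when and how aggressively reducers fetch map output (names and defaults differ across versions; values are examples only):

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.reduce.parallel.copies</name>
      <value>10</value>  <!-- fetch threads per reducer -->
    </property>
    <property>
      <name>mapred.reduce.slowstart.completed.maps</name>
      <value>0.80</value>  <!-- start reducers only after 80% of maps finish -->
    </property>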

Low shuffle transfer speeds

2012-11-01 Thread john smith
Hi list, I have jobs that generate a huge amount of intermediate data. For example, one of my jobs generates almost 12 GB of map output. I have 8 datanodes/TTs and 1 master. My reduce progress shows the copy speed in the range 0.55 - 1 MBps, but normal file transfers between my datanodes generally go up t

Re: Map Reduce slot

2012-11-01 Thread kapil bhosale
Hi Uddipan, In my opinion the Hadoop JobTracker interface provides all of these details, including # number of map slots # number of reducer slots. Other information can also be found in the generated log files, which can also be explored through the same interface. JobTracker address: Masternode:y

Map Reduce slot

2012-11-01 Thread Uddipan Mukherjee
Hi Hadoop Gurus, What is a good way to find the following information about my Hadoop cluster? # number of map slots # number of reduce slots # JVM heap size for map tasks # JVM heap size for reduce tasks # reuse-JVM flag Thanks and Regards Uddipan
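The slot maximums and the mapred.child.java.opts heap setting appear in the earlier threads on this page; for the remaining item, a hedged example of the per-job JVM reuse flag (1.x name; -1 means unlimited reuse within a job):

    <property>
      <name>mapred.job.reuse.jvm.num.tasks</name>
      <value>1</value>
    </property>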