There is a way to confirm if it is the same bug. Can you take a jstack of the process that has established a connection to port 50010 and post it here?
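The check suggested above could be scripted roughly as follows. This is only a sketch, not from the thread: it assumes lsof and jstack are on the PATH, that lsof supports TCP state filtering (`-sTCP:CLOSE_WAIT`), and that it runs as the user owning the JVM; the function name is mine.

```shell
#!/bin/sh
# Sketch only: write a thread dump for every process holding a
# CLOSE_WAIT connection to the given port (datanode port 50010 by default).
dump_close_wait_stacks() {
  port=${1:-50010}
  # -t prints bare PIDs; sort -u removes duplicates when one process
  # holds several leaked sockets.
  for pid in $(lsof -iTCP:"$port" -sTCP:CLOSE_WAIT -t 2>/dev/null | sort -u); do
    echo "taking jstack of PID $pid"
    jstack "$pid" > "jstack-$pid.txt"
  done
}
```

Each `jstack-<pid>.txt` file can then be attached to the thread as requested.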
Thanks
hemanth

On Thu, Mar 21, 2013 at 12:13 PM, Krishna Kishore Bonagiri <
write2kish...@gmail.com> wrote:

> Hi Hemanth & Sandy,
>
> Thanks for your reply. Yes, that indicates it is in CLOSE_WAIT state,
> exactly like below:
>
> java 30718 dsadm 200u IPv4 1178376459 0t0 TCP *:50010 (LISTEN)
> java 31512 dsadm 240u IPv6 1178391921 0t0 TCP node1:51342->node1:50010 (CLOSE_WAIT)
>
> I just checked the link https://issues.apache.org/jira/browse/HDFS-3357
> and it shows 2.0.0-alpha both in the affected versions and the fix versions.
>
> There is another bug, HDFS-3591, at
> https://issues.apache.org/jira/browse/HDFS-3591, which says it is for
> backporting HDFS-3357 to branch 0.23.
>
> So, I don't understand whether the fix is really in 2.0.0-alpha; could
> you please clarify?
>
> Thanks,
> Kishore
>
> On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala <
> yhema...@thoughtworks.com> wrote:
>
>> There was an issue related to hung connections (HDFS-3357), but the JIRA
>> indicates the fix is available in Hadoop 2.0.0-alpha. Still, it would be
>> worth checking Sandy's suggestion.
>>
>> On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>>
>>> Hi Kishore,
>>>
>>> 50010 is the datanode port. Does your lsof output indicate that the
>>> sockets are in CLOSE_WAIT? I had come across an issue like this where
>>> that was a symptom.
>>>
>>> -Sandy
>>>
>>> On Wed, Mar 20, 2013 at 4:24 AM, Krishna Kishore Bonagiri <
>>> write2kish...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running a date command with YARN's distributed shell example in a
>>>> loop of 1000 times, in this way:
>>>>
>>>> yarn jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> org.apache.hadoop.yarn.applications.distributedshell.Client --jar
>>>> /home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
>>>> --shell_command date --num_containers 2
>>>>
>>>> Around the 730th time or so, I am getting an error in the node
>>>> manager's log saying that it failed to launch a container because
>>>> there are "Too many open files". When I observe through the lsof
>>>> command, I find that one connection of this kind is left behind for
>>>> each run of the Application Master, and the count keeps growing as I
>>>> run the loop:
>>>>
>>>> node1:44871->node1:50010
>>>>
>>>> Is this a known issue? Or am I missing something? Please help.
>>>>
>>>> Note: I am working on hadoop-2.0.0-alpha.
>>>>
>>>> Thanks,
>>>> Kishore
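The reproduction described in the original message could be sketched as below. This is a hypothetical script, not from the thread: the jar path is copied from the original command, `leaked_count` and `run_loop` are names I made up, and it assumes yarn and lsof are on the PATH. Printing the CLOSE_WAIT count after every run makes the leak visible long before the "Too many open files" failure (a common default `ulimit -n` of 1024 is consistent with failure around the ~730th run, once normal file descriptors are added).

```shell
#!/bin/sh
# Counts lsof-style lines (read on stdin) stuck in CLOSE_WAIT to the
# datanode port 50010.
leaked_count() {
  grep -c ':50010 (CLOSE_WAIT)'
}

# Runs the distributed shell client N times and reports the leaked-socket
# count after each run. Jar path copied from the thread.
run_loop() {
  n=$1
  jar=/home/kbonagir/yarn/hadoop-2.0.0-alpha/share/hadoop/mapreduce/hadoop-yarn-applications-distributedshell-2.0.0-alpha.jar
  i=1
  while [ "$i" -le "$n" ]; do
    yarn jar "$jar" org.apache.hadoop.yarn.applications.distributedshell.Client \
      --jar "$jar" --shell_command date --num_containers 2
    echo "run $i: $(lsof -iTCP:50010 | leaked_count) leaked sockets"
    i=$((i + 1))
  done
}
```

On a fixed build the reported count should stay flat; a steadily growing count reproduces the symptom in this thread.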