[ https://issues.apache.org/jira/browse/HDFS-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
HaiBin Huang reassigned HDFS-14783:
-----------------------------------

    Assignee: HaiBin Huang

> expired SlowPeersReport will keep staying on namenode's jmx
> -----------------------------------------------------------
>
>                 Key: HDFS-14783
>                 URL: https://issues.apache.org/jira/browse/HDFS-14783
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: HaiBin Huang
>            Assignee: HaiBin Huang
>            Priority: Major
>         Attachments: HDFS-14783
>
>
> SlowPeersReport in the namenode's jmx shows which datanodes are slow. It is
> calculated from the average duration of packet sends between two datanodes.
> For example, if dn1 takes too long on average to send packets to dn2 (over
> the *upperLimitLatency*), you will see a SlowPeersReport in the namenode's
> jmx like this:
> {code:java}
> "SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}]
> {code}
> However, if dn1 sends some packets to dn2 slowly at the beginning and then
> sends no packets to dn2 for a long time, the SlowPeersReport above stays on
> the namenode's jmx. I think this SlowPeersReport may be an expired message:
> the network between dn1 and dn2 may have returned to normal, but the report
> remains on the namenode's jmx until the next time dn1 sends a packet to dn2.
> So I use a timestamp to record when an
> *org.apache.hadoop.metrics2.util.SampleStat* is created, and calculate the
> average duration using only the valid *SampleStat* instances, where validity
> is judged by the timestamp.
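
The timestamp idea in the description could be sketched roughly as follows. This is only an illustration, not the attached patch: the class `ExpiringPeerLatency`, the nested `TimedStat` (a simplified stand-in for `SampleStat` that keeps just a sample count and a latency sum), the `validityMs` window, and all method names are hypothetical and chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the HDFS-14783 patch: every name here is illustrative.
class ExpiringPeerLatency {

    // A batch of latency samples plus the time at which the batch was created.
    static final class TimedStat {
        final long createdMs;
        final long numSamples;
        final double totalLatencyMs;

        TimedStat(long createdMs, long numSamples, double totalLatencyMs) {
            this.createdMs = createdMs;
            this.numSamples = numSamples;
            this.totalLatencyMs = totalLatencyMs;
        }
    }

    // Assumed expiry window: stats older than this are ignored.
    private final long validityMs;
    private final Map<String, List<TimedStat>> statsByPeer = new HashMap<>();

    ExpiringPeerLatency(long validityMs) {
        this.validityMs = validityMs;
    }

    // Record a batch of samples for a peer, stamped with its creation time.
    void add(String peer, long createdMs, long numSamples, double totalLatencyMs) {
        statsByPeer.computeIfAbsent(peer, k -> new ArrayList<>())
            .add(new TimedStat(createdMs, numSamples, totalLatencyMs));
    }

    // Mean latency over the stats still valid at nowMs; NaN when none are valid.
    double validAverage(String peer, long nowMs) {
        long count = 0;
        double total = 0.0;
        List<TimedStat> stats =
            statsByPeer.getOrDefault(peer, Collections.emptyList());
        for (TimedStat s : stats) {
            if (nowMs - s.createdMs <= validityMs) {
                count += s.numSamples;
                total += s.totalLatencyMs;
            }
        }
        return count == 0 ? Double.NaN : total / count;
    }

    // A peer counts as slow only if its valid average exceeds the limit.
    boolean isSlow(String peer, long nowMs, double upperLimitLatencyMs) {
        double avg = validAverage(peer, nowMs);
        return !Double.isNaN(avg) && avg > upperLimitLatencyMs;
    }

    public static void main(String[] args) {
        // dn1's view of dn2: one slow batch at t=0, then silence.
        ExpiringPeerLatency dn1View = new ExpiringPeerLatency(60_000L);
        dn1View.add("dn2", 0L, 10, 5_000.0);
        System.out.println("slow at 30s:  " + dn1View.isSlow("dn2", 30_000L, 100.0));
        System.out.println("slow at 120s: " + dn1View.isSlow("dn2", 120_000L, 100.0));
    }
}
```

With a check of this kind, a batch recorded while the link was slow stops contributing to the average once it is older than the window, so dn2 would drop out of the report even if dn1 never sends it another packet.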