[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332488#comment-17332488
 ] 

Hadoop QA commented on YARN-10739:
----------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
12s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 4s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  8s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m  
8s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m 
11s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 52s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m  
9s{color} | {color:green}{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} || ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
38s{color} | {color:green}{color} | {color:green} hadoop-yarn-common in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green}{color} | {color:green} The patch does not generate 
ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 49s{color} | 
{color:black}{color} | {color:black}{color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/934/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10739 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13024600/YARN-10739.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle spotbugs |
| uname | Linux b02927276a1e 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 
05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9166bfeb74d |
| Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
|  Test Results | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/934/testReport/ |
| Max. process+thread count | 516 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/934/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> GenericEventHandler.printEventQueueDetails cause RM recovery cost too much 
> time
> -------------------------------------------------------------------------------
>
>                 Key: YARN-10739
>                 URL: https://issues.apache.org/jira/browse/YARN-10739
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 3.4.0, 3.3.1, 3.2.3
>            Reporter: Zhanqi Cai
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-10739-001.patch, YARN-10739-002.patch, 
> YARN-10739.003.patch, YARN-10739.003.patch, YARN-10739.004.patch, 
> YARN-10739.005.patch, YARN-10739.006.patch
>
>
> Due to YARN-8995 YARN-10642 add GenericEventHandler.printEventQueueDetails on 
> AsyncDispatcher, if the event queue size is too large, the 
> printEventQueueDetails will cost too much time and RM  take a long time to 
> process.
> For example:
>  If we have 4K nodes on cluster and 4K apps running, if we do switch and the 
> node manager will register with RM, and RM will call NodesListManager to do 
> RMAppNodeUpdateEvent, code like below:
> {code:java}
> for(RMApp app : rmContext.getRMApps().values()) {
>   if (!app.isAppFinalStateStored()) {
>     this.rmContext
>         .getDispatcher()
>         .getEventHandler()
>         .handle(
>             new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
>                 appNodeUpdateType));
>   }
> }{code}
> So the total event is 4k*4k=16 mil, during this window, the 
> GenericEventHandler.printEventQueueDetails will print the event queue detail 
> and be called frequently, once the event queue size reaches 1 mil+, the 
> Iterator of the queue from printEventQueueDetails will be so slow refer to 
> below: 
> {code:java}
> private void printEventQueueDetails() {
>   Iterator<Event> iterator = eventQueue.iterator();
>   Map<Enum, Long> counterMap = new HashMap<>();
>   while (iterator.hasNext()) {
>     Enum eventType = iterator.next().getType();
> {code}
> Then RM recovery will cost too much time.....
>  Refer to our log:
> {code:java}
> 2021-04-14 20:35:34,432 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(306)) - Size of event-queue is 12000000
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: KILL, Event 
> record counter: 310836
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: NODE_UPDATE, 
> Event record counter: 1103
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: 
> NODE_REMOVED, Event record counter: 1
> 2021-04-14 20:35:35,818 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:printEventQueueDetails(291)) - Event type: APP_REMOVED, 
> Event record counter: 1
> {code}
> Between AsyncDispatcher.handle and printEventQueueDetails, here is more than 
> 1s to do Iterator.
> I upload a file to ensure the printEventQueueDetails only be called one-time 
> pre-30s.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to