[ https://issues.apache.org/jira/browse/YARN-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated YARN-11656:
----------------------------------
    Labels: pull-request-available  (was: )

RMStateStore event queue blocked
--------------------------------

                 Key: YARN-11656
                 URL: https://issues.apache.org/jira/browse/YARN-11656
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: yarn
    Affects Versions: 3.4.1
            Reporter: Bence Kosztolnik
            Assignee: Bence Kosztolnik
            Priority: Major
              Labels: pull-request-available
         Attachments: issue.png, log.png

h2. Problem statement

I observed a YARN cluster that had both pending applications and available resources, yet its utilization was usually only around 50%. The cluster was loaded with 200 parallel Pi example jobs (from hadoop-mapreduce-examples), each configured with 20 map and 20 reduce containers, on a 50-node cluster where every node had 8 cores and plenty of memory (so CPU, not memory, should have been the bottleneck).
Eventually I realized the RM had an I/O bottleneck and needed 1~20 seconds to persist a single RMStateStoreEvent (using FileSystemRMStateStore).
To reduce the impact of the issue:
- create a dispatcher where events can be persisted in parallel threads
- create metric data for the RMStateStore event queue, so the problem is easy to identify if it occurs on a cluster

{panel:title=Issue visible on UI2}
!issue.png|height=250!
{panel}

Another way to identify the issue is to check whether too much time is required to store the application info after an application reaches the NEW_SAVING state.

{panel:title=How the issue can look in the log}
!log.png|height=250!
{panel}

h2. Solution

Created a *MultiDispatcher* class which implements the Dispatcher interface.
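Conceptually, such a dispatcher can be approximated with a bounded {{ThreadPoolExecutor}} whose pool and queue sizes map onto the configs listed below. This is only an illustrative sketch, not the actual patch: the class and method names are invented here, and the real implementation must additionally preserve the per-event ordering guarantees the RM state store relies on.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a parallel state-store dispatcher.
// The constructor parameters mirror the config knobs:
// default-pool-size, max-pool-size, keep-alive-seconds, queue-size.
public class MultiDispatcherSketch {
  private final ThreadPoolExecutor pool;
  private final Map<String, AtomicLong> handled = new ConcurrentHashMap<>();

  public MultiDispatcherSketch(int defaultPoolSize, int maxPoolSize,
                               int keepAliveSeconds, int queueSize) {
    // With a bounded LinkedBlockingQueue, threads beyond defaultPoolSize are
    // only created once the queue is full, i.e. the pool scales toward
    // maxPoolSize under backlog and idle threads die after keepAliveSeconds.
    pool = new ThreadPoolExecutor(defaultPoolSize, maxPoolSize,
        keepAliveSeconds, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(queueSize));
  }

  /** Queue an event; handlers run on the pool instead of one dispatcher thread. */
  public void dispatch(String eventType, Runnable handler) {
    pool.execute(() -> {
      handler.run();
      handled.computeIfAbsent(eventType, k -> new AtomicLong()).incrementAndGet();
    });
  }

  /** Rough equivalent of the per-event-type "...NumOps" metric. */
  public long numOps(String eventType) {
    AtomicLong c = handled.get(eventType);
    return c == null ? 0 : c.get();
  }

  /** Rough equivalent of graceful-stop-seconds: drain, then force-terminate. */
  public void stop(int gracefulStopSeconds) {
    pool.shutdown();
    try {
      if (!pool.awaitTermination(gracefulStopSeconds, TimeUnit.SECONDS)) {
        pool.shutdownNow();
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt();
    }
  }
}
```

The sketch only shows why the pool/queue configuration below has the shape it has; the metrics in the real patch are registered through Hadoop's metrics system rather than a plain map.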
The dispatcher creates a separate metrics object called _Event metrics for "rm-state-store"_, where we can see:
- how many unhandled events are currently present in the event queue for a specific event type
- how many events were handled for a specific event type
- the average execution time for a specific event type

The dispatcher has the following configs (the placeholder stands for the dispatcher name, for example rm-state-store):
||Config name||Description||Default value||
|yarn.dispatcher.multi-thread.{}.*default-pool-size*|How many parallel threads execute the events|4|
|yarn.dispatcher.multi-thread.{}.*max-pool-size*|If the event queue is full, the execution threads scale up to this many|8|
|yarn.dispatcher.multi-thread.{}.*keep-alive-seconds*|Idle execution threads are destroyed after this many seconds|10|
|yarn.dispatcher.multi-thread.{}.*queue-size*|Size of the event queue|1 000 000|
|yarn.dispatcher.multi-thread.{}.*monitor-seconds*|The size of the event queue is logged with this frequency (if not zero)|30|
|yarn.dispatcher.multi-thread.{}.*graceful-stop-seconds*|After the stop signal, the dispatcher waits this many seconds to process the incoming events before terminating|60|

{panel:title=Example output from the RM JMX API}
{noformat}
...
{
  "name": "Hadoop:service=ResourceManager,name=Event metrics for rm-state-store",
  "modelerType": "Event metrics for rm-state-store",
  "tag.Context": "yarn",
  "tag.Hostname": CENSORED,
  "RMStateStoreEventType#STORE_APP_ATTEMPT_Current": 51,
  "RMStateStoreEventType#STORE_APP_ATTEMPT_NumOps": 0,
  "RMStateStoreEventType#STORE_APP_ATTEMPT_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_APP_Current": 124,
  "RMStateStoreEventType#STORE_APP_NumOps": 46,
  "RMStateStoreEventType#STORE_APP_AvgTime": 3318.25,
  "RMStateStoreEventType#UPDATE_APP_Current": 31,
  "RMStateStoreEventType#UPDATE_APP_NumOps": 16,
  "RMStateStoreEventType#UPDATE_APP_AvgTime": 2629.6666666666665,
  "RMStateStoreEventType#UPDATE_APP_ATTEMPT_Current": 31,
  "RMStateStoreEventType#UPDATE_APP_ATTEMPT_NumOps": 12,
  "RMStateStoreEventType#UPDATE_APP_ATTEMPT_AvgTime": 2048.6666666666665,
  "RMStateStoreEventType#REMOVE_APP_Current": 12,
  "RMStateStoreEventType#REMOVE_APP_NumOps": 3,
  "RMStateStoreEventType#REMOVE_APP_AvgTime": 1378.0,
  "RMStateStoreEventType#REMOVE_APP_ATTEMPT_Current": 0,
  "RMStateStoreEventType#REMOVE_APP_ATTEMPT_NumOps": 0,
  "RMStateStoreEventType#REMOVE_APP_ATTEMPT_AvgTime": 0.0,
  "RMStateStoreEventType#FENCED_Current": 0,
  "RMStateStoreEventType#FENCED_NumOps": 0,
  "RMStateStoreEventType#FENCED_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_MASTERKEY_Current": 0,
  "RMStateStoreEventType#STORE_MASTERKEY_NumOps": 0,
  "RMStateStoreEventType#STORE_MASTERKEY_AvgTime": 0.0,
  "RMStateStoreEventType#REMOVE_MASTERKEY_Current": 0,
  "RMStateStoreEventType#REMOVE_MASTERKEY_NumOps": 0,
  "RMStateStoreEventType#REMOVE_MASTERKEY_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_DELEGATION_TOKEN_Current": 0,
  "RMStateStoreEventType#STORE_DELEGATION_TOKEN_NumOps": 0,
  "RMStateStoreEventType#STORE_DELEGATION_TOKEN_AvgTime": 0.0,
  "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_Current": 0,
  "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_NumOps": 0,
  "RMStateStoreEventType#REMOVE_DELEGATION_TOKEN_AvgTime": 0.0,
  "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_Current": 0,
  "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_NumOps": 0,
  "RMStateStoreEventType#UPDATE_DELEGATION_TOKEN_AvgTime": 0.0,
  "RMStateStoreEventType#UPDATE_AMRM_TOKEN_Current": 0,
  "RMStateStoreEventType#UPDATE_AMRM_TOKEN_NumOps": 0,
  "RMStateStoreEventType#UPDATE_AMRM_TOKEN_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_RESERVATION_Current": 0,
  "RMStateStoreEventType#STORE_RESERVATION_NumOps": 0,
  "RMStateStoreEventType#STORE_RESERVATION_AvgTime": 0.0,
  "RMStateStoreEventType#REMOVE_RESERVATION_Current": 0,
  "RMStateStoreEventType#REMOVE_RESERVATION_NumOps": 0,
  "RMStateStoreEventType#REMOVE_RESERVATION_AvgTime": 0.0,
  "RMStateStoreEventType#STORE_PROXY_CA_CERT_Current": 0,
  "RMStateStoreEventType#STORE_PROXY_CA_CERT_NumOps": 0,
  "RMStateStoreEventType#STORE_PROXY_CA_CERT_AvgTime": 0.0
},
...
{noformat}
{panel}

h2. Testing

I deployed the MultiDispatcher-enabled build of YARN to the cluster and ran the following performance test:
{code:bash}
#!/bin/bash
for i in {1..50}
do
    ssh root@$i-node-url 'nohup ./perf.sh 4 1>/dev/null 2>/dev/null &' &
done
sleep 300
for i in {1..50}
do
    ssh root@$i-node-url "pkill -9 -f perf" &
done
sleep 5
echo "DONE"
{code}
Each node ran the following perf script:
{code:bash}
#!/bin/bash
while true
do
    if [ $(ps -o pid= -u hadoop | wc -l) -le $1 ]
    then
        hadoop jar /opt/hadoop-mapreduce-examples.jar pi 20 20 1>/dev/null 2>&1 &
    fi
    sleep 1
done
{code}
This way, within 5 minutes (plus waiting for all running jobs to finish), 332 applications could be processed. Running the same test against the official build, the first application alone needed the full 5 minutes to finish, and in the end only 221 applications completed.
I also tested it with LeveldbRMStateStore and ZKRMStateStore and did not find any problems with the implementation.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org