[ 
https://issues.apache.org/jira/browse/YARN-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701600#comment-16701600
 ] 

Tao Yang commented on YARN-9050:
--------------------------------

cc: [~cheersyang], [~leftnoteasy], [~sunil.g].  I would be interested in your 
thoughts on this issue. Thanks.

> Usability improvements for scheduler activities
> -----------------------------------------------
>
>                 Key: YARN-9050
>                 URL: https://issues.apache.org/jira/browse/YARN-9050
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: image-2018-11-23-16-46-38-138.png
>
>
> We have made some usability improvements for scheduler activities in our 
> cluster, based on YARN 3.1. The issues and our changes are as follows:
> 1. Scheduler activities are not usable with multi-threaded asynchronous 
> scheduling. App and node activities may get mixed up when multiple 
> scheduling threads record activities of different allocation processes in 
> the same variables, such as appsAllocation and recordingNodesAllocation in 
> ActivitiesManager. I think these variables should be thread-local so that 
> the activities of each thread stay separate.
> 2. Activities are incomplete for the multi-node lookup mechanism, because 
> ActivitiesLogger skips recording through {{if (node == null || 
> activitiesManager == null) return; }} when node is null, which indicates 
> that the allocation is for multiple nodes. We need to support recording 
> activities for the multi-node lookup mechanism.
> 3. Current app activities do not meet the requirements of diagnostics. For 
> example, we can see that a node doesn't match a request, but it is hard to 
> know why. Especially when placement constraints are used, it is difficult 
> to make a detailed diagnosis manually. So I propose improving the 
> diagnostics of activities: add diagnostics for the placement constraints 
> check, update the insufficient-resource diagnostic with detailed info (like 
> 'insufficient resource names:[memory-mb]'), and so on.
> 4. Add more useful fields to app activities. In some scenarios we need to 
> distinguish between different requests but can't locate them from the app 
> activities info. Other fields, such as allocation tags, can help to filter 
> for what we want. We have added the containerPriority, allocationRequestId 
> and allocationTags fields to AppAllocation.
> 5. Filter app activities by key fields. The results of app activities are 
> sometimes massive, which makes it hard to find what we want. We have added 
> support for filtering by allocation-tags to meet requirements from some 
> apps; moreover, container-priority and allocation-request-id can be added 
> as filter candidates if necessary.
> 6. Aggregate app activities by diagnostics. Even for a single allocation 
> process, activities can be massive in a large cluster. We frequently want 
> to know why a request can't be allocated in the cluster, and it is hard to 
> check every node manually, so aggregating app activities by diagnostics is 
> necessary. We have added a groupingType parameter to the app-activities 
> REST API for this. It supports grouping by diagnostics; an example looks 
> like this:
>  !image-2018-11-23-16-46-38-138.png! 
> I think we can discuss these points. The improvements that are considered 
> useful and accepted will be added to the patch. Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
