[ https://issues.apache.org/jira/browse/YARN-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571719#comment-16571719 ]

Sunil Govindan edited comment on YARN-8561 at 8/7/18 3:00 PM:
--------------------------------------------------------------

Thanks [~leftnoteasy] for the effort. I have looked through the approach
and the code.
A few comments, a mix of major and minor ones :)

1. I think we can use the same CLI model as the client, where the CLI extends
Configured and implements Tool. This makes the CLI easier to test, and since
Tool already declares run(), we can drop the abstract run method.
2. We should also be able to stop a job from the CLI, correct? If so, does it
need to do anything more than a simple {{yarn app -kill appId}}?
3. I think we can use UnitsConversionUtil for the unit conversion in
CliUtils#parseResourcesString.
4. In CapacitySchedulerConfiguration, absolute resources are validated with a pattern match:
{code}
public static final String PATTERN_FOR_ABSOLUTE_RESOURCE = "^\\[[\\w\\.,\\-_=\\ /]+\\]$";
private static final Pattern RESOURCE_PATTERN =
    Pattern.compile(PATTERN_FOR_ABSOLUTE_RESOURCE);
{code}
Could we use the same pattern in the CLI as well?
5. Maybe rename JobState to SubmarineJobState.
6. The command-line options look very clean and thorough. As we go forward,
more CLI options will be added and the CLI will become more complex. Could we
load a profile into Submarine and let the profile supply about 80% of these
config items? Given a profile, the user might only need to fill in one or two
variable arguments.
7. DevelopperGuide.md ==> DeveloperGuide.md
8. In getServiceResourceFromYarnResource, I think we should get the resource
list from ResourceUtils. It might also be better to use a common client/server
util method to create the resource, something like
Resource.newInstance(yarnResource) or Resources.createResource(yarnResource).
9. In verbose or debug mode, YarnServiceJobSubmitter could dump everything
written to {{FileWriter fw}}.
10. It might be better to add a shutdown or interrupt signal to break out of
JobMonitor#waitTrainingFinal if the job is faulty.
11. In fromServiceState, the service state STOPPED is treated as
JobState.SUCCEEDED; is that intended?
12. There is some commented-out code in JobStatusBuilder.
13. How could we increase the number of workers on a running job?
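Regarding item 4, a minimal sketch of reusing the absolute-resource pattern quoted from CapacitySchedulerConfiguration for CLI validation. The names AbsoluteResourceCheck and isAbsoluteResource are hypothetical, and whether the CLI's resource syntax uses the same bracketed [key=value,...] form is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not in the patch): reuses the pattern text from
// CapacitySchedulerConfiguration so the CLI would validate bracketed
// absolute-resource strings the same way the scheduler config does.
final class AbsoluteResourceCheck {
  // Same pattern text as PATTERN_FOR_ABSOLUTE_RESOURCE quoted in item 4.
  static final String PATTERN_FOR_ABSOLUTE_RESOURCE = "^\\[[\\w\\.,\\-_=\\ /]+\\]$";
  private static final Pattern RESOURCE_PATTERN =
      Pattern.compile(PATTERN_FOR_ABSOLUTE_RESOURCE);

  // Returns true only for a non-empty "[...]" string built from word
  // characters, '.', ',', '-', '_', '=', spaces and '/'.
  static boolean isAbsoluteResource(String value) {
    return value != null && RESOURCE_PATTERN.matcher(value.trim()).matches();
  }

  private AbsoluteResourceCheck() {}
}
```

With this pattern, {{[memory=4096,vcores=2]}} is accepted, while an unbracketed {{memory=4096,vcores=2}} or an empty {{[]}} is rejected.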
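For item 6, one way a profile could work, sketched under assumptions: the profile is a java.util.Properties-style key=value file, and options passed explicitly on the command line override profile values. ProfileOptions, resolve, and the option keys in the usage note are hypothetical, not part of the patch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: a saved profile supplies most option values, and the
// few options given explicitly on the command line override them.
final class ProfileOptions {
  static Properties resolve(String profileText, Map<String, String> cliOverrides) {
    Properties merged = new Properties();
    try {
      // Load the profile first, using the java.util.Properties key=value format.
      merged.load(new StringReader(profileText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Explicitly given CLI values win over values from the profile.
    for (Map.Entry<String, String> entry : cliOverrides.entrySet()) {
      merged.setProperty(entry.getKey(), entry.getValue());
    }
    return merged;
  }

  private ProfileOptions() {}
}
```

For example, a profile containing {{num_workers=2}} combined with an explicit override {{num_workers=4}} resolves to 4, while every option that is not overridden keeps its profile value.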
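For item 10, a minimal sketch of a polling wait that can be broken out of. It assumes the monitor repeatedly checks whether the job has reached a final state; InterruptibleJobWait, waitUntilFinal, and its parameters are hypothetical names, not the patch's JobMonitor API.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: poll until the job is final, but return early when a
// stop flag is set by another thread or the waiting thread is interrupted.
final class InterruptibleJobWait {
  enum Outcome { FINAL, STOPPED, INTERRUPTED }

  static Outcome waitUntilFinal(BooleanSupplier isFinal, AtomicBoolean stop,
      long pollMillis) {
    while (!stop.get()) {
      if (isFinal.getAsBoolean()) {
        return Outcome.FINAL;
      }
      try {
        TimeUnit.MILLISECONDS.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt status so callers can still observe it.
        Thread.currentThread().interrupt();
        return Outcome.INTERRUPTED;
      }
    }
    return Outcome.STOPPED;
  }

  private InterruptibleJobWait() {}
}
```

The stop flag could be set from any other thread, for example a signal or shutdown handler, and restoring the interrupt status lets higher layers still see that the wait was interrupted.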
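For item 9, a sketch assuming the service spec is rendered to a String before it is written, so that verbose mode can log exactly what lands in the file. SpecDump and writeSpec are hypothetical names; the patch itself writes through {{FileWriter fw}} directly.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: keep the generated spec as a String so that, in
// verbose mode, the same content written to the file is also logged.
final class SpecDump {
  static void writeSpec(Path target, String spec, boolean verbose, PrintStream log) {
    if (verbose) {
      log.println("Writing service spec to " + target + ":");
      log.println(spec);
    }
    try {
      Files.write(target, spec.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private SpecDump() {}
}
```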


> [Submarine] Add initial implementation: training job submission and job 
> history retrieve.
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-8561
>                 URL: https://issues.apache.org/jira/browse/YARN-8561
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Major
>         Attachments: YARN-8561.001.patch
>
>
> Added the following parts:
> 1) A new subcomponent of YARN, under the applications/ project.
> 2) TensorFlow training job submission, including training (single-node and
> distributed).
> - Supports Docker containers.
> - Supports GPU isolation.
> - Supports YARN registry DNS.
> 3) Retrieve job history.


