[ https://issues.apache.org/jira/browse/YARN-8876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xun Liu updated YARN-8876:
--------------------------
    Summary: [Submarine] Job monitor long-running service of submarine  (was: [Submarine] Job monitor service of submarine)

> [Submarine] Job monitor long-running service of submarine
> ---------------------------------------------------------
>
>                 Key: YARN-8876
>                 URL: https://issues.apache.org/jira/browse/YARN-8876
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Xun Liu
>            Assignee: Xun Liu
>            Priority: Major
>
> h1. Job monitor service of submarine
> After training, the monitoring program needs to automatically shut down the
> PS (parameter server) service. Other deep learning frameworks may also need
> custom processing when their tasks are in different states.
> Submarine needs to provide a long-running resident service that monitors
> each job.
> This monitoring service can handle training tasks differently depending on
> the type of deep learning framework.
> For example, when TensorFlow performs distributed training, the PS service
> cannot stop automatically once training is completed. In that case, the
> monitoring service needs to actively stop the PS.
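
The monitoring behaviour described in this issue could be sketched roughly as follows. This is only an illustration under stated assumptions, not Submarine's or YARN's actual API: `State`, `Component`, `Job`, `JobMonitor`, and `tensorflow_handler` are all hypothetical names, and setting a component's state to `STOPPED` stands in for whatever real stop request the service would issue. The sketch shows a monitor that dispatches to a per-framework handler, with a TensorFlow handler that stops the PS components once every worker has finished.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class State(Enum):
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    STOPPED = "STOPPED"


@dataclass
class Component:
    name: str                      # hypothetical naming: "worker-0", "ps-0", ...
    state: State = State.RUNNING


@dataclass
class Job:
    job_id: str
    framework: str                 # e.g. "tensorflow"
    components: List[Component] = field(default_factory=list)


def tensorflow_handler(job: Job) -> List[str]:
    """Stop running PS components once all workers have finished."""
    workers = [c for c in job.components if c.name.startswith("worker")]
    finished = (State.SUCCEEDED, State.FAILED)
    if not workers or any(c.state not in finished for c in workers):
        return []                  # training is still in progress
    stopped = []
    for c in job.components:
        if c.name.startswith("ps") and c.state is State.RUNNING:
            c.state = State.STOPPED   # stand-in for a real stop request
            stopped.append(c.name)
    return stopped


class JobMonitor:
    """Resident service core: one handler per framework type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[Job], List[str]]] = {}

    def register(self, framework: str, handler: Callable[[Job], List[str]]) -> None:
        self._handlers[framework] = handler

    def poll_once(self, jobs: List[Job]) -> Dict[str, List[str]]:
        """Run one monitoring pass; return the components stopped per job."""
        actions: Dict[str, List[str]] = {}
        for job in jobs:
            handler = self._handlers.get(job.framework)
            if handler is None:
                continue           # no custom processing for this framework
            stopped = handler(job)
            if stopped:
                actions[job.job_id] = stopped
        return actions


if __name__ == "__main__":
    monitor = JobMonitor()
    monitor.register("tensorflow", tensorflow_handler)
    job = Job("job-1", "tensorflow",
              [Component("worker-0", State.SUCCEEDED), Component("ps-0")])
    print(monitor.poll_once([job]))   # {'job-1': ['ps-0']}
```

A real long-running service would call something like `poll_once` on a timer and obtain component states from the cluster rather than from in-memory objects. Keeping the handlers in a registry keyed by framework type is one way to allow the per-framework processing the description asks for.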