[ https://issues.apache.org/jira/browse/YARN-8330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553540#comment-16553540 ]

Jason Lowe commented on YARN-8330:
----------------------------------

One of the envisioned use-cases of ATSv2 was to record every container 
allocation for an application in order to do analysis like what is done in 
YARN-415.  That requires recording every container that enters the ALLOCATED 
state.
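
As a rough illustration of why ALLOCATED matters for that analysis: the footprint 
math only works if every allocation is recorded, launched or not. The types below 
are hypothetical, not existing YARN or ATSv2 code; they just show MB-seconds 
aggregated over every allocation interval.
{code}
import java.util.List;

// Hypothetical illustration only; not existing YARN or ATSv2 code.
public final class FootprintMath {

  /** One container allocation as it might be reconstructed from ATSv2 data. */
  public static final class AllocationRecord {
    final long memoryMb;        // memory granted to the container
    final long allocatedTimeMs; // when the container entered ALLOCATED
    final long releasedTimeMs;  // when it was released or completed

    public AllocationRecord(long memoryMb, long allocatedTimeMs, long releasedTimeMs) {
      this.memoryMb = memoryMb;
      this.allocatedTimeMs = allocatedTimeMs;
      this.releasedTimeMs = releasedTimeMs;
    }
  }

  /**
   * YARN-415-style aggregate: MB-seconds summed over every allocation,
   * including containers that were allocated but never launched.
   */
  public static long mbSeconds(List<AllocationRecord> allocations) {
    long total = 0;
    for (AllocationRecord a : allocations) {
      // truncating division is fine for a sketch; a real aggregator would be more careful
      total += a.memoryMb * ((a.releasedTimeMs - a.allocatedTimeMs) / 1000L);
    }
    return total;
  }

  private FootprintMath() {
  }
}
{code}
Drop the never-launched allocations from the data and the aggregate undercounts 
the application's real footprint.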

bq. If information is collected for end user consumption to understand their 
application usage, then by collecting RUNNING state might be sufficient.

That does not cover the case where an application receives containers but for 
some reason holds onto the allocations for a while before launching them.  Tez, 
for example, has some corner cases in its scheduler where it can hold onto 
container allocations for prolonged periods before launching them while it 
reuses other, already active containers.  Holding onto unlaunched container 
allocations is part of its footprint on the cluster.  Only showing containers 
that made it to the RUNNING state tells just part of the application's usage 
story.

Fixing the yarn container -list command problem does not mean the fix has to 
solely be in ATSv2.  If the data for a container in ATS is sufficient to 
distinguish allocated containers from running containers, then this can, and 
arguably should, be filtered for the yarn container -list use-case.  If the 
data recorded for a container can't distinguish this then we should look into 
fixing that so it can be.  But I don't think we should pre-filter ALLOCATED 
containers on the RM publishing side and preclude the proper app footprint 
analysis use-case entirely.
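
For illustration only, here is a rough sketch of the kind of client-side filtering 
I mean, assuming the container reports exposed to the CLI carry a state that 
distinguishes allocated from launched containers. The helper class below is 
hypothetical, not existing YARN code; it only uses the public 
ContainerReport/ContainerState records.
{code}
import java.util.List;
import java.util.stream.Collectors;

import org.apache.hadoop.yarn.api.records.ContainerReport;
import org.apache.hadoop.yarn.api.records.ContainerState;

// Hypothetical helper for the yarn container -list use-case: keep only
// containers that were actually launched (RUNNING or COMPLETE), while the
// full event history, including ALLOCATED, stays available in ATSv2 for
// footprint analysis.
public final class LaunchedContainerFilter {

  public static List<ContainerReport> launchedOnly(List<ContainerReport> reports) {
    return reports.stream()
        .filter(r -> r.getContainerState() == ContainerState.RUNNING
            || r.getContainerState() == ContainerState.COMPLETE)
        .collect(Collectors.toList());
  }

  private LaunchedContainerFilter() {
  }
}
{code}
Whether allocated-but-unlaunched containers actually surface with a 
distinguishable state in the ATS data is exactly the part that would need to be 
verified, and fixed if they don't.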


> An extra container got launched by RM for yarn-service
> ------------------------------------------------------
>
>                 Key: YARN-8330
>                 URL: https://issues.apache.org/jira/browse/YARN-8330
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Yesha Vora
>            Assignee: Suma Shivaprasad
>            Priority: Critical
>         Attachments: YARN-8330.1.patch, YARN-8330.2.patch, YARN-8330.3.patch
>
>
> Steps:
> # Launch an HBase tarball app.
> # List containers for the HBase tarball app.
> {code}
> /usr/hdp/current/hadoop-yarn-client/bin/yarn container -list appattempt_1525463491331_0006_000001
> WARNING: YARN_LOG_DIR has been replaced by HADOOP_LOG_DIR. Using value of YARN_LOG_DIR.
> WARNING: YARN_LOGFILE has been replaced by HADOOP_LOGFILE. Using value of YARN_LOGFILE.
> WARNING: YARN_PID_DIR has been replaced by HADOOP_PID_DIR. Using value of YARN_PID_DIR.
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 18/05/04 22:36:11 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/05/04 22:36:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> Total number of containers :5
>                   Container-Id                      Start Time             Finish Time                   State         Host    Node Http Address         LOG-URL
> container_e06_1525463491331_0006_01_000002    Fri May 04 22:34:26 +0000 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000002/hrt_qa
> container_e06_1525463491331_0006_01_000003    Fri May 04 22:34:26 +0000 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000003/hrt_qa
> container_e06_1525463491331_0006_01_000001    Fri May 04 22:34:15 +0000 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000001/hrt_qa
> container_e06_1525463491331_0006_01_000005    Fri May 04 22:34:56 +0000 2018    N/A    RUNNING    xxx:25454    http://xxx:8042    http://xxx:8042/node/containerlogs/container_e06_1525463491331_0006_01_000005/hrt_qa
> container_e06_1525463491331_0006_01_000004    Fri May 04 22:34:56 +0000 2018    N/A    null       xxx:25454    http://xxx:8042    http://xxx:8188/applicationhistory/logs/xxx:25454/container_e06_1525463491331_0006_01_000004/container_e06_1525463491331_0006_01_000004/hrt_qa{code}
> Total expected containers = 4 (3 component containers + 1 AM). Instead, the RM 
> lists 5 containers, and container_e06_1525463491331_0006_01_000004 is in a null state.
> The yarn service used containers 02, 03, and 05 for its components. There is no log 
> available in the NM or AM related to container 04. Only one line is printed in 
> the RM log:
> {code}
> 2018-05-04 22:34:56,618 INFO  rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(489)) - container_e06_1525463491331_0006_01_000004 Container Transitioned from NEW to RESERVED{code}


