[jira] [Updated] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues

2017-01-05 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3850:
-
Fix Version/s: 2.8.0

> NM fails to read files from full disks which can lead to container logs being 
> lost and other issues
> ---
>
> Key: YARN-3850
> URL: https://issues.apache.org/jira/browse/YARN-3850
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.1, 3.0.0-alpha1
>
> Attachments: YARN-3850.01.patch, YARN-3850.02.patch
>
>
> *Container logs* can be lost if a disk has become full (~90% full).
> When an application finishes, we upload logs after aggregation by calling
> {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn
> checks the eligible directories by calling
> {{LocalDirsHandlerService#getLogDirs}}, which returns nothing if the disk
> is full. So none of the container logs are aggregated and uploaded.
> But on application finish, we also call 
> {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
> application directory which contains container logs. This is because it calls 
> {{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
> as well.
> So we are left with neither aggregated logs for the app nor the individual 
> container logs for the app.
> In addition to this, there are 2 more issues:
> # {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks so 
> NM will fail to serve up logs from full disks from its web interfaces.
> # {{RecoveredContainerLaunch#locatePidFile}} also does not consider full 
> disks so it is possible that on container recovery, PID file is not found.
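A minimal Java sketch of the failure mode described above. It assumes only what the description states: the upload path sees healthy log directories (as `getLogDirs` does), while the cleanup path also sees full ones (as `getLogDirsForCleanup` does). `DirsModel`, `logDirsForRead`, `finishApp` and the `/grid/...` paths are hypothetical illustrations. This is not the NodeManager code or the committed patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the behaviour described in YARN-3850.
 * Class and method names are illustrative; they are not Hadoop's API.
 */
public class FullDiskLogLoss {

    enum DiskState { GOOD, FULL, FAILED }

    static final class DirsModel {
        private final Map<String, DiskState> dirs = new LinkedHashMap<>();

        void add(String dir, DiskState state) {
            dirs.put(dir, state);
        }

        // Stand-in for getLogDirs: healthy directories only.
        List<String> goodLogDirs() {
            return select(false);
        }

        // Stand-in for getLogDirsForCleanup: healthy and full directories.
        List<String> logDirsForCleanup() {
            return select(true);
        }

        // Hypothetical read-side view: readers also consider full disks.
        List<String> logDirsForRead() {
            return select(true);
        }

        private List<String> select(boolean includeFull) {
            List<String> out = new ArrayList<>();
            for (Map.Entry<String, DiskState> e : dirs.entrySet()) {
                DiskState s = e.getValue();
                if (s == DiskState.GOOD || (includeFull && s == DiskState.FULL)) {
                    out.add(e.getKey());
                }
            }
            return out;
        }
    }

    /**
     * Simulates application finish: upload from the dirs the uploader can see,
     * then delete the app directory on every cleanup dir. Returns the dirs whose
     * container logs were deleted without ever being uploaded.
     */
    static List<String> finishApp(DirsModel model, boolean uploaderReadsFullDisks) {
        List<String> uploaded = uploaderReadsFullDisks
                ? model.logDirsForRead()
                : model.goodLogDirs();
        List<String> lost = new ArrayList<>(model.logDirsForCleanup());
        lost.removeAll(uploaded);
        return lost;
    }

    public static void main(String[] args) {
        DirsModel model = new DirsModel();
        model.add("/grid/0/yarn/logs", DiskState.GOOD);
        model.add("/grid/1/yarn/logs", DiskState.FULL);
        System.out.println("lost, upload sees good dirs only: " + finishApp(model, false));
        System.out.println("lost, upload also reads full dirs: " + finishApp(model, true));
    }
}
```

Running `main` prints `[/grid/1/yarn/logs]` for the first case and `[]` for the second. Once the uploader consults full directories as well, every directory that cleanup deletes has been uploaded first.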



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues

2015-09-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3850:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1 after fixing a minor conflict in TestLogAggregation.java.

Ran compilation and the TestLogAggregationService and TestContainerLogsPage
tests before the push. Patch applied cleanly.

> NM fails to read files from full disks which can lead to container logs being 
> lost and other issues
> ---
>
> Key: YARN-3850
> URL: https://issues.apache.org/jira/browse/YARN-3850
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.1
>
> Attachments: YARN-3850.01.patch, YARN-3850.02.patch
>
>
> *Container logs* can be lost if a disk has become full (~90% full).
> When an application finishes, we upload logs after aggregation by calling
> {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn
> checks the eligible directories by calling
> {{LocalDirsHandlerService#getLogDirs}}, which returns nothing if the disk
> is full. So none of the container logs are aggregated and uploaded.
> But on application finish, we also call 
> {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
> application directory which contains container logs. This is because it calls 
> {{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
> as well.
> So we are left with neither aggregated logs for the app nor the individual 
> container logs for the app.
> In addition to this, there are 2 more issues:
> # {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks so 
> NM will fail to serve up logs from full disks from its web interfaces.
> # {{RecoveredContainerLaunch#locatePidFile}} also does not consider full 
> disks so it is possible that on container recovery, PID file is not found.
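The two additional issues follow the same pattern on the read side: a lookup that only walks healthy directories cannot find a file that lives on a full disk. Below is a hedged Java sketch using only `java.nio.file`. `dirsForRead`, `locate` and the `container_01.pid` file name are hypothetical and do not reflect the NodeManager's API or on-disk layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of a read-side lookup across NodeManager dirs.
 * Names are illustrative; this is not Hadoop's API.
 */
public class LocateOnFullDisks {

    /** Dirs to search when reading: healthy dirs first, then full ones. */
    static List<Path> dirsForRead(List<Path> good, List<Path> full) {
        List<Path> all = new ArrayList<>(good);
        all.addAll(full);
        return all;
    }

    /** Returns the first regular file at relPath under any of the given dirs. */
    static Optional<Path> locate(List<Path> dirs, String relPath) {
        for (Path dir : dirs) {
            Path candidate = dir.resolve(relPath);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path goodDisk = Files.createTempDirectory("good-disk");
        Path fullDisk = Files.createTempDirectory("full-disk");
        goodDisk.toFile().deleteOnExit();
        fullDisk.toFile().deleteOnExit();

        // Simulates a file written before this disk crossed the utilization threshold.
        Path pidFile = Files.write(fullDisk.resolve("container_01.pid"),
                "12345".getBytes(StandardCharsets.UTF_8));
        pidFile.toFile().deleteOnExit();

        List<Path> good = Collections.singletonList(goodDisk);
        List<Path> full = Collections.singletonList(fullDisk);

        System.out.println("good dirs only: " + locate(good, "container_01.pid").isPresent());
        System.out.println("good + full:    "
                + locate(dirsForRead(good, full), "container_01.pid").isPresent());
    }
}
```

The first lookup prints `false` and the second prints `true`. Searching healthy directories first keeps the common case unchanged. Full directories are only consulted as a fallback for reads, never for new writes.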





[jira] [Updated] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues

2015-07-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3850:
--
Labels: 2.6.1-candidate  (was: )

 NM fails to read files from full disks which can lead to container logs being 
 lost and other issues
 ---

 Key: YARN-3850
 URL: https://issues.apache.org/jira/browse/YARN-3850
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Blocker
  Labels: 2.6.1-candidate
 Fix For: 2.7.1

 Attachments: YARN-3850.01.patch, YARN-3850.02.patch


 *Container logs* can be lost if a disk has become full (~90% full).
 When an application finishes, we upload logs after aggregation by calling
 {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn
 checks the eligible directories by calling
 {{LocalDirsHandlerService#getLogDirs}}, which returns nothing if the disk
 is full. So none of the container logs are aggregated and uploaded.
 But on application finish, we also call 
 {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
 application directory which contains container logs. This is because it calls 
 {{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
 as well.
 So we are left with neither aggregated logs for the app nor the individual 
 container logs for the app.
 In addition to this, there are 2 more issues:
 # {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks so 
 NM will fail to serve up logs from full disks from its web interfaces.
 # {{RecoveredContainerLaunch#locatePidFile}} also does not consider full 
 disks so it is possible that on container recovery, PID file is not found.





[jira] [Updated] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues

2015-06-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3850:
---
Component/s: nodemanager

 NM fails to read files from full disks which can lead to container logs being 
 lost and other issues
 ---

 Key: YARN-3850
 URL: https://issues.apache.org/jira/browse/YARN-3850
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-3850.01.patch, YARN-3850.02.patch


 *Container logs* can be lost if a disk has become bad (90% full).
 When an application finishes, we upload logs after aggregation by calling
 {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn
 checks the eligible directories by calling
 {{LocalDirsHandlerService#getLogDirs}}, which returns nothing if the disk
 is full. So none of the container logs are aggregated and uploaded.
 But on application finish, we also call 
 {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
 application directory which contains container logs. This is because it calls 
 {{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
 as well.
 So we are left with neither aggregated logs for the app nor the individual 
 container logs for the app.





[jira] [Updated] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues

2015-06-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3850:
---
Summary: NM fails to read files from full disks which can lead to container 
logs being lost and other issues  (was: Container logs can be lost if disk is 
full)

 NM fails to read files from full disks which can lead to container logs being 
 lost and other issues
 ---

 Key: YARN-3850
 URL: https://issues.apache.org/jira/browse/YARN-3850
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-3850.01.patch, YARN-3850.02.patch


 *Container logs* can be lost if a disk has become bad (90% full).
 When an application finishes, we upload logs after aggregation by calling
 {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn
 checks the eligible directories by calling
 {{LocalDirsHandlerService#getLogDirs}}, which returns nothing if the disk
 is full. So none of the container logs are aggregated and uploaded.
 But on application finish, we also call 
 {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
 application directory which contains container logs. This is because it calls 
 {{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
 as well.
 So we are left with neither aggregated logs for the app nor the individual 
 container logs for the app.





[jira] [Updated] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues

2015-06-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3850:
---
Description: 
*Container logs* can be lost if a disk has become bad (90% full).
When an application finishes, we upload logs after aggregation by calling
{{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks
the eligible directories by calling {{LocalDirsHandlerService#getLogDirs}},
which returns nothing if the disk is full. So none of the container logs
are aggregated and uploaded.
But on application finish, we also call 
{{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
application directory which contains container logs. This is because it calls 
{{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
as well.
So we are left with neither aggregated logs for the app nor the individual 
container logs for the app.

In addition to this, there are 2 more issues:
# {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks so NM 
will fail to serve up logs from full disks from its web interfaces.
# {{RecoveredContainerLaunch#locatePidFile}} also does not consider full disks 
so it is possible that on container recovery, PID file is not found.

  was:
*Container logs* can be lost if a disk has become bad (90% full).
When an application finishes, we upload logs after aggregation by calling
{{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks
the eligible directories by calling {{LocalDirsHandlerService#getLogDirs}},
which returns nothing if the disk is full. So none of the container logs
are aggregated and uploaded.
But on application finish, we also call 
{{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
application directory which contains container logs. This is because it calls 
{{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
as well.
So we are left with neither aggregated logs for the app nor the individual 
container logs for the app.

In addition to this, there are 2 more issues:
# {{ContainerLogsUtil#getContainerLogDirs}} also does not considers full disks 
so NM will fail to serve up logs from full disks from its web interfaces.
# {{RecoveredContainerLaunch#locatePidFile}} also does not consider full disks 
so it is possible that on container recovery, PID file is not found.


 NM fails to read files from full disks which can lead to container logs being 
 lost and other issues
 ---

 Key: YARN-3850
 URL: https://issues.apache.org/jira/browse/YARN-3850
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-3850.01.patch, YARN-3850.02.patch


 *Container logs* can be lost if a disk has become bad (90% full).
 When an application finishes, we upload logs after aggregation by calling
 {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn
 checks the eligible directories by calling
 {{LocalDirsHandlerService#getLogDirs}}, which returns nothing if the disk
 is full. So none of the container logs are aggregated and uploaded.
 But on application finish, we also call 
 {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
 application directory which contains container logs. This is because it calls 
 {{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
 as well.
 So we are left with neither aggregated logs for the app nor the individual 
 container logs for the app.
 In addition to this, there are 2 more issues:
 # {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks so 
 NM will fail to serve up logs from full disks from its web interfaces.
 # {{RecoveredContainerLaunch#locatePidFile}} also does not consider full 
 disks so it is possible that on container recovery, PID file is not found.





[jira] [Updated] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues

2015-06-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3850:
---
Description: 
*Container logs* can be lost if a disk has become bad (90% full).
When an application finishes, we upload logs after aggregation by calling
{{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks
the eligible directories by calling {{LocalDirsHandlerService#getLogDirs}},
which returns nothing if the disk is full. So none of the container logs
are aggregated and uploaded.
But on application finish, we also call 
{{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
application directory which contains container logs. This is because it calls 
{{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
as well.
So we are left with neither aggregated logs for the app nor the individual 
container logs for the app.

In addition to this, there are 2 more issues:
# {{ContainerLogsUtil#getContainerLogDirs}} also does not considers full disks 
so NM will fail to serve up logs from full disks from its web interfaces.
# {{RecoveredContainerLaunch#locatePidFile}} also does not consider full disks 
so it is possible that on container recovery, PID file is not found.

  was:
*Container logs* can be lost if a disk has become bad (90% full).
When an application finishes, we upload logs after aggregation by calling
{{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks
the eligible directories by calling {{LocalDirsHandlerService#getLogDirs}},
which returns nothing if the disk is full. So none of the container logs
are aggregated and uploaded.
But on application finish, we also call 
{{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
application directory which contains container logs. This is because it calls 
{{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
as well.
So we are left with neither aggregated logs for the app nor the individual 
container logs for the app.

In addition to this, there are 2 more issues:
# {{ContainerLogsUtil#getContainerLogDirs}} also does not considers full disks 
so NM will fail to serve up logs from full disks from its web interfaces.
# 


 NM fails to read files from full disks which can lead to container logs being 
 lost and other issues
 ---

 Key: YARN-3850
 URL: https://issues.apache.org/jira/browse/YARN-3850
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-3850.01.patch, YARN-3850.02.patch


 *Container logs* can be lost if a disk has become bad (90% full).
 When an application finishes, we upload logs after aggregation by calling
 {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn
 checks the eligible directories by calling
 {{LocalDirsHandlerService#getLogDirs}}, which returns nothing if the disk
 is full. So none of the container logs are aggregated and uploaded.
 But on application finish, we also call 
 {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
 application directory which contains container logs. This is because it calls 
 {{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
 as well.
 So we are left with neither aggregated logs for the app nor the individual 
 container logs for the app.
 In addition to this, there are 2 more issues:
 # {{ContainerLogsUtil#getContainerLogDirs}} also does not considers full 
 disks so NM will fail to serve up logs from full disks from its web 
 interfaces.
 # {{RecoveredContainerLaunch#locatePidFile}} also does not consider full 
 disks so it is possible that on container recovery, PID file is not found.





[jira] [Updated] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues

2015-06-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3850:
---
Description: 
*Container logs* can be lost if a disk has become bad (90% full).
When an application finishes, we upload logs after aggregation by calling
{{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks
the eligible directories by calling {{LocalDirsHandlerService#getLogDirs}},
which returns nothing if the disk is full. So none of the container logs
are aggregated and uploaded.
But on application finish, we also call 
{{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
application directory which contains container logs. This is because it calls 
{{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
as well.
So we are left with neither aggregated logs for the app nor the individual 
container logs for the app.

In addition to this, there are 2 more issues:
# {{ContainerLogsUtil#getContainerLogDirs}} also does not considers full disks 
so NM will fail to serve up logs from full disks from its web interfaces.
# 

  was:
*Container logs* can be lost if a disk has become bad (90% full).
When an application finishes, we upload logs after aggregation by calling
{{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks
the eligible directories by calling {{LocalDirsHandlerService#getLogDirs}},
which returns nothing if the disk is full. So none of the container logs
are aggregated and uploaded.
But on application finish, we also call 
{{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
application directory which contains container logs. This is because it calls 
{{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
as well.
So we are left with neither aggregated logs for the app nor the individual 
container logs for the app.


 NM fails to read files from full disks which can lead to container logs being 
 lost and other issues
 ---

 Key: YARN-3850
 URL: https://issues.apache.org/jira/browse/YARN-3850
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-3850.01.patch, YARN-3850.02.patch


 *Container logs* can be lost if a disk has become bad (90% full).
 When an application finishes, we upload logs after aggregation by calling
 {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn
 checks the eligible directories by calling
 {{LocalDirsHandlerService#getLogDirs}}, which returns nothing if the disk
 is full. So none of the container logs are aggregated and uploaded.
 But on application finish, we also call 
 {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
 application directory which contains container logs. This is because it calls 
 {{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
 as well.
 So we are left with neither aggregated logs for the app nor the individual 
 container logs for the app.
 In addition to this, there are 2 more issues:
 # {{ContainerLogsUtil#getContainerLogDirs}} also does not considers full 
 disks so NM will fail to serve up logs from full disks from its web 
 interfaces.
 # 





[jira] [Updated] (YARN-3850) NM fails to read files from full disks which can lead to container logs being lost and other issues

2015-06-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3850:
---
Description: 
*Container logs* can be lost if a disk has become full (~90% full).
When an application finishes, we upload logs after aggregation by calling
{{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks
the eligible directories by calling {{LocalDirsHandlerService#getLogDirs}},
which returns nothing if the disk is full. So none of the container logs
are aggregated and uploaded.
But on application finish, we also call 
{{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
application directory which contains container logs. This is because it calls 
{{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
as well.
So we are left with neither aggregated logs for the app nor the individual 
container logs for the app.

In addition to this, there are 2 more issues:
# {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks so NM 
will fail to serve up logs from full disks from its web interfaces.
# {{RecoveredContainerLaunch#locatePidFile}} also does not consider full disks 
so it is possible that on container recovery, PID file is not found.

  was:
*Container logs* can be lost if a disk has become bad (90% full).
When an application finishes, we upload logs after aggregation by calling
{{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn checks
the eligible directories by calling {{LocalDirsHandlerService#getLogDirs}},
which returns nothing if the disk is full. So none of the container logs
are aggregated and uploaded.
But on application finish, we also call 
{{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
application directory which contains container logs. This is because it calls 
{{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
as well.
So we are left with neither aggregated logs for the app nor the individual 
container logs for the app.

In addition to this, there are 2 more issues:
# {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks so NM 
will fail to serve up logs from full disks from its web interfaces.
# {{RecoveredContainerLaunch#locatePidFile}} also does not consider full disks 
so it is possible that on container recovery, PID file is not found.


 NM fails to read files from full disks which can lead to container logs being 
 lost and other issues
 ---

 Key: YARN-3850
 URL: https://issues.apache.org/jira/browse/YARN-3850
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation, nodemanager
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
Priority: Blocker
 Attachments: YARN-3850.01.patch, YARN-3850.02.patch


 *Container logs* can be lost if a disk has become full (~90% full).
 When an application finishes, we upload logs after aggregation by calling
 {{AppLogAggregatorImpl#uploadLogsForContainers}}. But this call in turn
 checks the eligible directories by calling
 {{LocalDirsHandlerService#getLogDirs}}, which returns nothing if the disk
 is full. So none of the container logs are aggregated and uploaded.
 But on application finish, we also call 
 {{AppLogAggregatorImpl#doAppLogAggregationPostCleanUp()}}. This deletes the 
 application directory which contains container logs. This is because it calls 
 {{LocalDirsHandlerService#getLogDirsForCleanup}} which returns the full disks 
 as well.
 So we are left with neither aggregated logs for the app nor the individual 
 container logs for the app.
 In addition to this, there are 2 more issues:
 # {{ContainerLogsUtil#getContainerLogDirs}} does not consider full disks so 
 NM will fail to serve up logs from full disks from its web interfaces.
 # {{RecoveredContainerLaunch#locatePidFile}} also does not consider full 
 disks so it is possible that on container recovery, PID file is not found.


