[jira] [Updated] (YARN-8872) Optimize collections used by Yarn JHS to reduce its memory

2018-10-12 Thread Misha Dmitriev (JIRA)


 [ https://issues.apache.org/jira/browse/YARN-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misha Dmitriev updated YARN-8872:
-
Attachment: YARN-8872.02.patch

> Optimize collections used by Yarn JHS to reduce its memory
> --
>
> Key: YARN-8872
> URL: https://issues.apache.org/jira/browse/YARN-8872
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, 
> jhs-bad-collections.png
>
>
> Using jxray (www.jxray.com), we analyzed a heap dump of a JHS running with a
> big heap in a large cluster and handling large MapReduce jobs. The heap is
> large (over 32GB), and 21.4% of it is wasted due to various suboptimal Java
> collections, mostly maps and lists that are either empty or contain only one
> element. In such under-populated collections, a considerable amount of memory
> is still used by the internal implementation objects alone. See the attached
> excerpt from the jxray report for the details. If certain collections are
> almost always empty, they should be initialized lazily. If others almost
> always have just 1 or 2 elements, they should be initialized with the
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with capacity 2, given that most
> tasks have only one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
> contains one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to 
> use the more wasteful LinkedList here) and initialize with capacity 1.
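The lazy-initialization item above ({{FileSystemCounterGroup.map}}) can be illustrated with a minimal sketch. It assumes a hypothetical {{LazyCounterGroup}} class holding a {{Map<String, Long>}} of counters; it is not Hadoop's {{FileSystemCounterGroup}}, whose actual fields and types may differ. The map stays null until the first write, and every read path treats null as empty.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Hadoop's FileSystemCounterGroup: the backing map is
// created only when the first counter is written, so a group that never
// receives a counter holds just a null reference instead of an empty HashMap.
// Not thread-safe; concurrent writers would need synchronization around the
// lazy allocation.
class LazyCounterGroup {
  // Stays null until the first increment().
  private Map<String, Long> counters;

  void increment(String name, long delta) {
    if (counters == null) {
      counters = new HashMap<>(); // allocated only when actually needed
    }
    counters.merge(name, delta, Long::sum);
  }

  long get(String name) {
    // Readers treat a missing map exactly like an empty one.
    return counters == null ? 0L : counters.getOrDefault(name, 0L);
  }

  int size() {
    return counters == null ? 0 : counters.size();
  }

  Map<String, Long> view() {
    return counters == null
        ? Collections.<String, Long>emptyMap()
        : Collections.unmodifiableMap(counters);
  }

  boolean isAllocated() {
    return counters != null;
  }

  public static void main(String[] args) {
    LazyCounterGroup group = new LazyCounterGroup();
    System.out.println(group.isAllocated()); // false
    group.increment("BYTES_READ", 128);
    group.increment("BYTES_READ", 64);
    System.out.println(group.isAllocated() + " " + group.get("BYTES_READ")); // true 192
  }
}
```

The trade-off is an extra null check on every access in exchange for not paying for an empty map in the common case where no counters are ever recorded.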



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8872) Optimize collections used by Yarn JHS to reduce its memory

2018-10-11 Thread Misha Dmitriev (JIRA)


 [ https://issues.apache.org/jira/browse/YARN-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misha Dmitriev updated YARN-8872:
-
Attachment: YARN-8872.01.patch

> Optimize collections used by Yarn JHS to reduce its memory
> --
>
> Key: YARN-8872
> URL: https://issues.apache.org/jira/browse/YARN-8872
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: YARN-8872.01.patch, jhs-bad-collections.png



[jira] [Updated] (YARN-8872) Optimize collections used by Yarn JHS to reduce its memory

2018-10-11 Thread Misha Dmitriev (JIRA)


 [ https://issues.apache.org/jira/browse/YARN-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misha Dmitriev updated YARN-8872:
-
Description: 
Using jxray (www.jxray.com), we analyzed a heap dump of a JHS running with a big
heap in a large cluster and handling large MapReduce jobs. The heap is large
(over 32GB), and 21.4% of it is wasted due to various suboptimal Java
collections, mostly maps and lists that are either empty or contain only one
element. In such under-populated collections, a considerable amount of memory is
still used by the internal implementation objects alone. See the attached
excerpt from the jxray report for the details. If certain collections are
almost always empty, they should be initialized lazily. If others almost always
have just 1 or 2 elements, they should be initialized with the appropriate
initial capacity of 1 or 2 (the default capacity is 16 for HashMap and 10 for
ArrayList).
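As a rough illustration of the initial-capacity advice, here is a minimal sketch using a hypothetical {{RightSizedTaskRecord}} class. The field names echo the ticket, but the key and value types and all methods are assumptions for illustration, not Hadoop's actual {{CompletedTask}} or {{CompletedTaskAttempt}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder; field names echo the ticket, but the types and methods
// are assumptions for illustration, not Hadoop's actual classes.
class RightSizedTaskRecord {
  // Most tasks have one or two attempts, so start with capacity 2 instead of
  // HashMap's default of 16.
  private final Map<String, String> attempts = new HashMap<>(2);

  // Usually exactly one diagnostic message, so reserve a single slot instead
  // of ArrayList's default of 10.
  private final List<String> diagnostics = new ArrayList<>(1);

  void addAttempt(String attemptId, String state) {
    attempts.put(attemptId, state);
  }

  void addDiagnostic(String message) {
    diagnostics.add(message);
  }

  int attemptCount() {
    return attempts.size();
  }

  List<String> diagnostics() {
    return Collections.unmodifiableList(diagnostics);
  }

  public static void main(String[] args) {
    RightSizedTaskRecord task = new RightSizedTaskRecord();
    task.addAttempt("attempt_0", "SUCCEEDED");
    task.addDiagnostic("first message");
    // The initial capacity is only a starting size: both collections still
    // grow on demand when the rare task needs more room.
    task.addAttempt("attempt_1", "KILLED");
    task.addAttempt("attempt_2", "FAILED");
    task.addDiagnostic("second message");
    System.out.println(task.attemptCount() + " " + task.diagnostics().size()); // 3 2
  }
}
```

One detail worth keeping in mind: with the default load factor of 0.75, a {{HashMap}} created with capacity 2 resizes its table to 4 buckets when the second entry is inserted. That is still far below the default 16, but a map expected to hold exactly two entries could be created with capacity 3 or 4 to skip that one resize.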

Based on the attached report, we should do the following:
 # {{FileSystemCounterGroup.map}} - initialize lazily
 # {{CompletedTask.attempts}} - initialize with capacity 2, given that most tasks
have only one or two attempts
 # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
 # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
contains one diagnostic message most of the time
 # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use 
the more wasteful LinkedList here) and initialize with capacity 1.
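For the {{CompletedTask.reportDiagnostics}} item, a minimal sketch of why the LinkedList-to-ArrayList swap is low-risk: code that only appends to and iterates over a {{List}} behaves identically with either implementation. The {{ReportDiagnosticsSwap}} class and its methods are hypothetical. The assumption is that callers use only {{List}} operations; code relying on LinkedList-specific methods (such as the {{Deque}} method {{removeFirst}}) or on frequent insertion at the head would need a closer look.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of the LinkedList-to-ArrayList swap. Callers written
// against the List interface cannot tell the two implementations apart.
class ReportDiagnosticsSwap {
  // Before: each element costs a separate LinkedList.Node (item, next, prev).
  static List<String> before() {
    return new LinkedList<>();
  }

  // After: one backing array, sized for the usual single message.
  static List<String> after() {
    return new ArrayList<>(1);
  }

  // A caller that only iterates over the list.
  static String render(List<String> diagnostics) {
    return String.join("; ", diagnostics);
  }

  public static void main(String[] args) {
    List<String> oldList = before();
    List<String> newList = after();
    for (String msg : new String[] {"Task timed out", "Retrying"}) {
      oldList.add(msg);
      newList.add(msg);
    }
    System.out.println(render(newList)); // Task timed out; Retrying
    System.out.println(render(oldList).equals(render(newList))); // true
  }
}
```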

  was:
Using jxray (www.jxray.com), we analyzed a heap dump of a JHS running with a big
heap in a large cluster and handling large MapReduce jobs. The heap is large
(over 32GB), and 21.4% of it is wasted due to various suboptimal Java
collections, mostly maps and lists that are either empty or contain only one
element. In such under-populated collections, a considerable amount of memory is
still used by the internal implementation objects alone. See the attached
excerpt from the jxray report for the details. If certain collections are
almost always empty, they should be initialized lazily. If others almost always
have just 1 or 2 elements, they should be initialized with an appropriate
initial capacity, which is much smaller than, e.g., the default of 16 for
HashMap or 10 for ArrayList.

Based on the attached report, we should do the following:
 # {{FileSystemCounterGroup.map}} - initialize lazily
 # {{CompletedTask.attempts}} - initialize with capacity 2, given that most tasks
have only one or two attempts
 # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
 # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
contains one diagnostic message most of the time
 # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use 
the more wasteful LinkedList here) and initialize with capacity 1.


> Optimize collections used by Yarn JHS to reduce its memory
> --
>
> Key: YARN-8872
> URL: https://issues.apache.org/jira/browse/YARN-8872
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: jhs-bad-collections.png



[jira] [Updated] (YARN-8872) Optimize collections used by Yarn JHS to reduce its memory

2018-10-11 Thread Misha Dmitriev (JIRA)


 [ https://issues.apache.org/jira/browse/YARN-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Misha Dmitriev updated YARN-8872:
-
Description: 
Using jxray (www.jxray.com), we analyzed a heap dump of a JHS running with a big
heap in a large cluster and handling large MapReduce jobs. The heap is large
(over 32GB), and 21.4% of it is wasted due to various suboptimal Java
collections, mostly maps and lists that are either empty or contain only one
element. In such under-populated collections, a considerable amount of memory is
still used by the internal implementation objects alone. See the attached
excerpt from the jxray report for the details. If certain collections are
almost always empty, they should be initialized lazily. If others almost always
have just 1 or 2 elements, they should be initialized with an appropriate
initial capacity, which is much smaller than, e.g., the default of 16 for
HashMap or 10 for ArrayList.

Based on the attached report, we should do the following:
 # {{FileSystemCounterGroup.map}} - initialize lazily
 # {{CompletedTask.attempts}} - initialize with capacity 2, given that most tasks
have only one or two attempts
 # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
 # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
contains one diagnostic message most of the time
 # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use 
the more wasteful LinkedList here) and initialize with capacity 1.

  was:
Using jxray (www.jxray.com), we analyzed a heap dump of a JHS running with a big
heap in a large cluster and handling large MapReduce jobs. The heap is large
(over 32GB), and 21.4% of it is wasted due to various suboptimal Java
collections, mostly maps and lists that are either empty or contain only one
element. In such under-populated collections, a considerable amount of memory is
still used by the internal implementation objects alone. See the attached
excerpt from the jxray report for the details. If certain collections are
almost always empty, they should be initialized lazily. If others almost always
have just 1 or 2 elements, they should be initialized with an appropriate
initial capacity, which is much smaller than, e.g., the default of 16 for
HashMap or 10 for ArrayList.

Based on the attached report, we should do the following:
 # {{FileSystemCounterGroup.map}} - initialize lazily
 # {{CompletedTask.attempts}} - initialize with capacity 2, given that most tasks 
have only one or two attempts
 # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity 2

 # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
contains one diagnostic message most of the time.
 # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use 
the more wasteful LinkedList here) and initialize with capacity 1.


> Optimize collections used by Yarn JHS to reduce its memory
> --
>
> Key: YARN-8872
> URL: https://issues.apache.org/jira/browse/YARN-8872
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: jhs-bad-collections.png


