[ https://issues.apache.org/jira/browse/YARN-8872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648233#comment-16648233 ]
Peter Bacsko commented on YARN-8872:
------------------------------------

Thanks for the patch [~mi...@cloudera.com]! To me it looks good.

One thing though: I believe this should be a MAPREDUCE-nnnn JIRA, because JHS is an MR component. However, I don't have the permissions to change the type of the ticket; maybe you do? Or perhaps [~haibochen] can do it.

> Optimize collections used by Yarn JHS to reduce its memory
> ----------------------------------------------------------
>
>                 Key: YARN-8872
>                 URL: https://issues.apache.org/jira/browse/YARN-8872
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: yarn
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>            Priority: Major
>         Attachments: YARN-8872.01.patch, jhs-bad-collections.png
>
>
> Using jxray (www.jxray.com), we analyzed a heap dump of a JHS running with a
> big heap in a large cluster, handling large MapReduce jobs. The heap is large
> (over 32GB) and 21.4% of it is wasted due to various suboptimal Java
> collections, mostly maps and lists that are either empty or contain only one
> element. In such under-populated collections, a considerable amount of memory
> is still used by just the internal implementation objects. See the attached
> excerpt from the jxray report for the details. If certain collections are
> almost always empty, they should be initialized lazily. If others almost
> always have just 1 or 2 elements, they should be initialized with the
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks
> only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it
> contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to
> use the more wasteful LinkedList here) and initialize with capacity 1.
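
For readers unfamiliar with the two techniques named in the issue description, the following is a minimal sketch of lazy initialization and of small explicit initial capacities. It is not taken from the attached patch: the class names LazyCounterGroup and TaskSketch, their methods, and the capacity of 2 used for the lazily created map are all hypothetical stand-ins, and the real Hadoop classes may look quite different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not the Hadoop classes.
final class SparseCollectionsSketch {

    /** Lazy initialization: the map is not allocated until the first write. */
    static final class LazyCounterGroup {
        private Map<String, Long> map; // stays null while the group is empty

        // Not thread-safe; a real implementation may need synchronization.
        void increment(String name, long delta) {
            if (map == null) {
                map = new HashMap<>(2); // assumed small capacity, allocated on demand
            }
            map.merge(name, delta, Long::sum);
        }

        long get(String name) {
            // Readers must tolerate the not-yet-allocated state.
            return map == null ? 0L : map.getOrDefault(name, 0L);
        }

        boolean isAllocated() {
            return map != null;
        }
    }

    /** Small explicit capacities for collections that usually hold 1 or 2 items. */
    static final class TaskSketch {
        // Requested capacity 2 instead of HashMap's default of 16.
        private final Map<String, String> attempts = new HashMap<>(2);
        // ArrayList with capacity 1 instead of a LinkedList or the default of 10.
        private final List<String> diagnostics = new ArrayList<>(1);

        void addAttempt(String attemptId, String state) {
            attempts.put(attemptId, state);
        }

        void addDiagnostic(String message) {
            diagnostics.add(message);
        }

        int attemptCount() {
            return attempts.size();
        }

        List<String> diagnostics() {
            return diagnostics;
        }
    }

    public static void main(String[] args) {
        LazyCounterGroup group = new LazyCounterGroup();
        System.out.println("allocated before write: " + group.isAllocated());
        group.increment("BYTES_READ", 5L);
        System.out.println("allocated after write: " + group.isAllocated());

        TaskSketch task = new TaskSketch();
        task.addAttempt("attempt_0", "SUCCEEDED");
        task.addDiagnostic("example diagnostic");
        System.out.println("attempts: " + task.attemptCount());
    }
}
```

Both collections still grow normally if more elements arrive; the small capacity only avoids paying for unused slots in the common case. The cost of the lazy variant is that every reader has to handle the null state.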