[ https://issues.apache.org/jira/browse/YARN-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Roberts updated YARN-4964:
---------------------------------
    Attachment: YARN-4964.001.patch

> Allow ShuffleHandler readahead without drop-behind
> --------------------------------------------------
>
>                 Key: YARN-4964
>                 URL: https://issues.apache.org/jira/browse/YARN-4964
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-4964.001.patch
>
>
> Currently mapreduce.shuffle.manage.os.cache enables/disables both the
> readahead (POSIX_FADV_WILLNEED) and the drop-behind (POSIX_FADV_DONTNEED)
> logic within the ShuffleHandler.
> It would be beneficial if these were separately configurable.
> - Running without readahead can lead to significant seek storms caused by
> large numbers of sendfile() calls competing with one another.
> - However, running with drop-behind can also lead to seek storms, because
> there are cases where the server successfully writes the shuffle bytes to
> the network BUT the client doesn't want the bytes right now (for example,
> when MergeManager wants to WAIT), so it ignores them and asks for them again
> a bit later. This causes repeated reads of the same data from disk.
> I'll attach a simple patch that disables readahead when
> mapreduce.shuffle.readahead.bytes==0 and enables it otherwise, leaving
> mapreduce.shuffle.manage.os.cache controlling only the drop-behind.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
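
The proposed split between the two settings can be sketched roughly as
follows. This is an illustration only and is not taken from
YARN-4964.001.patch: the class and method names are hypothetical,
java.util.Properties stands in for Hadoop's configuration object, the default
values are assumptions, and treating a negative length as "disabled" is also
an assumption (the issue only mentions ==0).

```java
import java.util.Properties;

// Hypothetical sketch of the proposed behavior; not the actual patch.
final class ShuffleCachePolicySketch {
    // Property names as they appear in the issue description.
    static final String MANAGE_OS_CACHE = "mapreduce.shuffle.manage.os.cache";
    static final String READAHEAD_BYTES = "mapreduce.shuffle.readahead.bytes";

    // Defaults assumed for illustration only.
    static final boolean DEFAULT_MANAGE_OS_CACHE = true;
    static final long DEFAULT_READAHEAD_BYTES = 4L * 1024 * 1024;

    /** Configured readahead length in bytes, or the assumed default. */
    static long readaheadBytes(Properties conf) {
        String v = conf.getProperty(READAHEAD_BYTES);
        return v == null ? DEFAULT_READAHEAD_BYTES : Long.parseLong(v.trim());
    }

    /** Readahead (POSIX_FADV_WILLNEED) depends only on the length setting. */
    static boolean readaheadEnabled(Properties conf) {
        // 0 disables readahead; negative values are treated the same (assumption).
        return readaheadBytes(conf) > 0;
    }

    /** Drop-behind (POSIX_FADV_DONTNEED) depends only on manage.os.cache. */
    static boolean dropBehindEnabled(Properties conf) {
        String v = conf.getProperty(MANAGE_OS_CACHE);
        return v == null ? DEFAULT_MANAGE_OS_CACHE : Boolean.parseBoolean(v.trim());
    }

    public static void main(String[] args) {
        // Readahead on, drop-behind off: the combination the issue asks for.
        Properties conf = new Properties();
        conf.setProperty(MANAGE_OS_CACHE, "false");
        conf.setProperty(READAHEAD_BYTES, "4194304");
        System.out.println("readahead=" + readaheadEnabled(conf)
                + " dropBehind=" + dropBehindEnabled(conf));
    }
}
```

With the settings in main, the sketch reports readahead as enabled and
drop-behind as disabled, which is the combination that cannot be expressed
while a single switch controls both.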