[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117073#comment-14117073 ] Zhijie Shen commented on YARN-611:

[~xgong], thanks for working on this issue. I have a couple of comments on the latest solution.

1. *API Change*: I'm not sure it is really necessary to have completely standalone proto messages for ApplicationRetryPolicy's implementations; it sounds like overkill to me. In fact, MaxApplicationRetriesPolicy seems to be a special case of WindowedApplicationRetriesPolicy where the window size is infinitely large, such that the number of failures is never reset. Therefore, why not simply add one more field (i.e., resetTimeWindow) in ApplicationSubmissionContext? When resetTimeWindow = 0 or -1, the window size is unbounded and the failure count is never reset. On the other hand, when resetTimeWindow is set to a positive value, failures that happen outside the window are no longer taken into account. Moreover, a minor issue here is that ApplicationRetryPolicy is not a real abstraction: it carries the flags of both implementations' contexts.

2. *Failure Window*: If I understood correctly, WindowedApplicationRetriesPolicy uses a jumping window instead of a *moving* window, which may be problematic. Here's an example. Say the window size is 2H and maxAttempts is 100. From 0:00 to 1:00, 1 failure happened. From 1:00 to 2:00, 98 failures happened. At 2:00 the reset logic is triggered, so all 99 failures are no longer taken into account. From 2:00 to 3:00, 2 more failures happened. The total failure count at this point is 2, because the previous 99 failures have been reset. However, looking back over the 2H window from 3:00, 101 failures have actually happened; the job should have run out of retry quota by then. IMHO, the reasonable way is to use a suitable data structure (e.g., a fixed-size FIFO queue) to always track the number of failures that happened within the configured time window, and update the data structure whenever a failure happens.

3. *Multi-threading*: I'm not sure it is going to work, on a big cluster with hundreds or even thousands of concurrent applications, to have an individual thread per application to reset the failure count. Though WindowedApplicationRetriesPolicy is designed primarily for long-running services, we have not restricted normal applications from using it, and it would not be reasonable to make that restriction. Therefore, an RM is likely to end up with that many threads if all apps choose this policy, while AFAIK the number of threads in a process is limited. More importantly, the reset logic is not computation intensive, so dedicating a thread to each app wastes thread resources. Maybe we can use a thread pool, or even a single thread (e.g., a service of the RM) to take care of all the apps' reset windows. Moreover, IMHO, if the aforementioned data structure is defined properly, we may not need a separate thread for the reset work at all, as the failure count over the configured window is updated every time a failure happens.

4. *Affecting RMStateStore*: I'm not sure why it is necessary to persist the end time into RMStateStore; it does not seem to be really used for resetting the window.
One thing I can imagine about RM restarting is how to store the failure count over the configured window, if we want to make sure that after an RM restart the RM is still able to trace back over the whole past time window for the failure count. But I think we can do that separately.
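As a concrete illustration of the fixed-size FIFO idea in point 2 (and of why no per-app reset thread is needed, per point 3), here is a minimal sketch that keeps only the failure timestamps inside a trailing window and prunes expired ones lazily. The class and method names are made up for illustration; this is not the actual RM code.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sliding-window failure tracker, not RM code: expired
// timestamps are pruned lazily whenever a new failure is recorded,
// so no dedicated reset thread is required.
public class FailureWindowSketch {
  private final long windowMs;
  private final int maxFailures;
  private final Deque<Long> failureTimes = new ArrayDeque<>();

  public FailureWindowSketch(long windowMs, int maxFailures) {
    this.windowMs = windowMs;
    this.maxFailures = maxFailures;
  }

  /** Records a failure at time {@code now}; returns true if the app has
   *  exceeded maxFailures within the trailing window. */
  public boolean recordFailure(long now) {
    // Drop failures that have fallen out of the moving window.
    while (!failureTimes.isEmpty() && now - failureTimes.peekFirst() > windowMs) {
      failureTimes.pollFirst();
    }
    failureTimes.addLast(now);
    return failureTimes.size() > maxFailures;
  }
}
{code}

In the 2H/100 example above, the 101st failure at 3:00 would correctly exceed the quota with this approach, because the 99 earlier timestamps are still inside the trailing window.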
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117080#comment-14117080 ] Hadoop QA commented on YARN-611:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663467/YARN-611.5.patch against trunk revision 258c7d0.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4795//console

This message is automatically generated.

Add an AM retry count reset window to YARN RM
Key: YARN-611
URL: https://issues.apache.org/jira/browse/YARN-611
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Chris Riccomini
Assignee: Xuan Gong
Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch

YARN currently has the following config: yarn.resourcemanager.am.max-retries. This config defaults to 2 and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node it was running on dies (the NM will time out, which counts as a failure for the AM), or if the AM itself dies.

This configuration is insufficient for long-running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will push the AM failure count above the configured value of yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed and shut it down. This behavior is not ideal.

I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms. This configuration would define a window of time within which an AM is considered well behaved and it is safe to reset its failure count back to zero. Every time an AM fails, the RMAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is > max-retries, then the job should fail. If the AM has never failed, the retry count is < max-retries, or the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing misbehaving AMs after a short period of time.

I think the work to be done here is to change RMAppImpl to actually look at app.attempts and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, the job should fail; if not, the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that RMAppImpl can check the time of the failure. Thoughts?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
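To make the proposed check concrete, here is a minimal sketch of the reset logic described above. The class and method names are hypothetical (this is not RMAppImpl code); only the two configuration keys come from the proposal.

{code:java}
// Hypothetical sketch of the proposed retry-window check, not the actual
// RMAppImpl implementation. maxRetries corresponds to
// yarn.resourcemanager.am.max-retries and windowMs to the proposed
// yarn.resourcemanager.am.retry-count-window-ms.
public class AmRetryPolicySketch {
  private final int maxRetries;
  private final long windowMs;
  private int failureCount = 0;
  private long lastFailureTime = -1;  // -1 means the AM has never failed

  public AmRetryPolicySketch(int maxRetries, long windowMs) {
    this.maxRetries = maxRetries;
    this.windowMs = windowMs;
  }

  /** Called when an AM attempt fails; returns true if the whole app should fail. */
  public boolean onAmFailure(long now) {
    if (lastFailureTime >= 0 && now - lastFailureTime > windowMs) {
      // Last failure was outside the window: the AM was well behaved,
      // so reset the count before recording the new failure.
      failureCount = 0;
    }
    failureCount++;
    lastFailureTime = now;
    return failureCount > maxRetries;  // otherwise restart the attempt
  }
}
{code}

Note that this is the per-last-failure reset described in the proposal; the moving-window variant discussed in the comments above tracks individual failure timestamps instead.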
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117143#comment-14117143 ] Remus Rusanu commented on YARN-2198:

1. nativeio.c: Should we return null here? RR: Fixed
2. Nit: the nativeio code uses a different naming convention for local variables. Please try to be consistent with the rest of the file. RR: Fixed
3. nativeio.c: Nit: I would move the throw_ioe check before done:, the code flow will be less error prone. RR: Fixed
4. winutils_process_stub.c: Can {{env->NewGlobalRef()}} return null/throw? Should we handle this? RR: Fixed
5. winutils_process_stub.c: You should properly handle the GetExitCodeProcess() failure case. RR: Fixed
6. winutils_process_stub.c: Init to INVALID_HANDLE_VALUE? RR: Fixed
7. client.c: Are RPC_STATUS error codes compatible with winerror codes (semantics around checking for errors)? RR: From my experiments they are compatible. FormatMessage gets the right message for RPC statuses.
8. config.cpp: Wondering if there is a way to get to config files without adding a dependency on env variables? RR: The config location is now ../etc/hadoop/wsce-site.xml relative to the exe. It is defined in pom.xml.
9. config.cpp: This error check is unintuitive. Can you please be more explicit? RR: Fixed (no longer applies because only one file is checked).
10. config.cpp: Are the SAL annotations correct? For strings one would usually use __out_ecount(). RR: Fixed, and it was broken all over, thanks for catching it.
11. config.cpp: SAL annotation __out_bcount? Also outLen -> len in the annotation. RR: Fixed
11. config.cpp: This should be before StringCbPrintf to guarantee that CoInit and CoUninit are balanced. RR: Fixed
12. hdpwinutilsvc.idl: The name does not seem appropriate for Apache... possibly name it just winutilsvc.idl. Should we use spaces in this file for consistency? RR: Fixed, all names are now hadoopwinutilsvc.
13. winutils.h: __in_bcount(len) -> __in_ecount(len). RR: Fixed
14. libwinutils.c: I'm wondering if this is a good opportunity to introduce unit tests for our C code, as the complexity has started increasing beyond just Windows OS calls, where there is little value in unit testing. RR: Not fixed. I will come back later and add units here, but the core work (LRPC, SCM, logon user and create process) is basically untestable from a C unit test.
15. libwinutils.c: Should we deallocate this when BuildSecurityDescriptor fails? RR: It is alloca, so it doesn't need dealloc.
I don't think it is required to do this now, just wanted to bring it up: if our native codebase continues to grow at this pace we should consider introducing smart pointers. It is becoming impossibly hard to properly manage the memory in all success/failure cases. This becomes more important now that we have a long-running NM native client and the winutils service. RR: The whole winutils/libwinutils code style is early-90s Petzold Windows code style. I'm not a fan of it, but I kept all new code consistent with this style. Moving to C++ RAII would be better, but I don't want to do it piecemeal. Some other time.
16. What is the behaviour of calling winutils service? Will this command install and start a winutils.exe service under the SYSTEM account, and exit? RR: No. SCM installation/config is left to SCM tools (e.g. sc.exe). winutils service is the command line to start the service (it starts, registers its entry point with the SCM, and waits for SCM commands).
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
Key: YARN-2198
URL: https://issues.apache.org/jira/browse/YARN-2198
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Labels: security, windows
Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.separation.patch

YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into the entire NM running as a privileged account, a very large surface area to review and protect.

This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low-privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc. My proposal though would be to use Windows LPC
[jira] [Created] (YARN-2485) Fix WSCE folder/file/classpathJar permission/order when running as non-admin
Remus Rusanu created YARN-2485:
Summary: Fix WSCE folder/file/classpathJar permission/order when running as non-admin
Key: YARN-2485
URL: https://issues.apache.org/jira/browse/YARN-2485
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu

The WSCE creates the local usercache, filecache and appcache dirs in the normal DefaultContainerExecutor way, and then assigns ownership to the task user. The WSCE-configured group is added, but the permission masks used (710) do not give the NM itself write permissions on the appcache/filecache/usercache folders. The creation of these folders, as well as the creation of the temporary classpath jar files, must succeed even after the file/dir ownership is relinquished to the task user and the NM is no longer running as a local Administrator. LCE handles all these dirs inside the container-executor app (as root), and the classpathJar issue does not exist on Linux. The dirs can be handled by simply delaying the transfer (create all dirs and temp files, then assign ownership in bulk), but the task classpathJar is 'special' and needs some refactoring of the NM launch sequence.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
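To make the permission problem concrete, here is a small illustrative snippet (not WSCE code) showing why a 710 mask leaves the NM without write access once directory ownership has been handed to the task user: the NM only acts through the group bits, which carry execute but not write.

{code:java}
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

// Illustration only: with 710, the owner (task user) gets rwx, the
// WSCE-configured group (which the NM belongs to) gets execute only, and
// others get nothing -- so the NM can traverse the directory but cannot
// create files (e.g. the temporary classpath jar) inside it.
public class WscePermissionSketch {
  public static void main(String[] args) {
    FsPermission mask710 =
        new FsPermission(FsAction.ALL, FsAction.EXECUTE, FsAction.NONE);
    System.out.println(mask710);                                          // rwx--x---
    System.out.println(mask710.getGroupAction().implies(FsAction.WRITE)); // false
  }
}
{code}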
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117327#comment-14117327 ] Tsuyoshi OZAWA commented on YARN-1879:

The latest patch is ready for review. I also think we could move the RetryCache support to a separate JIRA to meet the deadline for the 2.6 release. What do you think? Please let me know if I should do so.

Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
Key: YARN-1879
URL: https://issues.apache.org/jira/browse/YARN-1879
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
Attachments: YARN-1879.1.patch, YARN-1879.1.patch, YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
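For context, a minimal sketch of what marking protocol methods with these annotations looks like. The interface and methods below are made up for illustration; only the org.apache.hadoop.io.retry annotation classes are the real ones, and this is not the actual ApplicationMasterProtocol patch.

{code:java}
import java.io.IOException;
import org.apache.hadoop.io.retry.AtMostOnce;
import org.apache.hadoop.io.retry.Idempotent;

// Illustrative protocol, not ApplicationMasterProtocol itself.
public interface ExampleProtocol {

  // Idempotent: the retry policy may safely re-invoke this on failover,
  // since repeating the call produces the same observable result.
  @Idempotent
  String getApplicationStatus(String applicationId) throws IOException;

  // AtMostOnce: the call must not be re-executed on retry; the server side
  // needs a RetryCache (or equivalent) to replay the original response.
  @AtMostOnce
  void submitWork(String applicationId, byte[] payload) throws IOException;
}
{code}

This is also why the RetryCache work mentioned above pairs naturally with the AtMostOnce annotations, even if it lands in a separate JIRA.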
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117458#comment-14117458 ] Junping Du commented on YARN-2033:

+1. Latest patch LGTM. Will commit it tomorrow if there are no new comments from others.

Investigate merging generic-history into the Timeline Store
Key: YARN-2033
URL: https://issues.apache.org/jira/browse/YARN-2033
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch

Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client-side interfaces as close to what we have today.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117506#comment-14117506 ] Varun Vasudev commented on YARN-2448:

[~sandyr], [~kasha] thanks for your extremely helpful input. I think what [~sandyr] is suggesting should be ok. Is it ok to generalize it to return a representation of the resource types that the scheduler considers as part of its functioning? That way, if we add support for more resource types in the future, we don't have to change much.

RM should expose the name of the ResourceCalculator being used when AMs register
Key: YARN-2448
URL: https://issues.apache.org/jira/browse/YARN-2448
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch

The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better scheduling decisions. MapReduce, for example, only looks at memory when making its scheduling decisions, even though the RM could potentially be using the DominantResourceCalculator.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2486) FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps
Swapnil Daingade created YARN-2486:
Summary: FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps
Key: YARN-2486
URL: https://issues.apache.org/jira/browse/YARN-2486
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Swapnil Daingade
Priority: Minor

The org.apache.hadoop.fs.FileSystem.Statistics.StatisticsData class defines readOps, largeReadOps and writeOps as int. The org.apache.hadoop.fs.FileSystem.Statistics class also has methods like getReadOps(), getLargeReadOps() and getWriteOps() that return int. These int values can overflow if they exceed 2^31-1, showing negative values. It would be nice if these could be changed to long.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2486) FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps
[ https://issues.apache.org/jira/browse/YARN-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117741#comment-14117741 ] Gary Steelman commented on YARN-2486:

I'd really like to see these as long types instead of int, thanks for reporting! Are there other places where counters are int types that we should change to long types?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2486) FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps
[ https://issues.apache.org/jira/browse/YARN-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117800#comment-14117800 ] Sandy Ryza commented on YARN-2486:

Unfortunately these methods were made public in 2.5, so we can't change their signatures. We can, however, add versions with new names that return longs.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
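A hedged sketch of the approach Sandy suggests: keep the existing public int getter for compatibility and add a new, differently named long-returning variant. The class and method names other than getReadOps() are illustrative, not the actual Hadoop API.

{code:java}
// Sketch only, not the real FileSystem.Statistics code: the counter is kept
// wide internally, the legacy int getter remains for compatibility, and a new
// long getter exposes the full value.
public class StatisticsDataSketch {
  private long readOps;  // wide enough not to wrap at 2^31 - 1

  public void incrementReadOps(int count) {
    readOps += count;
  }

  // Existing-style accessor: the narrowing cast can go negative once the
  // counter passes Integer.MAX_VALUE.
  public int getReadOps() {
    return (int) readOps;
  }

  // New accessor with a new name, returning the full long value.
  public long getReadOpsLong() {
    return readOps;
  }
}
{code}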
[jira] [Created] (YARN-2487) Need to support timeout of AM When no containers are assigned to it for a defined period
Naganarasimha G R created YARN-2487:
Summary: Need to support timeout of AM When no containers are assigned to it for a defined period
Key: YARN-2487
URL: https://issues.apache.org/jira/browse/YARN-2487
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R

There are some scenarios where an AM will not get containers and waits indefinitely. We faced one such scenario which makes applications hang: consider a cluster with 2 NMs of 8GB each, and 2 applications launched in the default queue, where each AM takes 2GB. Each AM is placed on a different NM. Now each AM requests a container of 7GB memory. As only 6GB is available on each NM, both applications hang forever. To avoid such scenarios I would like to propose a generic timeout feature for all AMs on the YARN side, such that if no containers are assigned to an application for a defined period, YARN can time out the application attempt. The default can be set to 0, in which case the RM will not time out the app attempt, and the user can set their own timeout when submitting the application.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2487) Need to support timeout of AM When no containers are assigned to it for a defined period
[ https://issues.apache.org/jira/browse/YARN-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2487:

Description: There are some scenarios where an AM will not get containers and waits indefinitely. We faced one such scenario which makes applications hang: consider a cluster with 2 NMs of 8GB each, and 2 applications (MR2) launched in the default queue, where each AM takes 2GB. Each AM is placed on a different NM. Now each AM requests a container of 7GB memory. As only 6GB is available on each NM, both applications hang forever. To avoid such scenarios I would like to propose a generic timeout feature for all AMs in YARN, such that if no containers are assigned to an application for a defined period, YARN can time out the application attempt. The default can be set to 0, in which case the RM will not time out the app attempt, and the user can set their own timeout when submitting the application.

was: There are some scenarios where an AM will not get containers and waits indefinitely. We faced one such scenario which makes applications hang: consider a cluster with 2 NMs of 8GB each, and 2 applications launched in the default queue, where each AM takes 2GB. Each AM is placed on a different NM. Now each AM requests a container of 7GB memory. As only 6GB is available on each NM, both applications hang forever. To avoid such scenarios I would like to propose a generic timeout feature for all AMs on the YARN side, such that if no containers are assigned to an application for a defined period, YARN can time out the application attempt. The default can be set to 0, in which case the RM will not time out the app attempt, and the user can set their own timeout when submitting the application.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)