[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117073#comment-14117073 ] Zhijie Shen commented on YARN-611:

[~xgong], thanks for working on this issue. I have a couple of comments on the latest solution.

1. *API Change*: I'm not sure it is really necessary to have completely standalone proto messages for ApplicationRetryPolicy's implementations; it sounds like overkill to me. In fact, MaxApplicationRetriesPolicy seems to be a special case of WindowedApplicationRetriesPolicy where the window size is infinitely large, such that the number of failures is never reset. Therefore, why not simply add one more field (i.e., resetTimeWindow) in ApplicationSubmissionContext? When resetTimeWindow = 0 or -1, the window size is unbounded and the failure count is never reset. On the other hand, when resetTimeWindow is set to a positive value, failures that happen outside the window are no longer taken into account. Moreover, a minor issue here is that ApplicationRetryPolicy is not a real abstraction: it carries the flags of both implementations' contexts.

2. *Failure Window*: If I understood correctly, WindowedApplicationRetriesPolicy uses a jumping window instead of a *moving* window, which may be problematic. Here's an example. Say the window size is 2H and maxAttempts is 100. From 0:00 to 1:00, 1 failure happened. From 1:00 to 2:00, 98 failures happened. At 2:00 the reset logic is triggered, so all 99 failures are no longer taken into account. From 2:00 to 3:00, 2 more failures happened. The total failure count at this point is 2, because the previous 99 failures have been reset. However, looking back over the 2H window from 3:00, 101 failures have actually happened; the job should have run out of retry quota by then. IMHO, the reasonable way is to use a suitable data structure (e.g., a fixed-size FIFO queue) to always track the number of failures that happened within the configured time window, and update the data structure whenever a failure happens.

3. *Multi-threading*: I'm not sure it is going to work, on a big cluster with hundreds or even thousands of concurrent applications, to have an individual thread per application to reset the failure count. Though WindowedApplicationRetriesPolicy is designed primarily for long-running services, we have not restricted normal applications from using it, and it would not be reasonable to make that restriction. Therefore, an RM is likely to end up with that many threads if all apps choose this policy, while AFAIK the number of threads in a process is limited. More importantly, the reset logic is not computation intensive, so dedicating a thread to each app wastes thread resources. Maybe we can use a thread pool, or even a single thread (e.g., a service of the RM) to take care of all the apps' reset windows. Moreover, IMHO, if the aforementioned data structure is defined properly, we may not need a separate thread for the reset work at all, as the failure count over the configured window is updated every time a failure happens.

4. *Affecting RMStateStore*: I'm not sure why it is necessary to persist the end time into RMStateStore; it does not seem to be really used for resetting the window.
One thing I can imagine about RM restarting is how to store the failure count over the configured window, if we want to make sure that after an RM restart the RM is still able to trace back over the whole past time window for the failure count. But I think we can do that separately.
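As a concrete illustration of the fixed-size FIFO idea in point 2 (and of why no per-app reset thread is needed, per point 3), here is a minimal sketch that keeps only the failure timestamps inside a trailing window and prunes expired ones lazily. The class and method names are made up for illustration; this is not the actual RM code.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sliding-window failure tracker, not RM code: expired
// timestamps are pruned lazily whenever a new failure is recorded,
// so no dedicated reset thread is required.
public class FailureWindowSketch {
  private final long windowMs;
  private final int maxFailures;
  private final Deque<Long> failureTimes = new ArrayDeque<>();

  public FailureWindowSketch(long windowMs, int maxFailures) {
    this.windowMs = windowMs;
    this.maxFailures = maxFailures;
  }

  /** Records a failure at time {@code now}; returns true if the app has
   *  exceeded maxFailures within the trailing window. */
  public boolean recordFailure(long now) {
    // Drop failures that have fallen out of the moving window.
    while (!failureTimes.isEmpty() && now - failureTimes.peekFirst() > windowMs) {
      failureTimes.pollFirst();
    }
    failureTimes.addLast(now);
    return failureTimes.size() > maxFailures;
  }
}
{code}

In the 2H/100 example above, the 101st failure at 3:00 would correctly exceed the quota with this approach, because the 99 earlier timestamps are still inside the trailing window.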
[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117080#comment-14117080 ] Hadoop QA commented on YARN-611:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12663467/YARN-611.5.patch against trunk revision 258c7d0.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4795//console

This message is automatically generated.

Add an AM retry count reset window to YARN RM
Key: YARN-611
URL: https://issues.apache.org/jira/browse/YARN-611
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Chris Riccomini
Assignee: Xuan Gong
Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, YARN-611.4.patch, YARN-611.4.rebase.patch, YARN-611.5.patch

YARN currently has the following config: yarn.resourcemanager.am.max-retries. This config defaults to 2 and defines how many times to retry a failed AM before failing the whole YARN job. YARN counts an AM as failed if the node it was running on dies (the NM will time out, which counts as a failure for the AM), or if the AM itself dies.

This configuration is insufficient for long-running (or infinitely running) YARN jobs, since the machine (or NM) that the AM is running on will eventually need to be restarted (or the machine/NM will fail). In such an event, the AM has not done anything wrong, but this is counted as a failure by the RM. Since the retry count for the AM is never reset, eventually, at some point, the number of machine/NM failures will push the AM failure count above the configured value of yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the job as failed and shut it down. This behavior is not ideal.

I propose that we add a second configuration: yarn.resourcemanager.am.retry-count-window-ms. This configuration would define a window of time within which an AM is considered well behaved and it is safe to reset its failure count back to zero. Every time an AM fails, the RMAppImpl would check the last time that the AM failed. If the last failure was less than retry-count-window-ms ago, and the new failure count is > max-retries, then the job should fail. If the AM has never failed, the retry count is < max-retries, or the last failure was OUTSIDE the retry-count-window-ms, then the job should be restarted. Additionally, if the last failure was outside the retry-count-window-ms, then the failure count should be set back to 0. This would give developers a way to have well-behaved AMs run forever, while still failing misbehaving AMs after a short period of time.

I think the work to be done here is to change RMAppImpl to actually look at app.attempts and see if there have been more than max-retries failures in the last retry-count-window-ms milliseconds. If there have, the job should fail; if not, the job should go forward. Additionally, we might also need to add an endTime in either RMAppAttemptImpl or RMAppFailedAttemptEvent, so that RMAppImpl can check the time of the failure. Thoughts?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
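To make the proposed check concrete, here is a minimal sketch of the reset logic described above. The class and method names are hypothetical (this is not RMAppImpl code); only the two configuration keys come from the proposal.

{code:java}
// Hypothetical sketch of the proposed retry-window check, not the actual
// RMAppImpl implementation. maxRetries corresponds to
// yarn.resourcemanager.am.max-retries and windowMs to the proposed
// yarn.resourcemanager.am.retry-count-window-ms.
public class AmRetryPolicySketch {
  private final int maxRetries;
  private final long windowMs;
  private int failureCount = 0;
  private long lastFailureTime = -1;  // -1 means the AM has never failed

  public AmRetryPolicySketch(int maxRetries, long windowMs) {
    this.maxRetries = maxRetries;
    this.windowMs = windowMs;
  }

  /** Called when an AM attempt fails; returns true if the whole app should fail. */
  public boolean onAmFailure(long now) {
    if (lastFailureTime >= 0 && now - lastFailureTime > windowMs) {
      // Last failure was outside the window: the AM was well behaved,
      // so reset the count before recording the new failure.
      failureCount = 0;
    }
    failureCount++;
    lastFailureTime = now;
    return failureCount > maxRetries;  // otherwise restart the attempt
  }
}
{code}

Note that this is the per-last-failure reset described in the proposal; the moving-window variant discussed in the comments above tracks individual failure timestamps instead.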
[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
[ https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117143#comment-14117143 ] Remus Rusanu commented on YARN-2198:

1. nativeio.c: Should we return null here? RR: Fixed
2. Nit: the nativeio code uses a different naming convention for local variables. Please try to be consistent with the rest of the file. RR: Fixed
3. nativeio.c: Nit: I would move the throw_ioe check before done:, the code flow will be less error prone. RR: Fixed
4. winutils_process_stub.c: Can {{env->NewGlobalRef()}} return null/throw? Should we handle this? RR: Fixed
5. winutils_process_stub.c: You should properly handle the GetExitCodeProcess() failure case. RR: Fixed
6. winutils_process_stub.c: Init to INVALID_HANDLE_VALUE? RR: Fixed
7. client.c: Are RPC_STATUS error codes compatible with winerror codes (semantics around checking for errors)? RR: From my experiments they are compatible. FormatMessage gets the right message for RPC statuses.
8. config.cpp: Wondering if there is a way to get to config files without adding a dependency on env variables? RR: The config location is now ../etc/hadoop/wsce-site.xml relative to the exe. It is defined in pom.xml.
9. config.cpp: This error check is unintuitive. Can you please be more explicit? RR: Fixed (no longer applies because only one file is checked).
10. config.cpp: Are the SAL annotations correct? For strings one would usually use __out_ecount(). RR: Fixed, and it was broken all over, thanks for catching it.
11. config.cpp: SAL annotation __out_bcount? Also outLen -> len in the annotation. RR: Fixed
11. config.cpp: This should be before StringCbPrintf to guarantee that CoInit and CoUninit are balanced. RR: Fixed
12. hdpwinutilsvc.idl: The name does not seem appropriate for Apache... possibly name it just winutilsvc.idl. Should we use spaces in this file for consistency? RR: Fixed, all names are now hadoopwinutilsvc.
13. winutils.h: __in_bcount(len) -> __in_ecount(len). RR: Fixed
14. libwinutils.c: I'm wondering if this is a good opportunity to introduce unit tests for our C code, as the complexity has started increasing beyond just Windows OS calls, where there is little value in unit testing. RR: Not fixed. I will come back later and add units here, but the core work (LRPC, SCM, logon user and create process) is basically untestable from a C unit test.
15. libwinutils.c: Should we deallocate this when BuildSecurityDescriptor fails? RR: It is alloca, so it doesn't need dealloc.
I don't think it is required to do this now, just wanted to bring it up: if our native codebase continues to grow at this pace we should consider introducing smart pointers. It is becoming impossibly hard to properly manage the memory in all success/failure cases. This becomes more important now that we have a long-running NM native client and the winutils service. RR: The whole winutils/libwinutils code style is early-90s Petzold Windows code style. I'm not a fan of it, but I kept all new code consistent with this style. Moving to C++ RAII would be better, but I don't want to do it piecemeal. Some other time.
16. What is the behaviour of calling winutils service? Will this command install and start a winutils.exe service under the SYSTEM account, and exit? RR: No. SCM installation/config is left to SCM tools (e.g. sc.exe). winutils service is the command line to start the service (it starts, registers its entry point with the SCM, and waits for SCM commands).
Remove the need to run NodeManager as privileged account for Windows Secure Container Executor
Key: YARN-2198
URL: https://issues.apache.org/jira/browse/YARN-2198
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Labels: security, windows
Attachments: YARN-2198.1.patch, YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.separation.patch

YARN-1972 introduces a Secure Windows Container Executor. However, this executor requires the process launching the container to be LocalSystem or a member of the local Administrators group. Since the process in question is the NodeManager, the requirement translates into the entire NM running as a privileged account, a very large surface area to review and protect.

This proposal is to move the privileged operations into a dedicated NT service. The NM can run as a low-privilege account and communicate with the privileged NT service when it needs to launch a container. This would reduce the surface exposed to the high privileges. There has to exist a secure, authenticated and authorized channel of communication between the NM and the privileged NT service. Possible alternatives are a new TCP endpoint, Java RPC etc. My proposal though would be to use Windows LPC
[jira] [Created] (YARN-2485) Fix WSCE folder/file/classpathJar permission/order when running as non-admin
Remus Rusanu created YARN-2485:
Summary: Fix WSCE folder/file/classpathJar permission/order when running as non-admin
Key: YARN-2485
URL: https://issues.apache.org/jira/browse/YARN-2485
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Remus Rusanu
Assignee: Remus Rusanu

The WSCE creates the local usercache, filecache and appcache dirs in the normal DefaultContainerExecutor way, and then assigns ownership to the task user. The WSCE-configured group is added, but the permission masks used (710) do not give the NM itself write permissions on the appcache/filecache/usercache folders. The creation of these folders, as well as the creation of the temporary classpath jar files, must succeed even after the file/dir ownership is relinquished to the task user and the NM is no longer running as a local Administrator. LCE handles all these dirs inside the container-executor app (as root), and the classpathJar issue does not exist on Linux. The dirs can be handled by simply delaying the transfer (create all dirs and temp files, then assign ownership in bulk), but the task classpathJar is 'special' and needs some refactoring of the NM launch sequence.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
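To make the permission problem concrete, here is a small illustrative snippet (not WSCE code) showing why a 710 mask leaves the NM without write access once directory ownership has been handed to the task user: the NM only acts through the group bits, which carry execute but not write.

{code:java}
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

// Illustration only: with 710, the owner (task user) gets rwx, the
// WSCE-configured group (which the NM belongs to) gets execute only, and
// others get nothing -- so the NM can traverse the directory but cannot
// create files (e.g. the temporary classpath jar) inside it.
public class WscePermissionSketch {
  public static void main(String[] args) {
    FsPermission mask710 =
        new FsPermission(FsAction.ALL, FsAction.EXECUTE, FsAction.NONE);
    System.out.println(mask710);                                          // rwx--x---
    System.out.println(mask710.getGroupAction().implies(FsAction.WRITE)); // false
  }
}
{code}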
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117327#comment-14117327 ] Tsuyoshi OZAWA commented on YARN-1879:

The latest patch is ready for review. I also think we could move the RetryCache support to a separate JIRA to meet the deadline for the 2.6 release. What do you think? Please let me know if I should do so.

Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
Key: YARN-1879
URL: https://issues.apache.org/jira/browse/YARN-1879
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
Attachments: YARN-1879.1.patch, YARN-1879.1.patch, YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
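For context, a minimal sketch of what marking protocol methods with these annotations looks like. The interface and methods below are made up for illustration; only the org.apache.hadoop.io.retry annotation classes are the real ones, and this is not the actual ApplicationMasterProtocol patch.

{code:java}
import java.io.IOException;
import org.apache.hadoop.io.retry.AtMostOnce;
import org.apache.hadoop.io.retry.Idempotent;

// Illustrative protocol, not ApplicationMasterProtocol itself.
public interface ExampleProtocol {

  // Idempotent: the retry policy may safely re-invoke this on failover,
  // since repeating the call produces the same observable result.
  @Idempotent
  String getApplicationStatus(String applicationId) throws IOException;

  // AtMostOnce: the call must not be re-executed on retry; the server side
  // needs a RetryCache (or equivalent) to replay the original response.
  @AtMostOnce
  void submitWork(String applicationId, byte[] payload) throws IOException;
}
{code}

This is also why the RetryCache work mentioned above pairs naturally with the AtMostOnce annotations, even if it lands in a separate JIRA.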
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117458#comment-14117458 ] Junping Du commented on YARN-2033:

+1. Latest patch LGTM. Will commit it tomorrow if there are no new comments from others.

Investigate merging generic-history into the Timeline Store
Key: YARN-2033
URL: https://issues.apache.org/jira/browse/YARN-2033
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.6.patch, YARN-2033.7.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch

Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try and retain most of the client-side interfaces as close to what we have today.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register
[ https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117506#comment-14117506 ] Varun Vasudev commented on YARN-2448:

[~sandyr], [~kasha] thanks for your extremely helpful input. I think what [~sandyr] is suggesting should be ok. Is it ok to generalize it to return a representation of the resource types that the scheduler considers as part of its functioning? That way, if we add support for more resource types in the future, we don't have to change much.

RM should expose the name of the ResourceCalculator being used when AMs register
Key: YARN-2448
URL: https://issues.apache.org/jira/browse/YARN-2448
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch

The RM should expose the name of the ResourceCalculator being used when AMs register, as part of the RegisterApplicationMasterResponse. This will allow applications to make better scheduling decisions. MapReduce, for example, only looks at memory when making its scheduling decisions, even though the RM could potentially be using the DominantResourceCalculator.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2486) FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps
Swapnil Daingade created YARN-2486:
Summary: FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps
Key: YARN-2486
URL: https://issues.apache.org/jira/browse/YARN-2486
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Swapnil Daingade
Priority: Minor

The org.apache.hadoop.fs.FileSystem.Statistics.StatisticsData class defines readOps, largeReadOps and writeOps as int. The org.apache.hadoop.fs.FileSystem.Statistics class also has methods like getReadOps(), getLargeReadOps() and getWriteOps() that return int. These int values can overflow if they exceed 2^31-1, showing negative values. It would be nice if these could be changed to long.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2486) FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps
[ https://issues.apache.org/jira/browse/YARN-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117741#comment-14117741 ] Gary Steelman commented on YARN-2486:

I'd really like to see these as long types instead of int, thanks for reporting! Are there other places where counters are int types that we should change to long types?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2486) FileSystem counters can overflow for large number of readOps, largeReadOps, writeOps
[ https://issues.apache.org/jira/browse/YARN-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117800#comment-14117800 ] Sandy Ryza commented on YARN-2486:

Unfortunately these methods were made public in 2.5, so we can't change their signatures. We can, however, add versions with new names that return longs.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
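A hedged sketch of the approach Sandy suggests: keep the existing public int getter for compatibility and add a new, differently named long-returning variant. The class and method names other than getReadOps() are illustrative, not the actual Hadoop API.

{code:java}
// Sketch only, not the real FileSystem.Statistics code: the counter is kept
// wide internally, the legacy int getter remains for compatibility, and a new
// long getter exposes the full value.
public class StatisticsDataSketch {
  private long readOps;  // wide enough not to wrap at 2^31 - 1

  public void incrementReadOps(int count) {
    readOps += count;
  }

  // Existing-style accessor: the narrowing cast can go negative once the
  // counter passes Integer.MAX_VALUE.
  public int getReadOps() {
    return (int) readOps;
  }

  // New accessor with a new name, returning the full long value.
  public long getReadOpsLong() {
    return readOps;
  }
}
{code}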
[jira] [Created] (YARN-2487) Need to support timeout of AM When no containers are assigned to it for a defined period
Naganarasimha G R created YARN-2487:
Summary: Need to support timeout of AM When no containers are assigned to it for a defined period
Key: YARN-2487
URL: https://issues.apache.org/jira/browse/YARN-2487
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R

There are some scenarios where an AM will not get containers and waits indefinitely. We faced one such scenario which makes applications hang: consider a cluster with 2 NMs of 8GB each, and 2 applications launched in the default queue, where each AM takes 2GB. Each AM is placed on a different NM. Now each AM requests a container of 7GB memory. As only 6GB is available on each NM, both applications hang forever. To avoid such scenarios I would like to propose a generic timeout feature for all AMs on the YARN side, such that if no containers are assigned to an application for a defined period, YARN can time out the application attempt. The default can be set to 0, in which case the RM will not time out the app attempt, and the user can set their own timeout when submitting the application.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2487) Need to support timeout of AM When no containers are assigned to it for a defined period
[ https://issues.apache.org/jira/browse/YARN-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2487:

Description: There are some scenarios where an AM will not get containers and waits indefinitely. We faced one such scenario which makes applications hang: consider a cluster with 2 NMs of 8GB each, and 2 applications (MR2) launched in the default queue, where each AM takes 2GB. Each AM is placed on a different NM. Now each AM requests a container of 7GB memory. As only 6GB is available on each NM, both applications hang forever. To avoid such scenarios I would like to propose a generic timeout feature for all AMs in YARN, such that if no containers are assigned to an application for a defined period, YARN can time out the application attempt. The default can be set to 0, in which case the RM will not time out the app attempt, and the user can set their own timeout when submitting the application.

was: There are some scenarios where an AM will not get containers and waits indefinitely. We faced one such scenario which makes applications hang: consider a cluster with 2 NMs of 8GB each, and 2 applications launched in the default queue, where each AM takes 2GB. Each AM is placed on a different NM. Now each AM requests a container of 7GB memory. As only 6GB is available on each NM, both applications hang forever. To avoid such scenarios I would like to propose a generic timeout feature for all AMs on the YARN side, such that if no containers are assigned to an application for a defined period, YARN can time out the application attempt. The default can be set to 0, in which case the RM will not time out the app attempt, and the user can set their own timeout when submitting the application.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)