[jira] [Created] (YARN-11056) Incorrect capitalization of NVIDIA in the docs
Gera Shegalov created YARN-11056:
------------------------------------

Summary: Incorrect capitalization of NVIDIA in the docs
Key: YARN-11056
URL: https://issues.apache.org/jira/browse/YARN-11056
Project: Hadoop YARN
Issue Type: Bug
Reporter: Gera Shegalov

According to [https://www.nvidia.com/en-us/about-nvidia/legal-info/], the spelling should be the all-caps NVIDIA.

Examples of differing capitalization:
https://github.com/apache/hadoop/blob/03cfc852791c14fad39db4e5b14104a276c08e59/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/UsingGpus.md
[jira] [Created] (YARN-11055) cgroups-operations.c some fprintf format strings lack "\n"
Gera Shegalov created YARN-11055:
------------------------------------

Summary: cgroups-operations.c some fprintf format strings lack "\n"
Key: YARN-11055
URL: https://issues.apache.org/jira/browse/YARN-11055
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 3.3.1, 3.3.0, 3.2.0, 3.1.0, 3.0.0
Reporter: Gera Shegalov

In cgroups-operations.c some {{fprintf}} format strings are missing a newline character at the end, leading to hard-to-parse error message output. Example:
https://github.com/apache/hadoop/blame/b225287913ac366a531eacfa0266adbdf03d883e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.c#L130
[jira] [Created] (YARN-7847) Provide permalinks for container logs
Gera Shegalov created YARN-7847:
-------------------------------

Summary: Provide permalinks for container logs
Key: YARN-7847
URL: https://issues.apache.org/jira/browse/YARN-7847
Project: Hadoop YARN
Issue Type: New Feature
Components: amrmproxy
Reporter: Gera Shegalov

YARN doesn't offer a service similar to the AM proxy URL for container logs, even if log aggregation is enabled. The current mechanism of having the NM redirect to yarn.log.server.url fails once the node is down. Workarounds like rewriting URIs on the fly, as in the MR JobHistory server, are possible, but they are not a good long-term solution for onboarding new apps.
[jira] [Created] (YARN-7747) YARN UI is broken in the minicluster
Gera Shegalov created YARN-7747:
-------------------------------

Summary: YARN UI is broken in the minicluster
Key: YARN-7747
URL: https://issues.apache.org/jira/browse/YARN-7747
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov

YARN web apps use non-injected instances of GuiceFilter, i.e. instances created by Jetty rather than by Guice itself. This triggers the [call path|https://github.com/google/guice/blob/master/extensions/servlet/src/com/google/inject/servlet/GuiceFilter.java#L251] where the static field {{pipeline}} is used instead of the instance field {{injectedPipeline}}. However, besides the GuiceFilter instances created by Jetty, each Guice module creates one as well, and on the injection call path the static field is updated by each such instance. Thus, if there are multiple modules, as happens to be the case in the minicluster, the one loaded last ends up defining the filter pipeline for all Jetty instances. In the minicluster case this is the nodemanager UI.
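A minimal, self-contained sketch of the mechanism described above, assuming hypothetical RmServlet/NmServlet stand-ins for the RM and NM web apps; this is not the actual YARN webapp wiring, only an illustration of how the last-created injector wins when the filter is constructed by the servlet container:
{code:java}
import com.google.inject.Guice;
import com.google.inject.Singleton;
import com.google.inject.servlet.GuiceFilter;
import com.google.inject.servlet.ServletModule;
import javax.servlet.http.HttpServlet;

public class StaticPipelineDemo {
  // Hypothetical stand-ins for the RM and NM web apps.
  @Singleton static class RmServlet extends HttpServlet {}
  @Singleton static class NmServlet extends HttpServlet {}

  public static void main(String[] args) {
    // Each injector installs its own ServletModule; each installation also
    // rewrites GuiceFilter's static pipeline.
    Guice.createInjector(new ServletModule() {
      @Override protected void configureServlets() { serve("/cluster/*").with(RmServlet.class); }
    });
    Guice.createInjector(new ServletModule() {
      @Override protected void configureServlets() { serve("/node/*").with(NmServlet.class); }
    });

    // A servlet container like Jetty constructs the filter itself instead of
    // asking an injector for it, so this instance consults only the static
    // pipeline -- which now belongs to the injector created last (the NM one).
    GuiceFilter createdByJetty = new GuiceFilter();
  }
}
{code}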
[jira] [Created] (YARN-7592) yarn.federation.failover.enabled missing in yarn-default.xml
Gera Shegalov created YARN-7592:
-------------------------------

Summary: yarn.federation.failover.enabled missing in yarn-default.xml
Key: YARN-7592
URL: https://issues.apache.org/jira/browse/YARN-7592
Project: Hadoop YARN
Issue Type: Bug
Components: federation
Affects Versions: 3.0.0-beta1
Reporter: Gera Shegalov

yarn.federation.failover.enabled should be documented in yarn-default.xml. I am also not sure why it should be true by default and force the HA retry policy in {{RMProxy#createRMProxy}}.
[jira] [Created] (YARN-4789) Provide helpful exception for non-PATH-like conflict with admin.user.env
Gera Shegalov created YARN-4789:
-------------------------------

Summary: Provide helpful exception for non-PATH-like conflict with admin.user.env
Key: YARN-4789
URL: https://issues.apache.org/jira/browse/YARN-4789
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.7.2
Reporter: Gera Shegalov
Assignee: Gera Shegalov

Environment variables specified in mapreduce.admin.user.env are supposed to be paths (class, shell, library), and they can be merged with the user-provided values. However, it's also possible that cluster admins specify some non-PATH-like variable such as JAVA_HOME. In this case, if the user provides the same variable, we'll get a concatenation that does not make sense and is difficult to debug.
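A minimal sketch of the confusing result, assuming the merge simply joins the admin and user values with the path separator; the variable values are illustrative only:
{code:java}
import java.io.File;

public class AdminUserEnvMergeDemo {
  public static void main(String[] args) {
    // PATH-like variables merge sensibly:
    String adminPath = "/opt/hadoop/lib/native";
    String userPath = "/home/alice/lib";
    System.out.println("LD_LIBRARY_PATH=" + adminPath + File.pathSeparator + userPath);
    // -> LD_LIBRARY_PATH=/opt/hadoop/lib/native:/home/alice/lib

    // A non-PATH-like variable merged the same way is nonsense:
    String adminJavaHome = "/usr/lib/jvm/java-8";
    String userJavaHome = "/usr/lib/jvm/java-11";
    System.out.println("JAVA_HOME=" + adminJavaHome + File.pathSeparator + userJavaHome);
    // -> JAVA_HOME=/usr/lib/jvm/java-8:/usr/lib/jvm/java-11 -- not a valid JAVA_HOME
  }
}
{code}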
[jira] [Resolved] (YARN-683) Class MiniYARNCluster not found when starting the minicluster
[ https://issues.apache.org/jira/browse/YARN-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gera Shegalov resolved YARN-683.
--------------------------------
Resolution: Duplicate

Closing as a dup because HADOOP-9891 now documents this workaround.

> Class MiniYARNCluster not found when starting the minicluster
> --------------------------------------------------------------
> Key: YARN-683
> URL: https://issues.apache.org/jira/browse/YARN-683
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.0.4-alpha
> Environment: MacOSX 10.8.3 - Java 1.6.0_45
> Reporter: Rémy SAISSY
>
> Starting the minicluster with the following command line:
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.4-alpha-tests.jar minicluster -format
> fails for MiniYARNCluster with the following error:
> 13/05/14 17:06:58 INFO hdfs.MiniDFSCluster: Cluster is active
> 13/05/14 17:06:58 INFO mapreduce.MiniHadoopClusterManager: Started MiniDFSCluster -- namenode on port 55205
> java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/server/MiniYARNCluster
> 	at org.apache.hadoop.mapreduce.MiniHadoopClusterManager.start(MiniHadoopClusterManager.java:170)
> 	at org.apache.hadoop.mapreduce.MiniHadoopClusterManager.run(MiniHadoopClusterManager.java:129)
> 	at org.apache.hadoop.mapreduce.MiniHadoopClusterManager.main(MiniHadoopClusterManager.java:314)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
> 	at org.apache.hadoop.test.MapredTestDriver.run(MapredTestDriver.java:115)
> 	at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:123)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.server.MiniYARNCluster
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> 	... 16 more
[jira] [Created] (YARN-3568) TestAMRMTokens should use some random port
Gera Shegalov created YARN-3568:
-------------------------------

Summary: TestAMRMTokens should use some random port
Key: YARN-3568
URL: https://issues.apache.org/jira/browse/YARN-3568
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Gera Shegalov

Since the default port is used for yarn.resourcemanager.scheduler.address, if we already run a pseudo-distributed cluster on the same development machine, the test fails like this:
{code}
testMasterKeyRollOver[0](org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens) Time elapsed: 1.511 sec ERROR!
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:8030] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:444)
	at sun.nio.ch.Net.bind(Net.java:436)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at org.apache.hadoop.ipc.Server.bind(Server.java:413)
	at org.apache.hadoop.ipc.Server$Listener.init(Server.java:590)
	at org.apache.hadoop.ipc.Server.init(Server.java:2340)
	at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:945)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.init(ProtobufRpcEngine.java:534)
	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:509)
	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:787)
	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
	at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
	at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.serviceStart(ApplicationMasterService.java:140)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:586)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:996)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1037)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1033)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1033)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1073)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens.testMasterKeyRollOver(TestAMRMTokens.java:235)
{code}
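A minimal sketch of the kind of fix suggested by the summary, pointing the scheduler address at an ephemeral port in the test configuration; the approach is illustrative, not the actual patch:
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RandomSchedulerPortDemo {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Port 0 lets the OS pick a free ephemeral port, so the test no longer
    // collides with a pseudo-distributed cluster listening on the default 8030.
    conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS, "0.0.0.0:0");
    System.out.println("scheduler address for the test: "
        + conf.get(YarnConfiguration.RM_SCHEDULER_ADDRESS));
  }
}
{code}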
[jira] [Created] (YARN-2893) AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream
Gera Shegalov created YARN-2893:
-------------------------------

Summary: AMLauncher: sporadic job failures due to EOFException in readTokenStorageStream
Key: YARN-2893
URL: https://issues.apache.org/jira/browse/YARN-2893
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov

MapReduce jobs on our clusters experience sporadic failures due to corrupt tokens in the AM launch context.
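A minimal sketch of how a truncated token blob in the launch context surfaces as an EOFException; the byte manipulation is purely illustrative of the corruption, not of its actual cause:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.util.Arrays;

import org.apache.hadoop.security.Credentials;

public class CorruptTokensDemo {
  public static void main(String[] args) throws Exception {
    // Serialize an (empty) credentials blob the way the client does for the
    // AM launch context.
    Credentials creds = new Credentials();
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    creds.writeTokenStorageToStream(new DataOutputStream(bytes));

    // Simulate corruption by truncating the serialized bytes.
    byte[] truncated = Arrays.copyOf(bytes.toByteArray(), bytes.size() / 2);

    // This is roughly the read path used when setting up the container;
    // on a corrupt blob it fails with java.io.EOFException.
    new Credentials().readTokenStorageStream(
        new DataInputStream(new ByteArrayInputStream(truncated)));
  }
}
{code}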
[jira] [Created] (YARN-2377) Localization exception stack traces are not passed as diagnostic info
Gera Shegalov created YARN-2377:
-------------------------------

Summary: Localization exception stack traces are not passed as diagnostic info
Key: YARN-2377
URL: https://issues.apache.org/jira/browse/YARN-2377
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov

In the Localizer log one can only see this kind of message:
{code}
14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, 1406827248944, PATTERN, (?:classes/|lib/).* }, java.net.UnknownHostException: ha-nn-uri-0
{code}
And then only the {{java.net.UnknownHostException: ha-nn-uri-0}} message is propagated as diagnostics.
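A minimal sketch of the kind of improvement implied here, passing the full stringified stack trace rather than just the exception message as the diagnostics; the helper and its name are illustrative, not the actual NM code:
{code:java}
import org.apache.hadoop.util.StringUtils;

public class LocalizationDiagnosticsDemo {
  // Hypothetical helper: build the diagnostics string for a failed localization.
  static String buildDiagnostics(Throwable cause) {
    // Today effectively only cause.getMessage() reaches the diagnostics;
    // including the full stack trace makes the failure debuggable.
    return StringUtils.stringifyException(cause);
  }

  public static void main(String[] args) {
    Throwable t = new java.net.UnknownHostException("ha-nn-uri-0");
    System.out.println(buildDiagnostics(t));
  }
}
{code}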
[jira] [Created] (YARN-1996) Provide alternative policies for UNHEALTHY nodes.
Gera Shegalov created YARN-1996:
-------------------------------

Summary: Provide alternative policies for UNHEALTHY nodes.
Key: YARN-1996
URL: https://issues.apache.org/jira/browse/YARN-1996
Project: Hadoop YARN
Issue Type: New Feature
Components: nodemanager, scheduler
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov

Currently, UNHEALTHY nodes can significantly prolong execution of large expensive jobs, as demonstrated by MAPREDUCE-5817, and degrade the cluster health even further due to [positive feedback|http://en.wikipedia.org/wiki/Positive_feedback]. A container set that might have deemed the node unhealthy in the first place starts spreading across the cluster, because the current node is declared unusable and all its containers are killed and rescheduled on different nodes.

To mitigate this, we experiment with a patch that allows containers already running on a node turning UNHEALTHY to complete (drain), whereas no new container can be assigned to it until it turns healthy again. This mechanism can also be used for graceful decommissioning of the NM. To this end, we have to write a health script such that it can deterministically report UNHEALTHY, for example with:
{code}
if [ -e $1 ] ; then
  echo ERROR Node decommissioning via health script hack
fi
{code}
In the current version of the patch, the behavior is controlled by a boolean property {{yarn.nodemanager.unheathy.drain.containers}}. More versatile policies are possible in future work. Currently, the health state of a node is determined as a binary value based on the disk checker and the health script ERROR outputs. However, we could as well interpret health script output similar to Java logging levels (one of which is ERROR), such as WARN and FATAL. Each level can then be treated differently, e.g.:
- FATAL: unusable like today
- ERROR: drain
- WARN: halve the node capacity

complemented with some equivalence rules such as 3 WARN messages == ERROR, 2*ERROR == FATAL, etc. A sketch of such a level-based policy follows below.
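A minimal sketch of how the level-based interpretation and the equivalence rules might look; the enum values, thresholds, and method names are illustrative, not part of the actual patch:
{code:java}
public class HealthLevelPolicyDemo {
  enum NodeAction { USE_NORMALLY, HALVE_CAPACITY, DRAIN_CONTAINERS, MARK_UNUSABLE }

  // Illustrative equivalence rules from the description:
  // 3 WARN == ERROR, 2 ERROR == FATAL.
  static NodeAction decide(int warnCount, int errorCount, int fatalCount) {
    errorCount += warnCount / 3;          // promote accumulated WARNs
    fatalCount += errorCount / 2;         // promote accumulated ERRORs
    if (fatalCount > 0) {
      return NodeAction.MARK_UNUSABLE;    // behave like today's UNHEALTHY
    } else if (errorCount > 0) {
      return NodeAction.DRAIN_CONTAINERS; // let running containers finish
    } else if (warnCount > 0) {
      return NodeAction.HALVE_CAPACITY;   // reduce schedulable resources
    }
    return NodeAction.USE_NORMALLY;
  }

  public static void main(String[] args) {
    System.out.println(decide(3, 0, 0)); // 3 WARNs promote to an ERROR -> DRAIN_CONTAINERS
    System.out.println(decide(0, 2, 0)); // 2 ERRORs promote to a FATAL -> MARK_UNUSABLE
  }
}
{code}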
[jira] [Created] (YARN-1700) AHS records non-launched containers
Gera Shegalov created YARN-1700:
-------------------------------

Summary: AHS records non-launched containers
Key: YARN-1700
URL: https://issues.apache.org/jira/browse/YARN-1700
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov

When testing AHS with an MR sleep job, AHS sometimes threw an NPE out of AppAttemptBlock.render because logUrl in the container report was null. I realized that this is because AHS may record containers that never launch.
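A minimal sketch of the defensive rendering this implies, assuming a hypothetical container report holder; the real fix may instead avoid recording never-launched containers in the first place:
{code:java}
public class NullLogUrlRenderDemo {
  // Hypothetical stand-in for the container report consumed by AppAttemptBlock#render.
  static class ContainerReport {
    String containerId = "container_1391000000000_0001_01_000002";
    String logUrl = null; // never-launched container -> no log URL recorded
  }

  static String renderLogCell(ContainerReport report) {
    // Guarding against the null logUrl avoids the NPE while the container
    // still shows up on the attempt page.
    return report.logUrl == null ? "N/A" : report.logUrl;
  }

  public static void main(String[] args) {
    System.out.println(renderLogCell(new ContainerReport()));
  }
}
{code}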
[jira] [Created] (YARN-1701) More intuitive defaults for AHS
Gera Shegalov created YARN-1701:
-------------------------------

Summary: More intuitive defaults for AHS
Key: YARN-1701
URL: https://issues.apache.org/jira/browse/YARN-1701
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Gera Shegalov
Assignee: Gera Shegalov

When I enable AHS via yarn.ahs.enabled, the app history is still not visible in the AHS web UI. This is due to NullApplicationHistoryStore being used as yarn.resourcemanager.history-writer.class. It would be good to have just one key to enable basic functionality.

yarn.ahs.fs-history-store.uri uses ${hadoop.log.dir}, which is a local file system location. However, FileSystemApplicationHistoryStore uses DFS by default.
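A minimal sketch of the extra configuration the description implies is needed today to get history actually written and served; the keys are taken from the description, while the fully-qualified store class name and URI are given for illustration only:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class AhsEnableDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Enabling AHS alone is not enough...
    conf.setBoolean("yarn.ahs.enabled", true);
    // ...because the default history writer is NullApplicationHistoryStore,
    // so a real store class must be configured explicitly (illustrative FQN):
    conf.set("yarn.resourcemanager.history-writer.class",
        "org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore");
    // And the ${hadoop.log.dir}-based default points at the local file system,
    // while FileSystemApplicationHistoryStore expects DFS by default:
    conf.set("yarn.ahs.fs-history-store.uri", "hdfs:///yarn/ahs");
    System.out.println(conf.get("yarn.resourcemanager.history-writer.class"));
  }
}
{code}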
[jira] [Created] (YARN-1599) webUI rm.webapp.AppBlock should redirect to a history App page if and when available
Gera Shegalov created YARN-1599:
-------------------------------

Summary: webUI rm.webapp.AppBlock should redirect to a history App page if and when available
Key: YARN-1599
URL: https://issues.apache.org/jira/browse/YARN-1599
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.2.0, 2.0.5-alpha
Reporter: Gera Shegalov
Assignee: Gera Shegalov

When log aggregation is enabled and the application finishes, our users think that the AppMaster logs were lost, because the links to the AM attempt logs are not updated and result in HTTP 404. Only the tracking URL is updated. To provide a smoother user experience, we propose to simply redirect to the new tracking URL when a page with invalid log links is accessed.
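A minimal sketch of the proposed behavior, assuming a hypothetical hook in the app page rendering that knows the application's final state and its history tracking URL:
{code:java}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

public class FinishedAppRedirectDemo {
  // Hypothetical check: the attempt page is being rendered for an app that
  // has already finished, so its NM-hosted log links would 404.
  static void maybeRedirect(boolean appFinished, String historyTrackingUrl,
      HttpServletResponse response) throws IOException {
    if (appFinished && historyTrackingUrl != null) {
      // Send the browser to the history page where the aggregated logs live.
      response.sendRedirect(historyTrackingUrl);
    }
  }
}
{code}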
[jira] [Created] (YARN-1551) Allow user-specified reason for killApplication
Gera Shegalov created YARN-1551:
-------------------------------

Summary: Allow user-specified reason for killApplication
Key: YARN-1551
URL: https://issues.apache.org/jira/browse/YARN-1551
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Gera Shegalov

This completes MAPREDUCE-5648.
[jira] [Created] (YARN-1542) Add unit test for public resource on viewfs
Gera Shegalov created YARN-1542:
-------------------------------

Summary: Add unit test for public resource on viewfs
Key: YARN-1542
URL: https://issues.apache.org/jira/browse/YARN-1542
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager
Reporter: Gera Shegalov
[jira] [Created] (YARN-1529) Add Localization overhead metrics to NM
Gera Shegalov created YARN-1529:
-------------------------------

Summary: Add Localization overhead metrics to NM
Key: YARN-1529
URL: https://issues.apache.org/jira/browse/YARN-1529
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov

Users are often unaware of the localization cost that their jobs incur. To measure the effectiveness of localization caches it is necessary to expose the overhead in the form of metrics.

We propose the addition of the following metrics to NodeManagerMetrics. When a container is about to launch, its set of LocalResources has to be fetched from a central location, typically on HDFS, which results in a number of download requests for the files missing in the caches.

- LocalizedFilesMissed: total files (requests) downloaded from DFS. Cache misses.
- LocalizedFilesCached: total localization requests that were served from local caches. Cache hits.
- LocalizedBytesMissed: total bytes downloaded from DFS due to cache misses.
- LocalizedBytesCached: total bytes satisfied from local caches.
- Localized(Files|Bytes)CachedRatio: percentage of localized (files|bytes) that were served out of cache: ratio = 100 * caches / (caches + misses), as in the sketch below.
- LocalizationDownloadNanos: total elapsed time in nanoseconds for a container to go from ResourceRequestTransition to LocalizedTransition
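A minimal sketch of computing the cached-ratio metric from the hit/miss counters; the method name mirrors the proposed metrics but is otherwise illustrative:
{code:java}
public class LocalizationCacheRatioDemo {
  static int cachedRatioPercent(long cached, long missed) {
    long total = cached + missed;
    // ratio = 100 * caches / (caches + misses); guard against division by
    // zero before any localization has happened.
    return total == 0 ? 0 : (int) (100 * cached / total);
  }

  public static void main(String[] args) {
    // e.g. 30 requests served from cache, 10 downloaded from DFS -> 75%
    System.out.println(cachedRatioPercent(30, 10));
  }
}
{code}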
[jira] [Created] (YARN-1515) Ability to dump the container threads and stop the containers in a single RPC
Gera Shegalov created YARN-1515:
-------------------------------

Summary: Ability to dump the container threads and stop the containers in a single RPC
Key: YARN-1515
URL: https://issues.apache.org/jira/browse/YARN-1515
Project: Hadoop YARN
Issue Type: New Feature
Components: api, nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov

This is needed to implement MAPREDUCE-5044 to enable thread diagnostics for timed-out task attempts.
[jira] [Created] (YARN-1401) With zero sleep-delay-before-sigkill.ms, no signal is ever sent
Gera Shegalov created YARN-1401:
-------------------------------

Summary: With zero sleep-delay-before-sigkill.ms, no signal is ever sent
Key: YARN-1401
URL: https://issues.apache.org/jira/browse/YARN-1401
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.2.0
Reporter: Gera Shegalov

If you set yarn.nodemanager.sleep-delay-before-sigkill.ms=0 in yarn-site.xml, then an unresponsive child JVM is never killed. In MRv1, the TT used to SIGKILL immediately in this case.
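A minimal sketch of the failure mode being described, assuming the delayed-kill logic hides the entire signal path behind a "delay > 0" check; the method and structure are illustrative, not the actual NM code:
{code:java}
public class ZeroSigkillDelayDemo {
  // Illustrative version of the suspected guard: everything, including the
  // initial SIGTERM, sits behind a strictly-positive delay check.
  static void stopContainer(long sigkillDelayMs) throws InterruptedException {
    if (sigkillDelayMs > 0) {
      System.out.println("SIGTERM sent");
      Thread.sleep(sigkillDelayMs);
      System.out.println("SIGKILL sent");
    }
    // With sigkillDelayMs == 0 we fall through without sending any signal,
    // so an unresponsive child JVM is never killed; MRv1's TT would have
    // sent SIGKILL immediately in this case.
  }

  public static void main(String[] args) throws InterruptedException {
    stopContainer(0L);   // prints nothing: no signal at all
    stopContainer(250L); // prints SIGTERM sent, then SIGKILL sent
  }
}
{code}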