[jira] [Created] (YARN-10851) Tez session close does not interrupt yarn's async thread

2021-07-07 Thread Qihong Wu (Jira)
Qihong Wu created YARN-10851:


 Summary: Tez session close does not interrupt yarn's async thread
 Key: YARN-10851
 URL: https://issues.apache.org/jira/browse/YARN-10851
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.10.1, 2.8.5
 Environment: On an HA cluster, where RM1 is not the active RM
YARN 2.8.5, configured with Tez
Reporter: Qihong Wu
 Attachments: hive.log

Hi, I would like to ask for expert advice on how YARN behaves when it handles 
`InterruptedIOException`. 

The issue occurs on an HA cluster where RM1 is NOT the active RM. If a YARN 
request made to RM1 fails, RM failover should happen. However, if an 
interrupted exception is thrown while connecting to RM1, the thread should 
[bail out|https://dzone.com/articles/how-to-handle-the-interruptedexception] 
as soon as possible to [respect the interrupt 
request|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ExecutorService.html#shutdownNow--], 
rather than moving on to another RM.

However, I found that my application (Hive), after throwing 
`InterruptedIOException` while trying to connect to RM1, continued on to RM2. 
I would like to know how YARN handles `InterruptedIOException`. Shouldn't the 
async thread be interrupted and shut down when Tez close() triggers the 
interrupt request?
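
To make the expected behaviour concrete, here is a minimal sketch of a failover loop that stops as soon as an attempt is interrupted. It is not YARN's `RetryInvocationHandler`: the class `InterruptAwareFailover`, the `RmCall` interface and the RM names are hypothetical and only illustrate the idea. `SocketTimeoutException` is handled separately because it is a subclass of `InterruptedIOException` in the JDK and is an ordinary timeout rather than an interrupt.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not YARN code: fail over on I/O errors, bail out on interrupts. */
class InterruptAwareFailover {

    /** Hypothetical stand-in for one RPC attempt against a named ResourceManager. */
    interface RmCall {
        String invoke(String rmId) throws IOException;
    }

    /** Tries each RM in order and records every RM that was contacted in {@code tried}. */
    static String callWithFailover(List<String> rmIds, RmCall call, List<String> tried)
            throws IOException {
        IOException last = null;
        for (String rmId : rmIds) {
            // Do not start a new attempt if an interrupt is already pending.
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedIOException("interrupted before contacting " + rmId);
            }
            tried.add(rmId);
            try {
                return call.invoke(rmId);
            } catch (SocketTimeoutException e) {
                last = e; // a timeout, not an interrupt: try the next RM
            } catch (InterruptedIOException e) {
                // Restore the flag and stop: an interrupt is a request to quit,
                // not a reason to fail over.
                Thread.currentThread().interrupt();
                throw e;
            } catch (IOException e) {
                last = e; // ordinary failure: try the next RM
            }
        }
        throw last != null ? last : new IOException("no ResourceManager configured");
    }

    public static void main(String[] args) {
        List<String> tried = new ArrayList<>();
        try {
            callWithFailover(Arrays.asList("rm1", "rm2"), rmId -> {
                throw new InterruptedIOException("interrupted while connecting to " + rmId);
            }, tried);
        } catch (IOException e) {
            System.out.println("stopped after " + tried + ": " + e.getMessage());
        } finally {
            Thread.interrupted(); // clear the flag that the sketch restored
        }
    }
}
```

In this sketch an interrupt during the rm1 attempt ends the loop, so rm2 is never contacted, which is the behaviour the question above expects.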



*Steps to reproduce:*
 1. Use an HA cluster that runs YARN 2.8.5 and is configured with Tez.
 2. Make sure RM1 is not the active RM by checking `yarn rmadmin 
-getAllServiceState`. If it is, manually [transition RM2 to active 
RM|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html#Admin_commands].
 3. Apply the following failover-retry properties to yarn-site.xml:
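{quote}
 <property>
   <name>yarn.client.failover-retries</name>
   <value>4</value>
 </property>
 <property>
   <name>yarn.client.failover-retries-on-socket-timeouts</name>
   <value>4</value>
 </property>
 <property>
   <name>yarn.client.failover-max-attempts</name>
   <value>4</value>
 </property>
{quote}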
 4. Run a simple application that uses the YARN client (for example, a simple 
Hive DDL command):
{quote}hive --hiveconf hive.root.logger=TRACE,console -e "create table tez_test 
(id int, name string);"
{quote}
 5. In the application's log (for example, hive.log), you can see that 
`RetryInvocationHandler` caught the `InterruptedIOException` while the request 
was talking to rm1, but the thread did not bail out immediately and instead 
moved on to rm2.



*More information:*
The interrupted exception is triggered via 
[TezSessionState#close|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java#L689]
 and 
[Future#cancel|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Future.html#cancel-boolean-].
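
For context on how `Future#cancel` delivers the interrupt, here is a small self-contained sketch that uses only `java.util.concurrent`. Nothing in it comes from Hive or Tez; the class name `CancelInterruptDemo` and the sleeping task are hypothetical. `cancel(true)` interrupts the worker thread, but whether the task actually stops depends on the task honouring that interrupt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical demo: cancel(true) interrupts the thread that runs the task. */
class CancelInterruptDemo {

    /** Returns true if the running task observed the interrupt sent by cancel(true). */
    static boolean cancelInterruptsTask() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        try {
            Future<?> future = pool.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(60_000); // stands in for a blocking call
                } catch (InterruptedException e) {
                    sawInterrupt.set(true); // the task notices the interrupt and stops
                } finally {
                    finished.countDown();
                }
            });
            started.await();                     // wait until the task is running
            future.cancel(true);                 // true = interrupt the worker thread
            finished.await(5, TimeUnit.SECONDS); // give the task time to react
            return sawInterrupt.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("task saw interrupt: " + cancelInterruptsTask());
    }
}
```

A task that catches the interrupt and keeps retrying would keep running after `cancel(true)`, which matches the symptom reported here.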



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10849) Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails

2021-07-07 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17376630#comment-17376630
 ] 

Peter Bacsko commented on YARN-10849:
-

[~snemeth] patch v2 seems to undo what v1 introduced. 

> Clarify testcase documentation for 
> TestServiceAM#testContainersReleasedWhenPreLaunchFails
> -
>
> Key: YARN-10849
> URL: https://issues.apache.org/jira/browse/YARN-10849
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-10849.001.patch, YARN-10849.002.patch
>
>
> There's a small comment added to testcase: 
> org.apache.hadoop.yarn.service.TestServiceAM#testContainersReleasedWhenPreLaunchFails:
> {code}
>   // Test to verify that the containers are released and the
>   // component instance is added to the pending queue when building the launch
>   // context fails.
> {code}
> However, it was not clear to me why the "launch context" would fail.
> While the test passes, it throws an Exception that tells the story. 
> {code}
> 2021-07-06 18:31:04,438 ERROR [pool-275-thread-1] 
> containerlaunch.ContainerLaunchService (ContainerLaunchService.java:run(122)) 
> - [COMPINSTANCE compa-0 : container_1625589063422_0001_01_01]: Failed to 
> launch container.
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:164)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:180)
>   at 
> org.apache.hadoop.yarn.service.provider.tarball.TarballProviderService.processArtifact(TarballProviderService.java:39)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:144)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:107)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This exception is thrown because the id of the Artifact object is unset 
> (null); TarballProviderService.processArtifact checks the id and does not 
> allow such artifacts.
> The aim of this jira is to add a clarification comment or javadoc to this 
> method.






[jira] [Created] (YARN-10850) TimelineService v2 lists containers for all attempts when filtering for one

2021-07-07 Thread Benjamin Teke (Jira)
Benjamin Teke created YARN-10850:


 Summary: TimelineService v2 lists containers for all attempts when 
filtering for one
 Key: YARN-10850
 URL: https://issues.apache.org/jira/browse/YARN-10850
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelinereader
Reporter: Benjamin Teke


When using the command

{code:java}
yarn container -list 
{code}

with an application attempt ID, only the containers for that attempt should be 
listed, according to the help text.

{code:java}
-list List containers for application
  attempt when application
  attempt ID is provided. When
  application name is provided,
  then it finds the instances of
  the application based on app's
  own implementation, and
  -appTypes option must be
  specified unless it is the
  default yarn-service type. With
  app name, it supports optional
  use of -version to filter
  instances based on app version,
  -components to filter instances
  based on component names,
  -states to filter instances
  based on instance state.
{code}
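
The behaviour that the help text describes can be sketched as a simple filter. This is only an illustration of the expected result, not the TimelineService v2 reader code; the class `AttemptFilterSketch`, the `ContainerInfo` type and the short IDs are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the expected filtering, not YARN code. */
class AttemptFilterSketch {

    /** Stand-in for a container record that knows which attempt it belongs to. */
    static final class ContainerInfo {
        final String containerId;
        final String attemptId;

        ContainerInfo(String containerId, String attemptId) {
            this.containerId = containerId;
            this.attemptId = attemptId;
        }
    }

    /** Keeps only the containers that belong to the requested application attempt. */
    static List<ContainerInfo> forAttempt(List<ContainerInfo> all, String attemptId) {
        return all.stream()
                .filter(c -> c.attemptId.equals(attemptId))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ContainerInfo> all = Arrays.asList(
                new ContainerInfo("container_a", "attempt_01"),
                new ContainerInfo("container_b", "attempt_02"),
                new ContainerInfo("container_c", "attempt_02"));
        for (ContainerInfo c : forAttempt(all, "attempt_01")) {
            System.out.println(c.containerId);
        }
    }
}
```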

When TimelineService v2 is enabled, all of the containers for the application 
are returned. 

{code:java}
hrt_qa@ctr-e172-1620330694487-146061-01-02:/hwqe/hadoopqe$ yarn 
applicationattempt -list application_1625124233002_0007
21/07/01 09:32:23 INFO impl.TimelineReaderClientImpl: Initialized 
TimelineReader 
URI=http://ctr-e172-1620330694487-146061-01-04.hwx.site:8198/ws/v2/timeline/,
 clusterId=yarn-cluster
21/07/01 09:32:24 INFO client.AHSProxy: Connecting to Application History 
server at ctr-e172-1620330694487-146061-01-04.hwx.site/172.27.113.4:10200
21/07/01 09:32:24 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
Total number of application attempts :2
ApplicationAttempt-Id             State   AM-Container-Id                         Tracking-URL
appattempt_1625124233002_0007_01  FAILED  container_e43_1625124233002_0007_01_01  http://ctr-e172-1620330694487-146061-01-03.hwx.site:8088/proxy/application_1625124233002_0007/
appattempt_1625124233002_0007_02  KILLED  container_e43_1625124233002_0007_02_01  http://ctr-e172-1620330694487-146061-01-03.hwx.site:8088/proxy/application_1625124233002_0007/
{code}

Querying the 2 app attempts produces the same output:

{code:java}
hrt_qa@ctr-e172-1620330694487-146061-01-02:/hwqe/hadoopqe$ yarn container 
-list appattempt_1625124233002_0007_01
21/07/01 09:32:35 INFO impl.TimelineReaderClientImpl: Initialized 
TimelineReader 
URI=http://ctr-e172-1620330694487-146061-01-04.hwx.site:8198/ws/v2/timeline/,
 clusterId=yarn-cluster
21/07/01 09:32:35 INFO client.AHSProxy: Connecting to Application History 
server at ctr-e172-1620330694487-146061-01-04.hwx.site/172.27.113.4:10200
21/07/01 09:32:35 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm2
21/07/01 09:32:36 INFO conf.Configuration: found resource resource-types.xml at 
file:/etc/hadoop/7.1.7.0-504/0/resource-types.xml
Total number of containers :12
Container-Id                            Start Time  Finish Time  State     Host                                                Node Http Address                                  LOG-URL
container_e43_1625124233002_0007_02_04  N/A         N/A          COMPLETE  ctr-e172-1620330694487-146061-01-02.hwx.site:25454  ctr-e172-1620330694487-146061-01-02.hwx.site:8042  http://ctr-e172-1620330694487-146061-01-04.hwx.site:19888/jobhistory/logs/logs/ctr-e172-1620330694487-146061-01-02.hwx.site:25454/container_e43_1625124233002_0007_02_04/container_e43_1625124233002_0007_02_04/hrt_qa
container_e43_1625124233002_0007_02_05  N/A         N/A          COMPLETE  ctr-e172-1620330694487-146061-01-07.hwx.site:25454  ctr-e172-1620330694487-146061-01-07.hwx.site:8042  http://ctr-e172-1620330694487-146061-01-04.hwx.site:19888/jobhistory/logs/logs/ctr-e172-1620330694487-146061-01-07.hwx.site:25454/container_e43_1625124233002_0007_02_05/container_e43_1625124233002_0007_02_05/hrt_qa
container_e43_1625124233002_0007_02_03  

[jira] [Updated] (YARN-10849) Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails

2021-07-07 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10849:
--
Attachment: YARN-10849.002.patch

> Clarify testcase documentation for 
> TestServiceAM#testContainersReleasedWhenPreLaunchFails
> -
>
> Key: YARN-10849
> URL: https://issues.apache.org/jira/browse/YARN-10849
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-10849.001.patch, YARN-10849.002.patch
>
>
> There's a small comment added to testcase: 
> org.apache.hadoop.yarn.service.TestServiceAM#testContainersReleasedWhenPreLaunchFails:
> {code}
>   // Test to verify that the containers are released and the
>   // component instance is added to the pending queue when building the launch
>   // context fails.
> {code}
> However, it was not clear to me why the "launch context" would fail.
> While the test passes, it throws an Exception that tells the story. 
> {code}
> 2021-07-06 18:31:04,438 ERROR [pool-275-thread-1] 
> containerlaunch.ContainerLaunchService (ContainerLaunchService.java:run(122)) 
> - [COMPINSTANCE compa-0 : container_1625589063422_0001_01_01]: Failed to 
> launch container.
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:164)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:180)
>   at 
> org.apache.hadoop.yarn.service.provider.tarball.TarballProviderService.processArtifact(TarballProviderService.java:39)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:144)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:107)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This exception is thrown because the id of the Artifact object is unset 
> (null); TarballProviderService.processArtifact checks the id and does not 
> allow such artifacts.
> The aim of this jira is to add a clarification comment or javadoc to this 
> method.






[jira] [Updated] (YARN-10849) Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails

2021-07-07 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10849:
--
Attachment: YARN-10849.001.patch

> Clarify testcase documentation for 
> TestServiceAM#testContainersReleasedWhenPreLaunchFails
> -
>
> Key: YARN-10849
> URL: https://issues.apache.org/jira/browse/YARN-10849
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-10849.001.patch
>
>
> There's a small comment added to testcase: 
> org.apache.hadoop.yarn.service.TestServiceAM#testContainersReleasedWhenPreLaunchFails:
> {code}
>   // Test to verify that the containers are released and the
>   // component instance is added to the pending queue when building the launch
>   // context fails.
> {code}
> However, it was not clear to me why the "launch context" would fail.
> While the test passes, it throws an Exception that tells the story. 
> {code}
> 2021-07-06 18:31:04,438 ERROR [pool-275-thread-1] 
> containerlaunch.ContainerLaunchService (ContainerLaunchService.java:run(122)) 
> - [COMPINSTANCE compa-0 : container_1625589063422_0001_01_01]: Failed to 
> launch container.
> java.lang.IllegalArgumentException: Can not create a Path from a null string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:164)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:180)
>   at 
> org.apache.hadoop.yarn.service.provider.tarball.TarballProviderService.processArtifact(TarballProviderService.java:39)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:144)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:107)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This exception is thrown because the id of the Artifact object is unset 
> (null); TarballProviderService.processArtifact checks the id and does not 
> allow such artifacts.
> The aim of this jira is to add a clarification comment or javadoc to this 
> method.






[jira] [Created] (YARN-10849) Clarify testcase documentation for TestServiceAM#testContainersReleasedWhenPreLaunchFails

2021-07-07 Thread Szilard Nemeth (Jira)
Szilard Nemeth created YARN-10849:
-

 Summary: Clarify testcase documentation for 
TestServiceAM#testContainersReleasedWhenPreLaunchFails
 Key: YARN-10849
 URL: https://issues.apache.org/jira/browse/YARN-10849
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


There's a small comment added to testcase: 
org.apache.hadoop.yarn.service.TestServiceAM#testContainersReleasedWhenPreLaunchFails:
{code}
  // Test to verify that the containers are released and the
  // component instance is added to the pending queue when building the launch
  // context fails.
{code}

However, it was not clear to me why the "launch context" would fail.
While the test passes, it throws an Exception that tells the story. 

{code}
2021-07-06 18:31:04,438 ERROR [pool-275-thread-1] 
containerlaunch.ContainerLaunchService (ContainerLaunchService.java:run(122)) - 
[COMPINSTANCE compa-0 : container_1625589063422_0001_01_01]: Failed to 
launch container.
java.lang.IllegalArgumentException: Can not create a Path from a null string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:164)
at org.apache.hadoop.fs.Path.<init>(Path.java:180)
at 
org.apache.hadoop.yarn.service.provider.tarball.TarballProviderService.processArtifact(TarballProviderService.java:39)
at 
org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:144)
at 
org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:107)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

This exception is thrown because the id of the Artifact object is unset (null); 
TarballProviderService.processArtifact checks the id and does not allow such 
artifacts.
The aim of this jira is to add a clarification comment or javadoc to this 
method.
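
As an illustration of the failure path described above, the following sketch shows how an unset artifact id can surface as the `IllegalArgumentException` seen in the log. The classes are hypothetical stand-ins, not the Hadoop `Path`, `Artifact` or `TarballProviderService` classes; only the exception message is taken from the log.

```java
/** Hypothetical stand-ins, not Hadoop classes. */
class NullArtifactIdSketch {

    /** Stand-in for an artifact whose id may be unset (null). */
    static final class Artifact {
        final String id;

        Artifact(String id) {
            this.id = id;
        }
    }

    /** Stand-in for a path type that rejects null input, echoing the logged message. */
    static String toPath(String value) {
        if (value == null) {
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        return value;
    }

    /** Stand-in for the artifact-processing step: it turns the artifact id into a path. */
    static String processArtifact(Artifact artifact) {
        return toPath(artifact.id);
    }

    public static void main(String[] args) {
        try {
            processArtifact(new Artifact(null)); // id was never set
        } catch (IllegalArgumentException e) {
            System.out.println("building the launch context failed: " + e.getMessage());
        }
    }
}
```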






[jira] [Updated] (YARN-10789) RM HA startup can fail due to race conditions in ZKConfigurationStore

2021-07-07 Thread Tarun Parimi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi updated YARN-10789:

Attachment: (was: YARN-10789.branch-3.2.001.patch)

> RM HA startup can fail due to race conditions in ZKConfigurationStore
> -
>
> Key: YARN-10789
> URL: https://issues.apache.org/jira/browse/YARN-10789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10789.001.patch, YARN-10789.002.patch, 
> YARN-10789.branch-3.2.001.patch, YARN-10789.branch-3.3.001.patch
>
>
> We randomly observe the error below during Hadoop install and initial RM 
> startup when HA is enabled and yarn.scheduler.configuration.store.class=zk is 
> configured. This causes one of the RMs to fail to start up.
> {code:java}
> 2021-05-26 12:59:18,986 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state INITED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /confstore/CONF_STORE
> {code}
> We try to create the znode /confstore/CONF_STORE when we initialize the 
> ZKConfigurationStore. The problem is that the ZKConfigurationStore is 
> initialized when CapacityScheduler runs serviceInit, which is done by both 
> the Active and the Standby RM. So we can run into a race condition where 
> both RMs try to create the same znode when they are started at the same 
> time.
> ZKRMStateStore, on the other hand, avoids such race conditions by creating the 
> znodes only after serviceStart. serviceStart only happens for the active RM 
> that won the leader election, unlike serviceInit, which happens irrespective 
> of leader election.
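
The race described above can be sketched without a ZooKeeper client. The class below is a hypothetical in-memory stand-in for a znode tree, and it assumes that initialization does an existence check followed by a create. `initTolerant` shows one common way to make such a step idempotent; it is not necessarily what the attached patches do.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a znode tree; no ZooKeeper client is used. */
class ZnodeCreateRaceSketch {

    /** Thrown when a node already exists, loosely mirroring NodeExistsException. */
    static final class NodeExists extends Exception {
        NodeExists(String path) {
            super("NodeExists for " + path);
        }
    }

    private final Set<String> nodes = ConcurrentHashMap.newKeySet();

    /** Atomic create that fails if the node is already present. */
    void create(String path) throws NodeExists {
        if (!nodes.add(path)) {
            throw new NodeExists(path);
        }
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }

    /** Create and treat "already exists" as success, so the loser of the race still initializes. */
    void initTolerant(String path) {
        try {
            create(path);
        } catch (NodeExists e) {
            // Another RM created the node first; nothing more to do.
        }
    }

    public static void main(String[] args) throws NodeExists {
        String path = "/confstore/CONF_STORE";
        ZnodeCreateRaceSketch store = new ZnodeCreateRaceSketch();

        // Interleaving of two RMs that both run the init step at the same time:
        boolean absentForA = !store.exists(path); // RM A checks: node is absent
        store.create(path);                       // RM B creates the node first
        try {
            if (absentForA) {
                store.create(path);               // RM A acts on its stale check
            }
        } catch (NodeExists e) {
            System.out.println("RM A failed: " + e.getMessage());
        }

        store.initTolerant(path);                 // tolerant variant does not fail
        System.out.println("node present: " + store.exists(path));
    }
}
```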






[jira] [Updated] (YARN-10789) RM HA startup can fail due to race conditions in ZKConfigurationStore

2021-07-07 Thread Tarun Parimi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi updated YARN-10789:

Attachment: YARN-10789.branch-3.2.001.patch

> RM HA startup can fail due to race conditions in ZKConfigurationStore
> -
>
> Key: YARN-10789
> URL: https://issues.apache.org/jira/browse/YARN-10789
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10789.001.patch, YARN-10789.002.patch, 
> YARN-10789.branch-3.2.001.patch, YARN-10789.branch-3.3.001.patch
>
>
> We randomly observe the error below during Hadoop install and initial RM 
> startup when HA is enabled and yarn.scheduler.configuration.store.class=zk is 
> configured. This causes one of the RMs to fail to start up.
> {code:java}
> 2021-05-26 12:59:18,986 INFO org.apache.hadoop.service.AbstractService: 
> Service RMActiveServices failed in state INITED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /confstore/CONF_STORE
> {code}
> We try to create the znode /confstore/CONF_STORE when we initialize the 
> ZKConfigurationStore. The problem is that the ZKConfigurationStore is 
> initialized when CapacityScheduler runs serviceInit, which is done by both 
> the Active and the Standby RM. So we can run into a race condition where 
> both RMs try to create the same znode when they are started at the same 
> time.
> ZKRMStateStore, on the other hand, avoids such race conditions by creating the 
> znodes only after serviceStart. serviceStart only happens for the active RM 
> that won the leader election, unlike serviceInit, which happens irrespective 
> of leader election.






[jira] [Updated] (YARN-10840) yarn app status fails with ArrayIndexOutOfBoundsException

2021-07-07 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10840:
-
Attachment: YARN-10840-001.patch

> yarn app status fails with ArrayIndexOutOfBoundsException 
> --
>
> Key: YARN-10840
> URL: https://issues.apache.org/jira/browse/YARN-10840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Abhinaba Sarkar
>Assignee: Abhinaba Sarkar
>Priority: Major
> Attachments: YARN-10840-001.patch
>
>
> Array index out of bounds exception in the ClientAMService.getStatus() - 
> {code:java}
> 2021-07-04 20:00:24,488 [IPC Server handler 0 on 25347] INFO  ipc.Server - 
> IPC Server handler 0 on 25347, call Call#163 Retry#0 
> org.apache.hadoop.yarn.service.ClientAMProtocol.getStatus from 10.0.0.10:42446
> org.codehaus.jackson.map.JsonMappingException: Index: 11, Size: 11 (through 
> reference chain: 
> org.apache.hadoop.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.hadoop.yarn.service.api.records.Component["containers"]->java.util.ArrayList[11])
>   at 
> org.codehaus.jackson.map.JsonMappingException.wrapWithPath(JsonMappingException.java:218)
>   at 
> org.codehaus.jackson.map.JsonMappingException.wrapWithPath(JsonMappingException.java:197)
>   at 
> org.codehaus.jackson.map.ser.std.SerializerBase.wrapAndThrow(SerializerBase.java:166)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:127)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:71)
>   at 
> org.codehaus.jackson.map.ser.std.AsArraySerializerBase.serialize(AsArraySerializerBase.java:86)
>   at 
> org.codehaus.jackson.map.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:446)
>   at 
> org.codehaus.jackson.map.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:150)
>   at 
> org.codehaus.jackson.map.ser.BeanSerializer.serialize(BeanSerializer.java:112)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:122)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:71)
>   at 
> org.codehaus.jackson.map.ser.std.AsArraySerializerBase.serialize(AsArraySerializerBase.java:86)
>   at 
> org.codehaus.jackson.map.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:446)
>   at 
> org.codehaus.jackson.map.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:150)
>   at 
> org.codehaus.jackson.map.ser.BeanSerializer.serialize(BeanSerializer.java:112)
>   at 
> org.codehaus.jackson.map.ser.StdSerializerProvider._serializeValue(StdSerializerProvider.java:610)
>   at 
> org.codehaus.jackson.map.ser.StdSerializerProvider.serializeValue(StdSerializerProvider.java:256)
>   at 
> org.codehaus.jackson.map.ObjectMapper._configAndWriteValue(ObjectMapper.java:2575)
>   at 
> org.codehaus.jackson.map.ObjectMapper.writeValueAsString(ObjectMapper.java:2097)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.toJson(JsonSerDeser.java:249)
>   at 
> org.apache.hadoop.yarn.service.ClientAMService.getStatus(ClientAMService.java:125)
>   at 
> org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPBServiceImpl.getStatus(ClientAMProtocolPBServiceImpl.java:59)
>   at 
> org.apache.hadoop.yarn.proto.ClientAMProtocol$ClientAMProtocolService$2.callBlockingMethod(ClientAMProtocol.java:6159)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 11, Size: 11
>   at java.util.ArrayList.rangeCheck(ArrayList.java:659)
>   at java.util.ArrayList.get(ArrayList.java:435)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:106)
>   ... 27 more
> {code}





[jira] [Updated] (YARN-10840) yarn app status fails with ArrayIndexOutOfBoundsException

2021-07-07 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10840:
-
Attachment: (was: YARN-10840-001.patch)

> yarn app status fails with ArrayIndexOutOfBoundsException 
> --
>
> Key: YARN-10840
> URL: https://issues.apache.org/jira/browse/YARN-10840
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Abhinaba Sarkar
>Assignee: Abhinaba Sarkar
>Priority: Major
> Attachments: YARN-10840-001.patch
>
>
> Array index out of bounds exception in the ClientAMService.getStatus() - 
> {code:java}
> 2021-07-04 20:00:24,488 [IPC Server handler 0 on 25347] INFO  ipc.Server - 
> IPC Server handler 0 on 25347, call Call#163 Retry#0 
> org.apache.hadoop.yarn.service.ClientAMProtocol.getStatus from 10.0.0.10:42446
> org.codehaus.jackson.map.JsonMappingException: Index: 11, Size: 11 (through 
> reference chain: 
> org.apache.hadoop.yarn.service.api.records.Service["components"]->java.util.ArrayList[0]->org.apache.hadoop.yarn.service.api.records.Component["containers"]->java.util.ArrayList[11])
>   at 
> org.codehaus.jackson.map.JsonMappingException.wrapWithPath(JsonMappingException.java:218)
>   at 
> org.codehaus.jackson.map.JsonMappingException.wrapWithPath(JsonMappingException.java:197)
>   at 
> org.codehaus.jackson.map.ser.std.SerializerBase.wrapAndThrow(SerializerBase.java:166)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:127)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:71)
>   at 
> org.codehaus.jackson.map.ser.std.AsArraySerializerBase.serialize(AsArraySerializerBase.java:86)
>   at 
> org.codehaus.jackson.map.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:446)
>   at 
> org.codehaus.jackson.map.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:150)
>   at 
> org.codehaus.jackson.map.ser.BeanSerializer.serialize(BeanSerializer.java:112)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:122)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:71)
>   at 
> org.codehaus.jackson.map.ser.std.AsArraySerializerBase.serialize(AsArraySerializerBase.java:86)
>   at 
> org.codehaus.jackson.map.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:446)
>   at 
> org.codehaus.jackson.map.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:150)
>   at 
> org.codehaus.jackson.map.ser.BeanSerializer.serialize(BeanSerializer.java:112)
>   at 
> org.codehaus.jackson.map.ser.StdSerializerProvider._serializeValue(StdSerializerProvider.java:610)
>   at 
> org.codehaus.jackson.map.ser.StdSerializerProvider.serializeValue(StdSerializerProvider.java:256)
>   at 
> org.codehaus.jackson.map.ObjectMapper._configAndWriteValue(ObjectMapper.java:2575)
>   at 
> org.codehaus.jackson.map.ObjectMapper.writeValueAsString(ObjectMapper.java:2097)
>   at 
> org.apache.hadoop.yarn.service.utils.JsonSerDeser.toJson(JsonSerDeser.java:249)
>   at 
> org.apache.hadoop.yarn.service.ClientAMService.getStatus(ClientAMService.java:125)
>   at 
> org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPBServiceImpl.getStatus(ClientAMProtocolPBServiceImpl.java:59)
>   at 
> org.apache.hadoop.yarn.proto.ClientAMProtocol$ClientAMProtocolService$2.callBlockingMethod(ClientAMProtocol.java:6159)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 11, Size: 11
>   at java.util.ArrayList.rangeCheck(ArrayList.java:659)
>   at java.util.ArrayList.get(ArrayList.java:435)
>   at 
> org.codehaus.jackson.map.ser.std.StdContainerSerializers$IndexedListSerializer.serializeContents(StdContainerSerializers.java:106)
>   ... 27 more
> {code}


