[jira] [Updated] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images

2020-08-10 Thread Leitao Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-3159:
-
Description: 
Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
docker images with a path like "sequenceiq/hadoop-docker:2.6.0", i.e. with 
at most one "/" in the path.
{code:java}
public static final String DOCKER_IMAGE_PATTERN = 
"^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
{code}
In our cluster, image names have multiple layers, such as 
"docker-registry:8080/cloud/hadoop-docker:2.6.0", which works with 
"docker pull IMAGE_NAME" but cannot pass the image-name check in 
saneDockerImage().
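For illustration, the mismatch can be reproduced with java.util.regex; the 
RELAXED pattern below is only a sketch of one possible fix (it is not the 
attached patch) that additionally accepts a "registry[:port]/" prefix followed 
by a multi-level repository path:

```java
import java.util.regex.Pattern;

public class DockerImagePatternCheck {
    // Pattern currently in DockerContainerExecutor: allows at most one "/".
    public static final String ORIGINAL =
        "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
    // Hypothetical relaxed pattern (illustration only, not the attached
    // patch): optional "registry[:port]/" prefix plus any number of
    // intermediate "name/" path components before the final image name.
    public static final String RELAXED =
        "^(([\\w\\.-]+)(:\\d+)?\\/)?([\\w\\.-]+\\/)*[\\w\\.:-]+$";

    public static void main(String[] args) {
        String singleLayer = "sequenceiq/hadoop-docker:2.6.0";
        String multiLayer  = "docker-registry:8080/cloud/hadoop-docker:2.6.0";
        System.out.println(Pattern.matches(ORIGINAL, singleLayer)); // true
        System.out.println(Pattern.matches(ORIGINAL, multiLayer));  // false: rejected by saneDockerImage()
        System.out.println(Pattern.matches(RELAXED, multiLayer));   // true
    }
}
```

The original pattern fails on the multi-layered name because after the single 
optional "prefix/" group is consumed, the remaining character class excludes 
"/" entirely.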

  was:
Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
docker images with a path like "sequenceiq/hadoop-docker:2.6.0", i.e. with 
at most one "/" in the path.

{code}
public static final String DOCKER_IMAGE_PATTERN = 
"^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
{code}

In our cluster, image names have multiple layers, such as 
"docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works 
with "docker pull IMAGE_NAME" but cannot pass the image-name check in 
saneDockerImage().


> DOCKER_IMAGE_PATTERN should support multilayered path of docker images
> --
>
> Key: YARN-3159
> URL: https://issues.apache.org/jira/browse/YARN-3159
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Leitao Guo
>Assignee: Leitao Guo
>Priority: Major
>  Labels: BB2015-05-TBR
> Attachments: YARN-3159.patch
>
>
> Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
> docker images with a path like "sequenceiq/hadoop-docker:2.6.0", i.e. with 
> at most one "/" in the path.
> {code:java}
> public static final String DOCKER_IMAGE_PATTERN = 
> "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
> {code}
> In our cluster, image names have multiple layers, such as 
> "docker-registry:8080/cloud/hadoop-docker:2.6.0", which works with 
> "docker pull IMAGE_NAME" but cannot pass the image-name check in 
> saneDockerImage().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3282) DockerContainerExecutor should support environment variables setting

2015-03-03 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14346315#comment-14346315
 ] 

Leitao Guo commented on YARN-3282:
--

Hi [~ashahab], thanks for your comments. 

I agree with you that a docker image should try to be self-sufficient, but in 
our scenario the data are stored in a remote shared storage system. We mount 
that storage as local dirs on the nodemanager, and only a few applications 
running in docker containers need to access these dirs, so they are not 
suitable as yarn.nodemanager.local-dirs, which are mounted into the docker 
container by default. IMHO, an extra method to set environment variables for 
docker containers is necessary.

 DockerContainerExecutor should support environment variables setting
 

 Key: YARN-3282
 URL: https://issues.apache.org/jira/browse/YARN-3282
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications, nodemanager
Affects Versions: 2.6.0
Reporter: Leitao Guo
 Attachments: YARN-3282.01.patch


 Currently, DockerContainerExecutor will mount yarn.nodemanager.local-dirs 
 and yarn.nodemanager.log-dirs into containers automatically. However, 
 applications may need to set more environment variables before launching 
 containers. 
 In our applications, as in the following command, we need to attach several 
 directories and set some environment variables for docker containers. 
 {code}
 docker run -i -t -v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v 
 /mnt:/mnt -e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e 
 VTC_RUNTIME=vtc sequenceiq/hadoop-docker:2.6.0 /bin/bash
 {code}





[jira] [Updated] (YARN-3282) DockerContainerExecutor should support environment variables setting

2015-02-28 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-3282:
-
Attachment: YARN-3282.01.patch

After the patch, mapreduce jobs that use DockerContainerExecutor can set 
docker environment variables via yarn.nodemanager.docker-container-executor.env.

e.g. 
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples.jar wordcount \
  -Dyarn.app.mapreduce.am.env=yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0 \
  -Dmapreduce.map.env=yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0 \
  -Dmapreduce.reduce.env=yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0 \
  "-Dyarn.nodemanager.docker-container-executor.env=-v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v /mnt:/mnt -e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e VTC_RUNTIME=vtc" \
  /wordcount_input /wordcount_output
{code}

 DockerContainerExecutor should support environment variables setting
 

 Key: YARN-3282
 URL: https://issues.apache.org/jira/browse/YARN-3282
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications, nodemanager
Affects Versions: 2.6.0
Reporter: Leitao Guo
 Attachments: YARN-3282.01.patch


 Currently, DockerContainerExecutor will mount yarn.nodemanager.local-dirs 
 and yarn.nodemanager.log-dirs into containers automatically. However, 
 applications may need to set more environment variables before launching 
 containers. 
 In our applications, as in the following command, we need to attach several 
 directories and set some environment variables for docker containers. 
 {code}
 docker run -i -t -v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v 
 /mnt:/mnt -e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e 
 VTC_RUNTIME=vtc sequenceiq/hadoop-docker:2.6.0 /bin/bash
 {code}





[jira] [Created] (YARN-3282) DockerContainerExecutor should support environment variables setting

2015-02-28 Thread Leitao Guo (JIRA)
Leitao Guo created YARN-3282:


 Summary: DockerContainerExecutor should support environment 
variables setting
 Key: YARN-3282
 URL: https://issues.apache.org/jira/browse/YARN-3282
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications, nodemanager
Affects Versions: 2.6.0
Reporter: Leitao Guo


Currently, DockerContainerExecutor will mount yarn.nodemanager.local-dirs and 
yarn.nodemanager.log-dirs into containers automatically. However, applications 
may need to set more environment variables before launching containers. 

In our applications, as in the following command, we need to attach several 
directories and set some environment variables for docker containers. 

{code}
docker run -i -t -v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v /mnt:/mnt 
-e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e VTC_RUNTIME=vtc 
sequenceiq/hadoop-docker:2.6.0 /bin/bash
{code}





[jira] [Created] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images

2015-02-09 Thread Leitao Guo (JIRA)
Leitao Guo created YARN-3159:


 Summary: DOCKER_IMAGE_PATTERN should support multilayered path of 
docker images
 Key: YARN-3159
 URL: https://issues.apache.org/jira/browse/YARN-3159
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Leitao Guo


Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
docker images with a path like "sequenceiq/hadoop-docker:2.6.0", i.e. with 
at most one "/" in the path.

{code}
public static final String DOCKER_IMAGE_PATTERN = 
"^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
{code}

In our cluster, image names have multiple layers, such as 
"docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works 
with "docker pull IMAGE_NAME" but cannot pass the image-name check in 
saneDockerImage().





[jira] [Updated] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images

2015-02-09 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-3159:
-
Attachment: YARN-3159.patch

 DOCKER_IMAGE_PATTERN should support multilayered path of docker images
 --

 Key: YARN-3159
 URL: https://issues.apache.org/jira/browse/YARN-3159
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Leitao Guo
 Attachments: YARN-3159.patch


 Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
 docker images with a path like "sequenceiq/hadoop-docker:2.6.0", i.e. with 
 at most one "/" in the path.
 {code}
 public static final String DOCKER_IMAGE_PATTERN = 
 "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
 {code}
 In our cluster, image names have multiple layers, such as 
 "docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works 
 with "docker pull IMAGE_NAME" but cannot pass the image-name check in 
 saneDockerImage().





[jira] [Commented] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images

2015-02-09 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14312151#comment-14312151
 ] 

Leitao Guo commented on YARN-3159:
--

Ok, I'll add the unit test.

 DOCKER_IMAGE_PATTERN should support multilayered path of docker images
 --

 Key: YARN-3159
 URL: https://issues.apache.org/jira/browse/YARN-3159
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Leitao Guo
Assignee: Leitao Guo
 Attachments: YARN-3159.patch


 Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
 docker images with a path like "sequenceiq/hadoop-docker:2.6.0", i.e. with 
 at most one "/" in the path.
 {code}
 public static final String DOCKER_IMAGE_PATTERN = 
 "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
 {code}
 In our cluster, image names have multiple layers, such as 
 "docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works 
 with "docker pull IMAGE_NAME" but cannot pass the image-name check in 
 saneDockerImage().





[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor

2015-01-26 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292809#comment-14292809
 ] 

Leitao Guo commented on YARN-2718:
--

I think this would be good for our hadoop cluster, since we have a few 
applications that have to run in docker containers, while most of the apps 
need LCE. So we need a CompositeContainerExecutor and a way for apps to 
configure which ContainerExecutor they need.

 Create a CompositeConatainerExecutor that combines DockerContainerExecutor 
 and DefaultContainerExecutor
 ---

 Key: YARN-2718
 URL: https://issues.apache.org/jira/browse/YARN-2718
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Abin Shahab
 Attachments: YARN-2718.patch


 There should be a composite container that allows users to run their jobs in 
 DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging 
 purposes.





[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor

2015-01-26 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292910#comment-14292910
 ] 

Leitao Guo commented on YARN-2718:
--

[~chenchun], in the following code, I think you should return directly in the 
'containerExecutor == null' case; otherwise builder.setContainerExecutor(null) 
is still called after the clear:
{code}
  @Override
  public void setContainerExecutor(String containerExecutor) {
    maybeInitBuilder();
    if (containerExecutor == null) {
      builder.clearContainerExecutor();
    }
    builder.setContainerExecutor(containerExecutor);
  }
{code}
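A minimal, self-contained sketch of the suggested early return. The Builder 
here is a hypothetical stand-in for the protobuf-generated builder, included 
only to show why falling through with a null argument throws:

```java
public class ContainerExecutorSetter {
    // Hypothetical stand-in for the protobuf-generated builder: its setter,
    // like real protobuf builders, rejects null values.
    static class Builder {
        private String containerExecutor;
        void clearContainerExecutor() { containerExecutor = null; }
        void setContainerExecutor(String value) {
            if (value == null) {
                throw new NullPointerException("containerExecutor");
            }
            containerExecutor = value;
        }
        String getContainerExecutor() { return containerExecutor; }
    }

    private final Builder builder = new Builder();

    // Suggested fix: return after clearing, so a null argument never
    // reaches builder.setContainerExecutor(null).
    public void setContainerExecutor(String containerExecutor) {
        if (containerExecutor == null) {
            builder.clearContainerExecutor();
            return;
        }
        builder.setContainerExecutor(containerExecutor);
    }

    public String getContainerExecutor() {
        return builder.getContainerExecutor();
    }

    public static void main(String[] args) {
        ContainerExecutorSetter s = new ContainerExecutorSetter();
        s.setContainerExecutor("DockerContainerExecutor");
        s.setContainerExecutor(null); // clears without throwing
        System.out.println(s.getContainerExecutor()); // null
    }
}
```

Without the early return, setContainerExecutor(null) would clear the field and 
then immediately throw from the builder's setter.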

 Create a CompositeConatainerExecutor that combines DockerContainerExecutor 
 and DefaultContainerExecutor
 ---

 Key: YARN-2718
 URL: https://issues.apache.org/jira/browse/YARN-2718
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Abin Shahab
 Attachments: YARN-2718.patch


 There should be a composite container that allows users to run their jobs in 
 DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging 
 purposes.





[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers

2015-01-22 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288608#comment-14288608
 ] 

Leitao Guo commented on YARN-2466:
--

Currently, if I want to use DCE in my cluster, all applications have to run in 
DCE, which is not practical in our cluster. Could 
yarn.nodemanager.container-executor.class be made configurable per 
application? That way we could use DCE for some applications while others 
still use LCE.

 Umbrella issue for Yarn launched Docker Containers
 --

 Key: YARN-2466
 URL: https://issues.apache.org/jira/browse/YARN-2466
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.4.1
Reporter: Abin Shahab
Assignee: Abin Shahab

 Docker (https://www.docker.io/) is, increasingly, a very popular container 
 technology.
 In context of YARN, the support for Docker will provide a very elegant 
 solution to allow applications to package their software into a Docker 
 container (entire Linux file system incl. custom versions of perl, python 
 etc.) and use it as a blueprint to launch all their YARN containers with 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).
 In addition to software isolation mentioned above, Docker containers will 
 provide resource, network, and user-namespace isolation. 
 Docker provides resource isolation through cgroups, similar to 
 LinuxContainerExecutor. This prevents one job from taking other jobs 
 resource(memory and CPU) on the same hadoop cluster. 
 User-namespace isolation will ensure that the root on the container is mapped 
 an unprivileged user on the host. This is currently being added to Docker.
 Network isolation will ensure that one user’s network traffic is completely 
 isolated from another user’s network traffic. 
 Last but not the least, the interaction of Docker and Kerberos will have to 
 be worked out. These Docker containers must work in a secure hadoop 
 environment.
 Additional details are here: 
 https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers





[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue

2014-10-10 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166558#comment-14166558
 ] 

Leitao Guo commented on YARN-1582:
--

Any updates on this jira? Why not also add 
yarn.scheduler.maximum-allocation-vcores per queue?

 Capacity Scheduler: add a maximum-allocation-mb setting per queue 
 --

 Key: YARN-1582
 URL: https://issues.apache.org/jira/browse/YARN-1582
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 3.0.0, 0.23.10, 2.2.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1582-branch-0.23.patch


 We want to allow certain queues to use larger container sizes while limiting 
 other queues to smaller container sizes.  Setting it per queue will help 
 prevent abuse, help limit the impact of reservations, and allow changes in 
 the maximum container size to be rolled out more easily.
 One reason this is needed is that more application types are becoming 
 available on yarn and certain applications require more memory to run 
 efficiently. While we want to allow for that, we don't want other 
 applications to abuse it and start requesting bigger containers than what 
 they really need.  
 Note that we could have this based on application type, but that might not be 
 totally accurate either since for example you might want to allow certain 
 users on MapReduce to use larger containers, while limiting other users of 
 MapReduce to smaller containers.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-09-16 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-
Attachment: (was: 3.before-patch.JPG)

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Assignee: Leitao Guo
 Attachments: YARN-2348.2.patch


 ResourceManager web UI, including the application list and scheduler, 
 displays UTC time by default, which will confuse users who do not use UTC 
 time. The web UI should display server-side time by default.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-09-16 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-
Attachment: (was: 4.after-patch.JPG)

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Assignee: Leitao Guo
 Attachments: YARN-2348.2.patch


 ResourceManager web UI, including the application list and scheduler, 
 displays UTC time by default, which will confuse users who do not use UTC 
 time. The web UI should display server-side time by default.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-09-16 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-
Attachment: YARN-2348.3.patch

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Assignee: Leitao Guo
 Attachments: YARN-2348.2.patch, YARN-2348.3.patch, afterpatch.jpg


 ResourceManager web UI, including the application list and scheduler, 
 displays UTC time by default, which will confuse users who do not use UTC 
 time. The web UI should display server-side time by default.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-09-16 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-
Attachment: afterpatch.jpg

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Assignee: Leitao Guo
 Attachments: YARN-2348.2.patch, YARN-2348.3.patch, afterpatch.jpg


 ResourceManager web UI, including the application list and scheduler, 
 displays UTC time by default, which will confuse users who do not use UTC 
 time. The web UI should display server-side time by default.





[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-09-14 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133488#comment-14133488
 ] 

Leitao Guo commented on YARN-1729:
--

[~zjshen], sorry! It must have been my mistake to assign this to myself.

 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Fix For: 2.4.0

 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.





[jira] [Assigned] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-08-06 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo reassigned YARN-1729:


Assignee: Leitao Guo  (was: Billie Rinaldi)

 TimelineWebServices always passes primary and secondary filters as strings
 --

 Key: YARN-1729
 URL: https://issues.apache.org/jira/browse/YARN-1729
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Leitao Guo
 Fix For: 2.4.0

 Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
 YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch


 Primary filters and secondary filter values can be arbitrary json-compatible 
 Object.  The web services should determine if the filters specified as query 
 parameters are objects or strings before passing them to the store.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-31 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: (was: YARN-2348.patch)

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch


 ResourceManager web UI, including the application list and scheduler, 
 displays UTC time by default, which will confuse users who do not use UTC 
 time. The web UI should display server-side time by default.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-31 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: (was: YARN-2348.2.patch)

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch


 ResourceManager web UI, including the application list and scheduler, 
 displays UTC time by default, which will confuse users who do not use UTC 
 time. The web UI should display server-side time by default.





[jira] [Commented] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-30 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078954#comment-14078954
 ] 

Leitao Guo commented on YARN-2368:
--

Thanks [~ozawa] for your comments. 

I deployed hadoop-2.3.0-cdh5.1.0 with 22-queue fairscheduler on my 20-node 
cluster. Two resourcemanagers are deployed exclusively on 10.153.80.8 and 
10.153.80.18. 

Jobs are submitted from gridmix:
{code}
sudo -u mapred hadoop jar /usr/lib/hadoop-mapreduce/hadoop-gridmix.jar 
-Dgridmix.min.file.size=10485760 
-Dgridmix.job-submission.use-queue-in-trace=true 
-Dgridmix.distributed-cache-emulation.enable=false  -generate 34816m 
hdfs:///user/mapred/foo/ hdfs:///tmp/job-trace.json
{code}
job-trace.json is generated by Rumen, with 6,000 jobs, an average of 320 map 
tasks per job, and an average of 25 reduce tasks per job.

I found 3 times (gridmix was run more than 3 times) that the resourcemanager 
failed when handling the STATE_STORE_OP_FAILED event. At the same time, 
zookeeper threw a 'Len error' IOException:
{code}
... ...
2014-07-24 21:00:51,170 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.153.80.8:47135
2014-07-24 21:00:51,171 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
attempting to renew session 0x247678daa88001a at /10.153.80.8:47135
2014-07-24 21:00:51,171 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 
0x247678daa88001a
2014-07-24 21:00:51,171 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 
0x247678daa88001a with negotiated timeout 1 for client /10.153.80.8:47135
2014-07-24 21:00:51,171 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
packet /10.153.80.8:47135
2014-07-24 21:00:51,172 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success 
/10.153.80.8:47135
2014-07-24 21:00:51,186 [myid:3] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247678daa88001a due to java.io.IOException: Len 
error 1813411
2014-07-24 21:00:51,186 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket 
connection for client /10.153.80.8:47135 which had sessionid 0x247678daa88001a

... ...

2014-07-25 22:10:08,919 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.153.80.8:50480
2014-07-25 22:10:08,921 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
attempting to renew session 0x247684586e70006 at /10.153.80.8:50480
2014-07-25 22:10:08,922 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established 
session 0x247684586e70006 with negotiated timeout 1 for client 
/10.153.80.8:50480
2014-07-25 22:10:08,922 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
packet /10.153.80.8:50480
2014-07-25 22:10:08,923 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success 
/10.153.80.8:50480
2014-07-25 22:10:08,934 [myid:3] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
2014-07-25 22:10:08,934 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket 
connection for client /10.153.80.8:50480 which had sessionid 0x247684586e70006

... ...

2014-07-26 02:22:59,627 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.153.80.18:60588
2014-07-26 02:22:59,629 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
attempting to renew session 0x2476de7c1af0002 at /10.153.80.18:60588
2014-07-26 02:22:59,629 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established 
session 0x2476de7c1af0002 with negotiated timeout 1 for client 
/10.153.80.18:60588
2014-07-26 02:22:59,630 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
packet /10.153.80.18:60588
2014-07-26 02:22:59,630 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success 
/10.153.80.18:60588
2014-07-26 02:22:59,648 [myid:3] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x2476de7c1af0002 due to java.io.IOException: Len 
error 1649043
2014-07-26 02:22:59,648 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket 
connection for client /10.153.80.18:60588 which had sessionid 

[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: YARN-2348.2.patch

Please find the new patch to this issue in YARN-2348.2.patch. 

In this patch, the resourcemanager server formats the Start/FinishTime dates 
itself, instead of rendering the dates in the browser.

 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: YARN-2348.2.patch, YARN-2348.patch


 ResourceManager web UI, including the application list and scheduler, 
 displays UTC time by default, which will confuse users who do not use UTC 
 time. The web UI should display the users' local time.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: (was: 2.after-change.jpg)

 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: YARN-2348.2.patch, YARN-2348.patch


 The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this confuses users who do not use UTC time. The web UI should display the user's local time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: (was: 1.before-change.jpg)

 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: YARN-2348.2.patch, YARN-2348.patch


 The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this confuses users who do not use UTC time. The web UI should display the user's local time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: 4.after-patch.JPG
3.before-patch.JPG

Here are new screenshots of my cluster's web UI, taken before and after the patch. 

 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
 YARN-2348.2.patch, YARN-2348.patch


 The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this confuses users who do not use UTC time. The web UI should display the user's local time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Summary: ResourceManager web UI should display server-side time instead of 
UTC time  (was: ResourceManager web UI should display locale time instead of 
UTC time)

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
 YARN-2348.2.patch, YARN-2348.patch


 The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this confuses users who do not use UTC time. The web UI should display the user's local time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Description: The ResourceManager web UI, including the application list and 
scheduler, displays UTC time by default; this confuses users who do not use 
UTC time. The web UI should display the server-side time by default.  (was: 
The ResourceManager web UI, including the application list and scheduler, 
displays UTC time by default; this confuses users who do not use UTC time. 
The web UI should display the server-side time by default.)

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
 YARN-2348.2.patch, YARN-2348.patch


 The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this confuses users who do not use UTC time. The web UI should display the server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Description: 
The ResourceManager web UI, including the application list and scheduler, 
displays UTC time by default; this confuses users who do not use UTC time. 
The web UI should display the server-side time by default.

  was: The ResourceManager web UI, including the application list and 
scheduler, displays UTC time by default; this confuses users who do not use 
UTC time. The web UI should display the user's local time.


 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
 YARN-2348.2.patch, YARN-2348.patch


 The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this confuses users who do not use UTC time. The web UI should display the server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079299#comment-14079299
 ] 

Leitao Guo commented on YARN-2348:
--

Hi [~aw] [~tucu00] [~raviprak], thanks for your comments. I agree with you 
that the web UI should display the same time as the server side. Please take 
a look at the new patch, thanks!

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
 YARN-2348.2.patch, YARN-2348.patch


 The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this confuses users who do not use UTC time. The web UI should display the server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079423#comment-14079423
 ] 

Leitao Guo commented on YARN-2348:
--

[~chengbing.liu] thanks, Bing!

 ResourceManager web UI should display server-side time instead of UTC time
 --

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
 YARN-2348.2.patch, YARN-2348.2.patch, YARN-2348.patch


 The ResourceManager web UI, including the application list and scheduler, displays UTC time by default; this confuses users who do not use UTC time. The web UI should display the server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Description: 
Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger 
than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server 
and client.

The ResourceManager log shows the following:



2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)


Meanwhile, the ZooKeeper log shows the following:

2014-07-25 22:10:09,742 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
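The "Len error" entries above are ZooKeeper closing the connection because the request packet exceeds jute.maxbuffer (1 MB by default). A hypothetical client-side guard, not part of any patch attached here, could reject oversized state before it ever reaches ZooKeeper instead of dying on the connection loss:

```java
public class ZnodeSizeGuard {
    // Default jute.maxbuffer on both the ZooKeeper server and client: 1 MB.
    // The real limit applies to the whole serialized request packet, so a
    // payload must in practice stay somewhat below this value.
    static final int JUTE_MAXBUFFER_DEFAULT = 1024 * 1024;

    // Returns true only if the payload fits under the default packet limit.
    static boolean fitsInZnode(byte[] data) {
        return data != null && data.length <= JUTE_MAXBUFFER_DEFAULT;
    }

    public static void main(String[] args) {
        // 1530747 is the payload size reported by the "Len error" log above.
        System.out.println(fitsInZnode(new byte[1530747])); // prints: false
    }
}
```

A check like this turns a fatal STATE_STORE_OP_FAILED into a diagnosable "state too large" error on the writer's side.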


  was:
Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger 
than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server 
and client.

The ResourceManager log shows the following:

2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 

[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Attachment: YARN-2368.patch

 ResourceManager failed when ZKRMStateStore tries to update znode data larger 
 than 1MB
 -

 Key: YARN-2368
 URL: https://issues.apache.org/jira/browse/YARN-2368
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Priority: Critical
 Attachments: YARN-2368.patch


 Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server and client.
 The ResourceManager log shows the following:
 
 2014-07-25 22:33:11,078 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session connected
 2014-07-25 22:33:11,078 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
 ZKRMStateStore Session restored
 2014-07-25 22:33:11,214 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
 org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
 STATE_STORE_OP_FAILED. Cause:
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss for 
 /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:745)
 Meanwhile, the ZooKeeper log shows the following:
 
 2014-07-25 22:10:09,742 [myid:1] - WARN  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
 causing close of session 0x247684586e70006 due to java.io.IOException: Len 
 error 1530747
 ... ...
 2014-07-25 22:33:10,966 [myid:1] - WARN  
 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
 causing close of session 0x247684586e70006 due to java.io.IOException: Len 
 error 1530747



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Description: 
Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger 
than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server 
and client.

The ResourceManager log shows the following:

2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)



Meanwhile, the ZooKeeper log shows the following:

2014-07-25 22:10:09,742 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747


  was:
Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger 
than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server 
and client.

The ResourceManager log shows the following:



2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 

[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Description: 
Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger 
than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server 
and client.

The ResourceManager log shows the following:

2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)



Meanwhile, the ZooKeeper log shows the following:

2014-07-25 22:10:09,742 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747


  was:
Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger 
than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server 
and client.

The ResourceManager log shows the following:

2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 

[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Description: 
Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger 
than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server 
and client.

The ResourceManager log shows the following:

2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)



Meanwhile, the ZooKeeper log shows the following:

2014-07-25 22:10:09,742 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
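One common workaround, assuming restarts are acceptable, is to raise jute.maxbuffer on both sides; the system property must be set consistently on the ZooKeeper servers and on the ResourceManager's embedded ZooKeeper client, or the larger packets are still rejected. A sketch of where the flags would go (the env-file locations and the 4 MB value are assumptions, adjust for your deployment):

```shell
# ZooKeeper servers (e.g. conf/java.env, sourced by zkEnv.sh) -- raise the
# packet limit to 4 MB; every server in the ensemble needs the same value.
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=4194304"

# ResourceManager (etc/hadoop/yarn-env.sh) -- the ZooKeeper client inside
# ZKRMStateStore reads the same jute.maxbuffer system property.
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Djute.maxbuffer=4194304"
```

Raising the limit only buys headroom; keeping oversized app-attempt state out of the store is the more robust fix.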


  was:
Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger 
than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server 
and client.

The ResourceManager log shows the following:

2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 

[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Description: 
Both ResourceManagers threw STATE_STORE_OP_FAILED events and finally failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger 
than 1 MB, the default limit ('jute.maxbuffer') for both the ZooKeeper server 
and client.

The ResourceManager (IP addr: 10.153.80.8) log shows the following:
{code}
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
{code}


Meanwhile, the ZooKeeper log shows the following:
{code}
2014-07-25 22:10:09,728 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.153.80.8:58890
2014-07-25 22:10:09,730 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
attempting to renew session 0x247684586e70006 at /10.153.80.8:58890
2014-07-25 22:10:09,730 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 
0x247684586e70006
2014-07-25 22:10:09,730 [myid:1] - INFO  
[QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 
0x247684586e70006 with negotiated timeout 1 for client /10.153.80.8:58890
2014-07-25 22:10:09,730 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
packet /10.153.80.8:58890
2014-07-25 22:10:09,730 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success 
/10.153.80.8:58890
2014-07-25 22:10:09,742 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
2014-07-25 22:10:09,743 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket 
connection for client /10.153.80.8:58890 which had sessionid 0x247684586e70006
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
{code}
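As a workaround (not a fix for the oversized write itself), 'jute.maxbuffer' can be raised on both the ZooKeeper servers and the ResourceManager's embedded ZK client, since the limit is enforced on both sides. A sketch, assuming the standard env-script hooks; the 4MB value and file locations are illustrative:

```shell
# ZooKeeper server side (e.g. in conf/zookeeper-env.sh or the server start
# script): raise the 1MB default to 4MB on every quorum member.
export SERVER_JVMFLAGS="-Djute.maxbuffer=4194304"

# ResourceManager side (e.g. in yarn-env.sh): the ZK *client* inside the RM
# must accept the same size, or reads of large znodes will still fail.
export YARN_RESOURCEMANAGER_OPTS="$YARN_RESOURCEMANAGER_OPTS -Djute.maxbuffer=4194304"
```

Note that all quorum members and all clients need a consistent value; raising it on only one side just moves where the "Len error" is thrown.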

  was:
Both ResouceManagers throw out STATE_STORE_OP_FAILED events and failed finally. 
ZooKeeper log shows that ZKRMStateStore tries to update a znode larger than 
1MB, which is the default configuration of ZooKeeper server and client in 
'jute.maxbuffer'.

ResourceManager log shows as the following:

2014-07-25 22:33:11,078 INFO 

[jira] [Created] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-23 Thread Leitao Guo (JIRA)
Leitao Guo created YARN-2348:


 Summary: ResourceManager web UI should display locale time instead 
of UTC time
 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 1.before-change.jpg, 2.after-change.jpg

The ResourceManager web UI, including the application list and scheduler pages, 
displays UTC time by default, which confuses users who are not in the UTC time 
zone. The web UI should display the user's local time instead.
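As a sketch of the intended behavior (a hypothetical helper, not the actual YARN-2348 patch), the renderer below formats an epoch timestamp in a chosen time zone instead of hard-coding UTC:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LocalTimeRender {
    // Format an epoch-millis timestamp in the given time zone; a web UI would
    // pass the server's (or user's) zone rather than always using UTC.
    static String renderLocal(long epochMillis, TimeZone tz) {
        SimpleDateFormat fmt =
            new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.US);
        fmt.setTimeZone(tz);
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long ts = 1406073600000L;  // 2014-07-23 00:00:00 UTC
        System.out.println(renderLocal(ts, TimeZone.getTimeZone("UTC")));
        System.out.println(renderLocal(ts, TimeZone.getTimeZone("Asia/Shanghai")));
    }
}
```

The same timestamp renders as 00:00:00 in UTC and 08:00:00 in Asia/Shanghai; the stored value never changes, only its presentation.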



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-23 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: 2.after-change.jpg

 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 1.before-change.jpg, 2.after-change.jpg


 The ResourceManager web UI, including the application list and scheduler 
 pages, displays UTC time by default, which confuses users who are not in the 
 UTC time zone. The web UI should display the user's local time instead.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-23 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: 1.before-change.jpg

 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 1.before-change.jpg, 2.after-change.jpg


 The ResourceManager web UI, including the application list and scheduler 
 pages, displays UTC time by default, which confuses users who are not in the 
 UTC time zone. The web UI should display the user's local time instead.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-23 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: YARN-2348.patch

Please review the patch.

 ResourceManager web UI should display locale time instead of UTC time
 -

 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch


 The ResourceManager web UI, including the application list and scheduler 
 pages, displays UTC time by default, which confuses users who are not in the 
 UTC time zone. The web UI should display the user's local time instead.





[jira] [Commented] (YARN-2321) NodeManager web UI can incorrectly report Pmem enforcement

2014-07-21 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069661#comment-14069661
 ] 

Leitao Guo commented on YARN-2321:
--

Thanks Jason Lowe!

 NodeManager web UI can incorrectly report Pmem enforcement
 --

 Key: YARN-2321
 URL: https://issues.apache.org/jira/browse/YARN-2321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Assignee: Leitao Guo
 Fix For: 3.0.0, 2.6.0

 Attachments: YARN-2321.patch


 The NodeManager web UI reads the wrong configuration value for Pmem enforcement.





[jira] [Created] (YARN-2321) NodeManager WebUI get wrong configuration of isPmemCheckEnabled()

2014-07-18 Thread Leitao Guo (JIRA)
Leitao Guo created YARN-2321:


 Summary: NodeManager WebUI get wrong configuration of 
isPmemCheckEnabled()
 Key: YARN-2321
 URL: https://issues.apache.org/jira/browse/YARN-2321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo


The NodeManager web UI reads the wrong configuration value for Pmem enforcement.
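A minimal sketch of the fix direction: the web UI should read the same key and default that the containers monitor actually enforces. The key name below mirrors the real 'yarn.nodemanager.pmem-check-enabled' property, but a plain Map stands in for Hadoop's Configuration so the sketch runs without the YARN classpath:

```java
import java.util.HashMap;
import java.util.Map;

public class PmemCheckInfo {
    // Mirrors YarnConfiguration.NM_PMEM_CHECK_ENABLED and its default;
    // plain strings here so the sketch is self-contained (assumption).
    static final String NM_PMEM_CHECK_ENABLED = "yarn.nodemanager.pmem-check-enabled";
    static final boolean DEFAULT_NM_PMEM_CHECK_ENABLED = true;

    // Read the same key (and default) that the containers monitor uses,
    // so the web UI reports the value that is actually enforced.
    static boolean isPmemCheckEnabled(Map<String, String> conf) {
        String v = conf.get(NM_PMEM_CHECK_ENABLED);
        return v == null ? DEFAULT_NM_PMEM_CHECK_ENABLED : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(NM_PMEM_CHECK_ENABLED, "false");
        System.out.println(isPmemCheckEnabled(conf));          // prints false
        System.out.println(isPmemCheckEnabled(new HashMap<>())); // default: true
    }
}
```

The bug class here is a UI reading a different key (or default) than the enforcement path; routing both through one accessor keeps them consistent.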





[jira] [Updated] (YARN-2321) NodeManager WebUI get wrong configuration of isPmemCheckEnabled()

2014-07-18 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2321:
-

Attachment: YARN-2321.patch

 NodeManager WebUI get wrong configuration of isPmemCheckEnabled()
 -

 Key: YARN-2321
 URL: https://issues.apache.org/jira/browse/YARN-2321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: YARN-2321.patch


 The NodeManager web UI reads the wrong configuration value for Pmem enforcement.


