[jira] [Updated] (YARN-2185) Use pipes when localizing archives

2018-01-09 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-2185:
-
Attachment: YARN-2185.003.patch

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch, YARN-2185.001.patch, 
> YARN-2185.002.patch, YARN-2185.003.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.






[jira] [Commented] (YARN-7705) Create the container log directory with correct sticky bit in C code

2018-01-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319519#comment-16319519
 ] 

Miklos Szegedi commented on YARN-7705:
--

Adding dependency

> Create the container log directory with correct sticky bit in C code
> 
>
> Key: YARN-7705
> URL: https://issues.apache.org/jira/browse/YARN-7705
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7705.001.patch, YARN-7705.002.patch
>
>
> YARN-7363 created the container log directory in Java, which isn't able to 
> set the correct sticky bit because of Java language limitation. Wrong sticky 
> bit of log directory causes failure of reading log files inside the 
> directory. To solve that, we need to do it in C code. 
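For illustration, a minimal sketch of the Java limitation mentioned above, assuming JDK 7+ NIO (the method name and path are made up for the example): java.nio.file.attribute.PosixFilePermission only models the rwx bits, so the setgid/sticky bits on the log directory cannot be expressed from Java, which is why the directory has to be handled in the C container-executor.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;

static void setLogDirPermissions(Path logDir) throws IOException {
  // PosixFilePermission has no constant for setuid, setgid or the sticky bit,
  // so only the rwx bits below can be set from Java.
  Files.setPosixFilePermissions(logDir, EnumSet.of(
      PosixFilePermission.OWNER_READ,
      PosixFilePermission.OWNER_WRITE,
      PosixFilePermission.OWNER_EXECUTE,
      PosixFilePermission.GROUP_READ,
      PosixFilePermission.GROUP_EXECUTE));
}
{code}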






[jira] [Commented] (YARN-2185) Use pipes when localizing archives

2018-01-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319446#comment-16319446
 ] 

Miklos Szegedi commented on YARN-2185:
--

Thank you for the review, [~jlowe] and [~grepas].

bq. This patch adds parallel copying of directories and ability to ignore 
timestamps which IMHO is outside the scope of this patch. I think those should 
be handled in separate JIRAs with appropriate unit tests.
I created YARN-7713 and YARN-7712 for these two ideas respectively.
bq. Do we really want a while loop to pull off extensions? That's not how it 
behaved before, and it will now do different, potentially unexpected, things 
for a ".jar.gz", ".tgz.tgz", ".gz.jar", etc. Granted these are unlikely 
filenames to encounter, but if somehow a user had those before and were 
working, this will change that behavior.
I agree, let's be more conservative and not change the existing behavior.
bq. Does it make sense to use an external process to unpack .jar and .zip 
files? The original code did this via the JDK directly rather than requiring a 
subprocess, and given there's a ZipInputStream and JarInputStream, it seems we 
could do the same here with the input stream rather than requiring the overhead 
of a subprocess and copying of data between streams.
I was wondering whether an external process handles OS-specific archives better, 
for example ones containing symlinks. Here is what I suggest. I do not like the 
idea of using a different method for tar than for zip. On the other hand, we do 
not have enough information to know how many users would actually be better 
served by the OS toolset. I am not in favor of too many config knobs either, but 
in this case I added one: the administrator can decide whether to rely on OS 
tools altogether or on Java itself. Calling out to jar indeed does not make much 
sense, since it does the same thing we would do in the child process. I think 
this solution is much cleaner than arbitrarily using tar on Linux but relying on 
Java for jars.
bq. This now logs an INFO message for all stderr and stdout for the subprocess
I addressed this concern.
bq. Wouldn't "cd destpath && tar -xf" be better than "cd destpath; tar -xf"? 
Otherwise if somehow the change directory fails it will end up unpacking to an 
unexpected place.
I just followed the existing pattern. I agree, let's be more resilient, and I 
will change it to && as you suggested.
bq. Is the "rm -rf" necessary?
It does not hurt, but I replaced it with a delete that simply returns if the 
directory does not exist.
bq. Has this been tested on Windows? The zipOrJar path appears to be eligible 
for Windows but uses shell syntax that cmd is not going to understand, like 
semicolons between commands.
Good catch, the latest code uses the Java path now for Windows. However, I have 
limited ability to test there.
bq. I wonder if it would be better to add a method to Shell that takes an input 
stream so we could reuse all the subprocess handling code there.
My usage of ProcessBuilder is very simple, so I am hesitant to overload the 
shell executor with this functionality. Keeping it separate helps with backports as well.
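For illustration, here is a minimal sketch of the piping idea with ProcessBuilder, assuming a POSIX shell and tar on the PATH; this is not the exact patch code and the method name is made up:
{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

static void streamUnpack(InputStream archive, java.io.File destDir)
    throws IOException, InterruptedException {
  // "&&" ensures we never unpack into the wrong place if the cd fails.
  ProcessBuilder pb = new ProcessBuilder("bash", "-c",
      "cd " + destDir.getAbsolutePath() + " && tar -xf -");
  pb.redirectErrorStream(true); // in the real code stdout/stderr would be logged
  Process p = pb.start();
  try (OutputStream stdin = p.getOutputStream()) {
    byte[] buf = new byte[64 * 1024];
    int n;
    while ((n = archive.read(buf)) != -1) {
      stdin.write(buf, 0, n); // stream the archive without staging a local copy
    }
  }
  if (p.waitFor() != 0) {
    throw new IOException("tar exited with " + p.exitValue());
  }
}
{code}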

I think I addressed most other comments in the next patch that I will submit 
soon.

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch, YARN-2185.001.patch, 
> YARN-2185.002.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.






[jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files

2018-01-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318891#comment-16318891
 ] 

Miklos Szegedi commented on YARN-7712:
--

[~chris.douglas], thank you for the suggestion. I would like to keep this jira 
simple: all it does is skip the timestamp check on downloads if the client 
specifies the invalid value -1.
Regarding PathHandle, I agree it would be far more precise than a timestamp and 
filename for specifying dependencies.
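For illustration only, a rough sketch of the proposed behavior (the method and parameter names are made up, this is not the actual FSDownload code):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;

static void verifyTimestamp(long requestedTimestamp, FileStatus remote)
    throws IOException {
  if (requestedTimestamp == -1) {
    return; // the client opted out of the timestamp check
  }
  if (remote.getModificationTime() != requestedTimestamp) {
    throw new IOException("Resource changed on HDFS: expected timestamp "
        + requestedTimestamp + " but found " + remote.getModificationTime());
  }
}
{code}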

> Add ability to ignore timestamps in localized files
> ---
>
> Key: YARN-7712
> URL: https://issues.apache.org/jira/browse/YARN-7712
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>
> YARN currently requires and checks the timestamp of localized files and 
> fails if the file on HDFS does not match the one requested. This jira 
> adds the ability to ignore the timestamp based on the request of the client.






[jira] [Updated] (YARN-7713) Add parallel copying of directories into FSDownload

2018-01-08 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7713:
-
Summary: Add parallel copying of directories into FSDownload  (was: Add 
parallel copying of directories into)

> Add parallel copying of directories into FSDownload
> ---
>
> Key: YARN-7713
> URL: https://issues.apache.org/jira/browse/YARN-7713
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>
> YARN currently copies directories sequentially when localizing. This could be 
> improved to run in parallel, since the source blocks are normally on different 
> nodes.






[jira] [Commented] (YARN-7689) TestRMContainerAllocator fails after YARN-6124

2018-01-08 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317310#comment-16317310
 ] 

Miklos Szegedi commented on YARN-7689:
--

Thank you, [~wilfreds]. Could you address the outstanding checkstyle issue?

> TestRMContainerAllocator fails after YARN-6124
> --
>
> Key: YARN-7689
> URL: https://issues.apache.org/jira/browse/YARN-7689
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-7689.001.patch, YARN-7689.002.patch
>
>
> After the change that was made for YARN-6124 multiple tests in the 
> TestRMContainerAllocator from MapReduce fail with the following NPE:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.reinitialize(AbstractYarnScheduler.java:1437)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.reinitialize(FifoScheduler.java:320)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator$ExcessReduceContainerAllocateScheduler.(TestRMContainerAllocator.java:1808)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator$MyResourceManager2.createScheduler(TestRMContainerAllocator.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:659)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1133)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:316)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1334)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:141)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:137)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator$MyResourceManager.(TestRMContainerAllocator.java:928)
> {code}
> In the test we just call reinitialize on a scheduler and never call init.
> The stop of the service is guarded, and so should be the start and the re-init.






[jira] [Created] (YARN-7713) Add parallel copying of directories into

2018-01-08 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7713:


 Summary: Add parallel copying of directories into
 Key: YARN-7713
 URL: https://issues.apache.org/jira/browse/YARN-7713
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


YARN currently copies directories sequentially when localizing. This could be 
improved to run in parallel, since the source blocks are normally on different 
nodes.
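A rough sketch of the idea, assuming a simple thread pool around FileUtil.copy; the method name and threading details are illustrative only, not the planned implementation:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

static void copyDirParallel(FileSystem srcFs, Path srcDir, FileSystem dstFs,
    Path dstDir, Configuration conf, int threads) throws Exception {
  ExecutorService pool = Executors.newFixedThreadPool(threads);
  try {
    List<Future<Boolean>> copies = new ArrayList<>();
    for (FileStatus child : srcFs.listStatus(srcDir)) {
      // Each child gets its own copy task; the source blocks usually live on
      // different datanodes, so the copies can proceed in parallel.
      Callable<Boolean> task = () -> FileUtil.copy(srcFs, child.getPath(),
          dstFs, new Path(dstDir, child.getPath().getName()),
          false /* deleteSource */, conf);
      copies.add(pool.submit(task));
    }
    for (Future<Boolean> copy : copies) {
      copy.get(); // surface the first failure, if any
    }
  } finally {
    pool.shutdown();
  }
}
{code}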






[jira] [Created] (YARN-7712) Add ability to ignore timestamps in localized files

2018-01-08 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7712:


 Summary: Add ability to ignore timestamps in localized files
 Key: YARN-7712
 URL: https://issues.apache.org/jira/browse/YARN-7712
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


YARN currently requires and checks the timestamp of localized files and fails 
if the file on HDFS does not match the one requested. This jira adds the 
ability to ignore the timestamp based on the request of the client.






[jira] [Commented] (YARN-7590) Improve container-executor validation check

2018-01-05 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313730#comment-16313730
 ] 

Miklos Szegedi commented on YARN-7590:
--

[~eyang], thank you for the updated patch.
{code}
Testing check_nm_local_dir()
Error checking file stats for target -1 Unknown error -1.
test_nm_local_dir expected 0 got 1
{code}
I ran the unit test with the latest change and I got the error above.
I also found that you probably do not want to return out of memory here but 
another error code:
{code}
int check = check_nm_local_dir(nm_uid, *local_dir_ptr);
if (check != 0) {
  container_dir = NULL;
}
if (container_dir == NULL) {
  return OUT_OF_MEMORY;
}
{code}


> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7590.001.patch, YARN-7590.002.patch, 
> YARN-7590.003.patch, YARN-7590.004.patch, YARN-7590.005.patch, 
> YARN-7590.006.patch, YARN-7590.007.patch
>
>
> There is minimal validation of the prefix path in container-executor.  If YARN is 
> compromised, an attacker can use container-executor to change the ownership of 
> system files:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> The spark user can rewrite /etc files to gain more access.  We can improve this 
> with additional checks in container-executor:
> # Make sure the prefix path is owned by the same user as the caller to 
> container-executor.
> # Make sure the log directory prefix is owned by the same user as the caller.






[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers

2018-01-05 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313529#comment-16313529
 ] 

Miklos Szegedi commented on YARN-6673:
--

[~yangjiandan], I am assuming the requestor specifies more vcores if it is 
capable of running more threads. We have the following scenarios:
1. There are only guaranteed containers on the node: in this case it does not 
matter what the opportunistic weight is.
2. There are guaranteed containers that use the whole node and some 
opportunistic ones allocated: the opportunistic weight should be minimal (2), 
so that opportunistic containers are fully throttled without any other action 
from the node manager.
3. There are guaranteed containers that leave a gap and some opportunistic ones 
allocated:
  a) Let's say there are 3 physical cores left and there are two opportunistic 
containers with 2 and 1 threads, allocating 2 and 1 vcores respectively. 
Each thread gets a physical core, so the remaining resource is shared like (2,1).
  b) There is just one core left and there are two opportunistic containers 
with 2 and 1 threads, allocating 2 and 1 vcores respectively. Each thread 
gets a weight of 2, so the remaining CPU should be shared like (4,2)=(2,1); see 
the toy calculation below.
4. There are only opportunistic containers: they will be promoted to guaranteed 
and the standard logic applies.
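A toy calculation for 3b above, only to illustrate how CFS splits CPU time proportionally to cpu.shares; the helper below is not YARN code:
{code}
import java.util.LinkedHashMap;
import java.util.Map;

static Map<String, Double> cpuFraction(Map<String, Integer> sharesByContainer) {
  double total = sharesByContainer.values().stream()
      .mapToInt(Integer::intValue).sum();
  Map<String, Double> fractions = new LinkedHashMap<>();
  // Each runnable cgroup receives CPU time in proportion to its cpu.shares.
  sharesByContainer.forEach((name, shares) -> fractions.put(name, shares / total));
  return fractions;
}
// cpuFraction(Map.of("opp-2vcores", 4, "opp-1vcore", 2))
//   -> roughly {opp-2vcores=0.67, opp-1vcore=0.33}, i.e. the (4,2)=(2,1) split.
{code}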

> Add cpu cgroup configurations for opportunistic containers
> --
>
> Key: YARN-6673
> URL: https://issues.apache.org/jira/browse/YARN-6673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6673.000.patch
>
>
> In addition to setting cpu.cfs_period_us on a per-container basis, we could 
> also set cpu.shares to 2 for opportunistic containers so they are run on a 
> best-effort basis






[jira] [Commented] (YARN-7693) ContainersMonitor support configurable

2018-01-05 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313488#comment-16313488
 ] 

Miklos Szegedi commented on YARN-7693:
--

[~yangjiandan], thank you for the detailed reply.
I understand your concern now. memory.soft_limit_in_bytes is just nice to 
have; AFAIK most users disable swapping. About 6.: YARN-6677 won't hang the 
paused guaranteed containers but preempts opportunistic ones first, unblocking 
the guaranteed ones right away. This solution was tested in a cluster by 
[~sandflee] in YARN-4599. The effect on guaranteed containers should be 
negligible if the logic is fast. Also, the concern about promoting containers is 
still there with two top-level cgroups.
How do you monitor the resource utilization of containers?
Do you disable the OOM killer?
How would you promote?
What do you do if the guaranteed containers grow over the opportunistic ones? 
The opportunistic ones still hang or get preempted regardless of the number and 
depth of top-level cgroups, don't they?
Why do you reserve a gap? Doesn't that decrease resource utilization, which is 
exactly what oversubscription is supposed to fix?
bq. If the gap is less than a given value, then decrease the hard limit of 
Guaranteed Group.
Did you mean this? "If the gap is less than a given value, then decrease the 
hard limit of Opportunistic Group." Decreasing the hard limit of the guaranteed 
group would mean that opportunistic ones have an effect on guaranteed ones.


> ContainersMonitor support configurable
> --
>
> Key: YARN-7693
> URL: https://issues.apache.org/jira/browse/YARN-7693
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Minor
> Attachments: YARN-7693.001.patch, YARN-7693.002.patch
>
>
> Currently ContainersMonitor has only one default implementation,
> ContainersMonitorImpl.
> After introducing Opportunistic Containers, ContainersMonitor needs to monitor 
> system metrics and even dynamically adjust Opportunistic and Guaranteed 
> resources in the cgroup, so another ContainersMonitor may need to be 
> implemented. 
> ContainerManagerImpl currently instantiates ContainersMonitorImpl directly with 
> new, so ContainersMonitor needs to be configurable.






[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers

2018-01-04 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312476#comment-16312476
 ] 

Miklos Szegedi commented on YARN-6673:
--

[~yangjiandan], wouldn't that allow opportunistic containers to steal CPU 
cycles from guaranteed ones by setting the vcores to an arbitrary size like 512? 
[~haibochen], is there an upper limit on vcores in the RM?

> Add cpu cgroup configurations for opportunistic containers
> --
>
> Key: YARN-6673
> URL: https://issues.apache.org/jira/browse/YARN-6673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6673.000.patch
>
>
> In addition to setting cpu.cfs_period_us on a per-container basis, we could 
> also set cpu.shares to 2 for opportunistic containers so they are run on a 
> best-effort basis






[jira] [Updated] (YARN-6416) SIGNAL_CMD argument number is wrong

2018-01-04 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-6416:
-
Labels: newbie  (was: )

> SIGNAL_CMD argument number is wrong
> ---
>
> Key: YARN-6416
> URL: https://issues.apache.org/jira/browse/YARN-6416
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Priority: Minor
>  Labels: newbie
>
> The yarn application signal command has two arguments, so the number below 
> should be 2, I think.
> {code}
> opts.getOption(SIGNAL_CMD).setArgs(3);
> {code}






[jira] [Comment Edited] (YARN-7516) Security check for untrusted docker image

2018-01-04 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311786#comment-16311786
 ] 

Miklos Szegedi edited comment on YARN-7516 at 1/5/18 1:54 AM:
--

[~eyang] thanks for the reply,
bq. I am inclined to move trusting individual image logic to a new JIRA
I agree, I did not mean that suggestion for this jira.



was (Author: miklos.szeg...@cloudera.com):
[~eyang] thanks for the reply,
bq. I am inclined to move trusting individual image logic to a new JIRA
I agree, I did not mean that the suggestion for this jira.


> Security check for untrusted docker image
> -
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch, YARN-7516.004.patch
>
>
> Hadoop YARN Services can support using a private docker registry image or a 
> docker image from docker hub.  In the current implementation, Hadoop security is 
> enforced through username and group membership, and enforces uid:gid 
> consistency between the docker container and the distributed file system.  There 
> is a cloud use case for the ability to run untrusted docker images on the same 
> cluster for testing.
> The basic requirement for untrusted containers is to ensure all kernel and 
> root privileges are dropped, and there is no interaction with the distributed 
> file system, to avoid contamination.  We can probably enforce detection of 
> untrusted docker images by checking the following:
> # If the docker image is from the public docker hub repository, the container is 
> automatically flagged as insecure, disk volume mounts are disabled 
> automatically, and all kernel capabilities are dropped.
> # If the docker image is from a private repository in docker hub, and there is a 
> white list allowing the private repository, disk volume mounts are allowed and 
> kernel capabilities follow the allowed list.
> # If the docker image is from a private trusted registry with an image name like 
> "private.registry.local:5000/centos", and the white list allows this private 
> trusted repository, disk volume mounts are allowed and kernel capabilities 
> follow the allowed list.






[jira] [Commented] (YARN-2185) Use pipes when localizing archives

2018-01-04 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312332#comment-16312332
 ] 

Miklos Szegedi commented on YARN-2185:
--

The findbugs issue is in trunk.

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch, YARN-2185.001.patch, 
> YARN-2185.002.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.






[jira] [Updated] (YARN-2185) Use pipes when localizing archives

2018-01-04 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-2185:
-
Attachment: YARN-2185.002.patch

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch, YARN-2185.001.patch, 
> YARN-2185.002.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.






[jira] [Commented] (YARN-7516) Security check for untrusted docker image

2018-01-04 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311786#comment-16311786
 ] 

Miklos Szegedi commented on YARN-7516:
--

[~eyang] thanks for the reply,
bq. I am inclined to move trusting individual image logic to a new JIRA
I agree, I did not mean that the suggestion for this jira.


> Security check for untrusted docker image
> -
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch, YARN-7516.004.patch
>
>
> Hadoop YARN Services can support using a private docker registry image or a 
> docker image from docker hub.  In the current implementation, Hadoop security is 
> enforced through username and group membership, and enforces uid:gid 
> consistency between the docker container and the distributed file system.  There 
> is a cloud use case for the ability to run untrusted docker images on the same 
> cluster for testing.
> The basic requirement for untrusted containers is to ensure all kernel and 
> root privileges are dropped, and there is no interaction with the distributed 
> file system, to avoid contamination.  We can probably enforce detection of 
> untrusted docker images by checking the following:
> # If the docker image is from the public docker hub repository, the container is 
> automatically flagged as insecure, disk volume mounts are disabled 
> automatically, and all kernel capabilities are dropped.
> # If the docker image is from a private repository in docker hub, and there is a 
> white list allowing the private repository, disk volume mounts are allowed and 
> kernel capabilities follow the allowed list.
> # If the docker image is from a private trusted registry with an image name like 
> "private.registry.local:5000/centos", and the white list allows this private 
> trusted repository, disk volume mounts are allowed and kernel capabilities 
> follow the allowed list.






[jira] [Commented] (YARN-6894) RM Apps API returns only active apps when query parameter queue used

2018-01-03 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310763#comment-16310763
 ] 

Miklos Szegedi commented on YARN-6894:
--

Thank you, [~gsohn]!

> RM Apps API returns only active apps when query parameter queue used
> 
>
> Key: YARN-6894
> URL: https://issues.apache.org/jira/browse/YARN-6894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, restapi
>Reporter: Grant Sohn
>Assignee: Gergely Novák
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: YARN-6894.001.patch, YARN-6894.002.patch, 
> YARN-6894.003.patch
>
>
> If you run RM's Cluster Applications API with no query parameters, you get a 
> list of apps.
> If you run RM's Cluster Applications API with any query parameters other than 
> "queue" you get the list of apps with the parameter filters being applied.
> However, when you use the "queue" query parameter, you only see the 
> applications that are active in the cluster (NEW, NEW_SAVING, SUBMITTED, 
> ACCEPTED, RUNNING).  This behavior is inconsistent with the API.  If there is 
> a sound reason behind this, it should be documented and it seems like there 
> might be as the mapred queue CLI behaves similarly.
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API






[jira] [Commented] (YARN-7516) Security check for untrusted docker image

2018-01-03 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310726#comment-16310726
 ] 

Miklos Szegedi commented on YARN-7516:
--

[~eyang], [~ebadger], [~vinodkv], thank you for the patch and the reviews.
[~ebadger], to answer your question, I have the following opinion about this 
jira:
1. This jira is a simple measure to block untrusted registries, as [~vinodkv] 
suggested above.
2. If I wanted to secure my cluster and not just the registry but the 
individual images in Docker on Hadoop, I would probably rely on the content 
signing feature of Docker. Ref: 
https://docs.docker.com/engine/security/trust/content_trust/#content-trust-operations-and-keys
 It signs the actual content. Even if someone gets access to the registry or 
the channel but not the signing key, they cannot compromise the system. Its 
advantage is that it does not require any change in Hadoop, AFAIK. Hadoop 
remains simple. Setting that up is non-trivial though.
3. I would never allow privileged containers in my own cluster. Native 
(non-Docker) Hadoop did not allow that, it was not part of the design. If 
something really needs root privileges, I would probably create a SUID script 
or executable as simple as possible, just like container executor. I would make 
sure it does only what it needs to do and run it from a YARN container without 
a Docker container. This reduces the attack surface.
4. If the question is how to allow images that are able to run as privileged 
and provide a simple and secure interface, I would probably list the allowed 
Docker images with digest SHA256 hash values (image@6bff...) in 
container-executor.cfg. Hadoop remains simple. It protects the node local HDFS 
data by disallowing root escalation, even if the yarn user is compromised.
5. If we were at the initial design step and we needed a little bit more 
complex but secure and scalable solution, I would just stick a single public 
key into container-executor.cfg maybe together with a flag whether privileged 
containers are allowed at all (ref. 3. above). The client would send the whole 
Docker JSON including the image, volumes, privileged flag etc. in the launch 
context. The RM would verify, if the user has permission to do the Docker 
command, so not all users would be allowed to run privileged for example. It 
would then sign the JSON with the corresponding private key into a delegation 
token or JWT and pass the signature with the JSON to the node in the launch 
context. The node manager would then forward the JSON and the signature to the 
container executor or another executable as suggested in YARN-7506. The 
container executor would verify the JWT or delegation token signature with the 
public key in container-executor.cfg and forward the JSON to the Docker 
command, if the verification is successful. The signature means that the RM 
allowed the request. Hadoop remains simple, security is centralized in the RM. 
RM can even have a REST API to dynamically adjust privileges. The privilege 
check might even be programmed as patterns in the JSON, minimizing the changes 
to Hadoop. I admit, this is probably a suggestion too late. (A minimal sketch of 
the verification step in this idea follows after point 6.)
6. Just a side note on how much we need to worry about buffer overflows: I am more 
concerned about actual security design problems affecting both C and Java. Most 
of the 2.5 million lines of Hadoop are Java. Ref: 
https://www.openhub.net/p/Hadoop.
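A minimal sketch of the verification step in point 5, assuming an RSA key pair and a Base64-encoded signature; every name here is hypothetical and none of this is an existing Hadoop API:
{code}
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

static boolean rmApprovedLaunch(String dockerJson, String base64Signature,
    PublicKey rmPublicKey) throws GeneralSecurityException {
  // The node side only trusts launch requests whose JSON was signed by the RM
  // with the private key matching the public key from container-executor.cfg.
  Signature verifier = Signature.getInstance("SHA256withRSA");
  verifier.initVerify(rmPublicKey);
  verifier.update(dockerJson.getBytes(StandardCharsets.UTF_8));
  return verifier.verify(Base64.getDecoder().decode(base64Signature));
}
{code}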




> Security check for untrusted docker image
> -
>
> Key: YARN-7516
> URL: https://issues.apache.org/jira/browse/YARN-7516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7516.001.patch, YARN-7516.002.patch, 
> YARN-7516.003.patch, YARN-7516.004.patch
>
>
> Hadoop YARN Services can support using private docker registry image or 
> docker image from docker hub.  In current implementation, Hadoop security is 
> enforced through username and group membership, and enforce uid:gid 
> consistency in docker container and distributed file system.  There is cloud 
> use case for having ability to run untrusted docker image on the same cluster 
> for testing.  
> The basic requirement for untrusted container is to ensure all kernel and 
> root privileges are dropped, and there is no interaction with distributed 
> file system to avoid contamination.  We can probably enforce detection of 
> untrusted docker image by checking the following:
> # If docker image is from public docker hub repository, the container is 
> automatically flagged as insecure, and disk volume mount are disabled 
> automatically, and drop all kernel capabilities.
> # If docker image is from private repository in docker hub, and there is a 
> white list to allow the private repository, disk volume mount is allowed, 
> ke

[jira] [Commented] (YARN-7590) Improve container-executor validation check

2018-01-03 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310534#comment-16310534
 ] 

Miklos Szegedi commented on YARN-7590:
--

Thank you for the patch, [~eyang]. I have two more style issues. I also verified 
the patch: it runs a basic MapReduce job and, as expected, does not allow the 
scenario in the description.
{code}
fprintf(LOGFILE, "Error checking file stats for %s.\n", nm_root);
{code}
It would be very useful to have a meaningful error message like 
{{fprintf(LOGFILE, "Error checking file stats for %s %d %s.\n", nm_root, err, 
strerror(err));}}. It helps a lot to support the feature.
{code}
  if (check != 0 || strstr(container_log_dir, "..") != 0) {
{code}
As I mentioned before, I would separate the two checks and add a meaningful 
error message for the second case; the first one already prints inside the call. 
This also helps when supporting the feature.


> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7590.001.patch, YARN-7590.002.patch, 
> YARN-7590.003.patch, YARN-7590.004.patch, YARN-7590.005.patch, 
> YARN-7590.006.patch
>
>
> There is minimal validation of the prefix path in container-executor.  If YARN is 
> compromised, an attacker can use container-executor to change the ownership of 
> system files:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> The spark user can rewrite /etc files to gain more access.  We can improve this 
> with additional checks in container-executor:
> # Make sure the prefix path is owned by the same user as the caller to 
> container-executor.
> # Make sure the log directory prefix is owned by the same user as the caller.






[jira] [Commented] (YARN-7693) ContainersMonitor support configurable

2018-01-03 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310276#comment-16310276
 ] 

Miklos Szegedi commented on YARN-7693:
--

Thank you for the reply, [~yangjiandan].
+0 on the approach of adding a separate monitor class for this. I think it is 
useful to be able to change the monitor.
In terms of the feature you described, I have some suggestions you may want to 
consider.
First of all, please consider using a JIRA feature for your project and making 
this a sub-task. How about doing this as part of YARN-1747, or even better, 
YARN-1011?
You may want to leverage the option to simply turn off the current cgroups 
memory enforcement using the configuration added in YARN-7064. It also handles 
monitoring resource utilization using cgroups.
bq. 1) Separate containers into two different group Opportunistic_Group and 
Guaranteed_Group under hadoop-yarn
The reason why it is useful to have a single hadoop-yarn cgroup for all 
containers is that you can apply a single logic and control the OOM killer for 
all of them. I would be happy to look at the actual code, but adjusting two 
different cgroups may add too much complexity. It is especially problematic in 
case of promotion: when an opportunistic container is promoted to guaranteed, 
you need to move it to the other cgroup, but this requires heavy lifting from 
the kernel that takes significant time. See 
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt for details.
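For illustration, this is roughly what the move amounts to (paths and names below are made up; for the memory controller, depending on memory.move_charge_at_immigrate, the kernel may also have to re-charge the container's pages, which is the expensive part):
{code}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

static void promoteToGuaranteed(String containerId, int pid) throws IOException {
  // Moving a container between cgroups means writing its PIDs into the
  // target group's cgroup.procs file, one by one.
  Files.write(
      Paths.get("/sys/fs/cgroup/memory/hadoop-yarn/guaranteed",
          containerId, "cgroup.procs"),
      (pid + "\n").getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.WRITE);
}
{code}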
bq. 2) Monitor system resource utilization and dynamically adjust resource of 
Opportunistic_Group
The concern here is that dynamically adjusting does not work in the current 
implementation either. This is because it is too slow to respond in extreme 
cases. Please check out YARN-6677, YARN-4599 and YARN-1014. The idea there is 
to disable the OOM killer on hadoop-yarn as you also suggested, so that we get 
notified by the kernel when the system resource utilization is low. YARN can 
then decide which container to preempt or adjust the soft limit, while the 
containers are paused. The preemption unblocks the containers. Please let us 
know if you have time and would like to contribute.
bq. 3) Kill container only when adjust resource fail for given times
I absolutely agree with this. A sudden spike in CPU usage should not trigger 
immediate preemption. In the case of memory I am not sure how much you can 
adjust, though. My understanding is that the basic design of opportunistic 
containers is that they never affect the performance of guaranteed ones, but 
using IO for swapping would do exactly that. How would you reduce memory usage 
if not by preempting?

> ContainersMonitor support configurable
> --
>
> Key: YARN-7693
> URL: https://issues.apache.org/jira/browse/YARN-7693
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Minor
> Attachments: YARN-7693.001.patch, YARN-7693.002.patch
>
>
> Currently ContainersMonitor has only one default implementation,
> ContainersMonitorImpl.
> After introducing Opportunistic Containers, ContainersMonitor needs to monitor 
> system metrics and even dynamically adjust Opportunistic and Guaranteed 
> resources in the cgroup, so another ContainersMonitor may need to be 
> implemented. 
> ContainerManagerImpl currently instantiates ContainersMonitorImpl directly with 
> new, so ContainersMonitor needs to be configurable.






[jira] [Commented] (YARN-7585) NodeManager should go unhealthy when state store throws DBException

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309046#comment-16309046
 ] 

Miklos Szegedi commented on YARN-7585:
--

+1. Thank you for the contribution [~wilfreds] and for the review [~grepas]. I 
will commit this shortly.

> NodeManager should go unhealthy when state store throws DBException 
> 
>
> Key: YARN-7585
> URL: https://issues.apache.org/jira/browse/YARN-7585
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-7585.001.patch, YARN-7585.002.patch, 
> YARN-7585.003.patch
>
>
> If work preserving recovery is enabled, the NM will not start up if the state 
> store does not initialise. However, if the state store becomes unavailable 
> after that for any reason, the NM will not go unhealthy. 
> Since the state store is not available, new containers can no longer be started 
> and the NM should become unhealthy:
> {code}
> AMLauncher: Error launching appattempt_1508806289867_268617_01. Got 
> exception: org.apache.hadoop.yarn.exceptions.YarnException: 
> java.io.IOException: org.iq80.leveldb.DBException: IO error: 
> /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: 
> Read-only file system
> at o.a.h.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:721)
> ...
> Caused by: java.io.IOException: org.iq80.leveldb.DBException: IO error: 
> /dsk/app/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/028269.log: 
> Read-only file system
> at 
> o.a.h.y.s.n.r.NMLeveldbStateStoreService.storeApplication(NMLeveldbStateStoreService.java:374)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:848)
> at 
> o.a.h.y.s.n.cm.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:712)
> {code}






[jira] [Commented] (YARN-6894) RM Apps API returns only active apps when query parameter queue used

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309042#comment-16309042
 ] 

Miklos Szegedi commented on YARN-6894:
--

+1 Thank you for the contribution [~GergelyNovak] and for the reviews [~gsohn] 
and [~sunilg]. I will commit this shortly.

> RM Apps API returns only active apps when query parameter queue used
> 
>
> Key: YARN-6894
> URL: https://issues.apache.org/jira/browse/YARN-6894
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, restapi
>Reporter: Grant Sohn
>Assignee: Gergely Novák
>Priority: Minor
> Attachments: YARN-6894.001.patch, YARN-6894.002.patch, 
> YARN-6894.003.patch
>
>
> If you run RM's Cluster Applications API with no query parameters, you get a 
> list of apps.
> If you run RM's Cluster Applications API with any query parameters other than 
> "queue" you get the list of apps with the parameter filters being applied.
> However, when you use the "queue" query parameter, you only see the 
> applications that are active in the cluster (NEW, NEW_SAVING, SUBMITTED, 
> ACCEPTED, RUNNING).  This behavior is inconsistent with the API.  If there is 
> a sound reason behind this, it should be documented and it seems like there 
> might be as the mapred queue CLI behaves similarly.
> http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_API






[jira] [Commented] (YARN-7688) Miscellaneous Improvements To ProcfsBasedProcessTree

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309011#comment-16309011
 ] 

Miklos Szegedi commented on YARN-7688:
--

Committed to trunk. Thank you for the contribution [~belugabehr].

> Miscellaneous Improvements To ProcfsBasedProcessTree
> 
>
> Key: YARN-7688
> URL: https://issues.apache.org/jira/browse/YARN-7688
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-7688.1.patch, YARN-7688.2.patch, YARN-7688.3.patch, 
> YARN-7688.4.patch
>
>
> * Use ArrayDeque for performance instead of LinkedList
> * Use more Apache Commons routines to replace existing implementations
> * Remove superfluous code guards around DEBUG statements
> * Remove superfluous annotations in the tests
> * Other small improvements






[jira] [Commented] (YARN-7688) Miscellaneous Improvements To ProcfsBasedProcessTree

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308982#comment-16308982
 ] 

Miklos Szegedi commented on YARN-7688:
--

+1 LGTM. I will commit this shortly.

> Miscellaneous Improvements To ProcfsBasedProcessTree
> 
>
> Key: YARN-7688
> URL: https://issues.apache.org/jira/browse/YARN-7688
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-7688.1.patch, YARN-7688.2.patch, YARN-7688.3.patch, 
> YARN-7688.4.patch
>
>
> * Use ArrayDeque for performance instead of LinkedList
> * Use more Apache Commons routines to replace existing implementations
> * Remove superfluous code guards around DEBUG statements
> * Remove superfluous annotations in the tests
> * Other small improvements






[jira] [Comment Edited] (YARN-7687) ContainerLogAppender Improvements

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308942#comment-16308942
 ] 

Miklos Szegedi edited comment on YARN-7687 at 1/3/18 12:58 AM:
---

Committed to trunk. Thank you for the contribution [~belugabehr].


was (Author: miklos.szeg...@cloudera.com):
Committed to trunk.

> ContainerLogAppender Improvements
> -
>
> Key: YARN-7687
> URL: https://issues.apache.org/jira/browse/YARN-7687
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.1.0
>
> Attachments: YARN-7687.1.patch, YARN-7687.2.patch, YARN-7687.3.patch
>
>
> * Use Array-backed collection instead of LinkedList
> * Ignore calls to {{close()}} after the initial call
> * Clear the queue after {{close}} is called to let garbage collection do its 
> magic on the items inside of it
> * Fix int-to-long conversion issue (overflow)
> * Remove superfluous white space






[jira] [Commented] (YARN-7687) ContainerLogAppender Improvements

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308936#comment-16308936
 ] 

Miklos Szegedi commented on YARN-7687:
--

+1 LGTM. Thank you for the contribution [~belugabehr]. I will commit this 
shortly.

> ContainerLogAppender Improvements
> -
>
> Key: YARN-7687
> URL: https://issues.apache.org/jira/browse/YARN-7687
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Trivial
> Attachments: YARN-7687.1.patch, YARN-7687.2.patch, YARN-7687.3.patch
>
>
> * Use Array-backed collection instead of LinkedList
> * Ignore calls to {{close()}} after the initial call
> * Clear the queue after {{close}} is called to let garbage collection do its 
> magic on the items inside of it
> * Fix int-to-long conversion issue (overflow)
> * Remove superfluous white space






[jira] [Commented] (YARN-7694) Optionally run shared cache manager as part of the resource manager

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308833#comment-16308833
 ] 

Miklos Szegedi commented on YARN-7694:
--

[~ctrezzo], thank you for raising this. The shared cache manager could extend 
its features when running as part of the resource manager. With the same 
codebase, it might be useful to register any HDFS file for deletion if it has 
not been used for a while, even if it is not part of the shared cache. What do 
you think?

> Optionally run shared cache manager as part of the resource manager
> ---
>
> Key: YARN-7694
> URL: https://issues.apache.org/jira/browse/YARN-7694
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>
> Currently the shared cache manager is its own stand-alone daemon. It is a 
> YARN composite service. Ideally, the shared cache manager could optionally be 
> run as part of the resource manager. This way administrators would have to 
> manage one less daemon.






[jira] [Commented] (YARN-7693) ContainersMonitor support configurable

2018-01-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308824#comment-16308824
 ] 

Miklos Szegedi commented on YARN-7693:
--

[~yangjiandan], thank you for raising this. How is this jira related to 
YARN-7064? I believe a new resource calculator is enough in order to use 
cgroups for oversubscription resource measurement.

> ContainersMonitor support configurable
> --
>
> Key: YARN-7693
> URL: https://issues.apache.org/jira/browse/YARN-7693
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Jiandan Yang 
>Assignee: Jiandan Yang 
>Priority: Minor
> Attachments: YARN-7693.001.patch, YARN-7693.002.patch
>
>
> Currently ContainersMonitor has only one default implementation,
> ContainersMonitorImpl.
> After introducing Opportunistic Containers, ContainersMonitor needs to monitor 
> system metrics and even dynamically adjust Opportunistic and Guaranteed 
> resources in the cgroup, so another ContainersMonitor may need to be 
> implemented. 
> ContainerManagerImpl currently instantiates ContainersMonitorImpl directly with 
> new, so ContainersMonitor needs to be configurable.






[jira] [Commented] (YARN-7687) ContainerLogAppender Improvements

2017-12-29 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306571#comment-16306571
 ] 

Miklos Szegedi commented on YARN-7687:
--

Thank you for the patch [~belugabehr].
Please fix the outstanding checkstyle issue.
{code}
49  isBuffered = (maxEvents > 0);
{code}
I am not sure why this is needed; it adds to the memory footprint. You could 
just check {{maxEvents}} in {{append}}, mentioning in a comment that a 
positive value means that the data is buffered.
{code}
50if (maxEvents > 0) {  
51  tail = new LinkedList();  
52}
{code}
Can you tell me the reason why this check was removed? The lazy creation did 
actually save some memory if no buffering is used.
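For reference, the tail buffering under discussion boils down to something like the sketch below (illustrative only, not the ContainerLogAppender code): a positive maxEvents keeps only the last maxEvents events in memory until close, while a non-positive value means events are written straight through.
{code}
import java.util.ArrayDeque;
import java.util.Deque;

final class TailBuffer<E> {
  private final int maxEvents;
  private final Deque<E> tail = new ArrayDeque<>();

  TailBuffer(int maxEvents) {
    this.maxEvents = maxEvents;
  }

  void append(E event) {
    if (maxEvents <= 0) {
      write(event); // unbuffered mode: emit immediately, retain nothing
      return;
    }
    if (tail.size() == maxEvents) {
      tail.removeFirst(); // drop the oldest event so only the tail is kept
    }
    tail.addLast(event);
  }

  void write(E event) {
    // Placeholder for the real sink (the container log file in the appender).
  }
}
{code}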

> ContainerLogAppender Improvements
> -
>
> Key: YARN-7687
> URL: https://issues.apache.org/jira/browse/YARN-7687
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Trivial
> Attachments: YARN-7687.1.patch
>
>
> * Use Array-backed collection instead of LinkedList
> * Ignore calls to {{close()}} after the initial call
> * Clear the queue after {{close}} is called to let garbage collection do its 
> magic on the items inside of it
> * Fix int-to-long conversion issue (overflow)
> * Remove superfluous white space






[jira] [Commented] (YARN-7688) Miscellaneous Improvements To ProcfsBasedProcessTree

2017-12-29 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306550#comment-16306550
 ] 

Miklos Szegedi commented on YARN-7688:
--

Thank you for the patch [~belugabehr].
Please fix the outstanding checkstyle issue.
{code}
227 if (!"1".equals(pID)) {
232   if ("1".equals(ppid)) {
{code}
I actually liked the original order better.
{code}
165 LOG.info("ProcessTree: " + p);
{code}
How about {{LOG.info("ProcessTree: ", p);}}?


> Miscellaneous Improvements To ProcfsBasedProcessTree
> 
>
> Key: YARN-7688
> URL: https://issues.apache.org/jira/browse/YARN-7688
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-7688.1.patch, YARN-7688.2.patch, YARN-7688.3.patch
>
>
> * Use ArrayDeque for performance instead of LinkedList
> * Use more Apache Commons routines to replace existing implementations
> * Remove superfluous code guards around DEBUG statements
> * Remove superfluous annotations in the tests
> * Other small improvements



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7580) ContainersMonitorImpl logged message lacks detail when exceeding memory limits

2017-12-29 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306540#comment-16306540
 ] 

Miklos Szegedi commented on YARN-7580:
--

Thank you for the contribution [~wilfreds]!

> ContainersMonitorImpl logged message lacks detail when exceeding memory limits
> --
>
> Key: YARN-7580
> URL: https://issues.apache.org/jira/browse/YARN-7580
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Fix For: 3.1.0
>
> Attachments: YARN-7580.001.patch, YARN-7580.002.patch
>
>
> Currently in the RM logs container memory usage for a container that exceeds 
> the memory limit is reported like this:
> {code}
> 2016-06-14 09:15:36,694 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1464251583966_0932_r_000876_0: Container 
> [pid=134938,containerID=container_1464251583966_0932_01_002237] is running 
> beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory 
> used; 1.9 GB of 2.1 GB virtual memory used. Killing container.
> {code}
> Two enhancements as part of this jira:
> - make it clearer which limit we exceed
> - show exactly how much we exceeded the limit by



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7689) TestRMContainerAllocator fails after YARN-6124

2017-12-29 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306521#comment-16306521
 ] 

Miklos Szegedi commented on YARN-7689:
--

Thank you for the patch [~wilfreds].
{code}
// If re-init is called before init the manager is null and the RM
// will crash, this happens in a number of tests.
{code}
I would say "Protect against uninitialized scheduling monitor manager".
In general, I would try to update the test to call init. The reason is that 
this check may hide important race conditions in the future and could lead to 
a crash later, or to missed monitoring due to an uninitialized monitor manager.
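To illustrate the trade-off, a tiny sketch with made-up names (not the real 
scheduler classes), contrasting the null guard with having the test call 
init() first:
{code}
// Hypothetical sketch only, using invented names, to contrast the two options.
public class SchedulerSketch {
  private Object schedulingMonitorManager; // normally wired up by init()

  public void init() {
    schedulingMonitorManager = new Object();
  }

  public void reinitialize() {
    // Option A (the guard): silently skip when init() was never called.
    // This avoids the NPE but can hide a real ordering bug.
    if (schedulingMonitorManager == null) {
      return;
    }
    System.out.println("re-initializing scheduling monitor manager");
  }

  public static void main(String[] args) {
    SchedulerSketch scheduler = new SchedulerSketch();
    scheduler.init();        // Option B: the test calls init() before re-init
    scheduler.reinitialize();
  }
}
{code}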

> TestRMContainerAllocator fails after YARN-6124
> --
>
> Key: YARN-7689
> URL: https://issues.apache.org/jira/browse/YARN-7689
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-7689.001.patch
>
>
> After the change that was made for YARN-6124 multiple tests in the 
> TestRMContainerAllocator from MapReduce fail with the following NPE:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.reinitialize(AbstractYarnScheduler.java:1437)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.reinitialize(FifoScheduler.java:320)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator$ExcessReduceContainerAllocateScheduler.(TestRMContainerAllocator.java:1808)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator$MyResourceManager2.createScheduler(TestRMContainerAllocator.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:659)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1133)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:316)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.serviceInit(MockRM.java:1334)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:162)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:141)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.(MockRM.java:137)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator$MyResourceManager.(TestRMContainerAllocator.java:928)
> {code}
> In the test we just call reinitialize on a scheduler and never call init.
> The stop of the service is guarded, and so should be the start and the re-init.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7580) ContainersMonitorImpl logged message lacks detail when exceeding memory limits

2017-12-29 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306509#comment-16306509
 ] 

Miklos Szegedi commented on YARN-7580:
--

+1. I will commit this shortly.

> ContainersMonitorImpl logged message lacks detail when exceeding memory limits
> --
>
> Key: YARN-7580
> URL: https://issues.apache.org/jira/browse/YARN-7580
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-7580.001.patch, YARN-7580.002.patch
>
>
> Currently in the RM logs container memory usage for a container that exceeds 
> the memory limit is reported like this:
> {code}
> 2016-06-14 09:15:36,694 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1464251583966_0932_r_000876_0: Container 
> [pid=134938,containerID=container_1464251583966_0932_01_002237] is running 
> beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory 
> used; 1.9 GB of 2.1 GB virtual memory used. Killing container.
> {code}
> Two enhancements as part of this jira:
> - make it clearer which limit we exceed
> - show exactly how much we exceeded the limit by



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2185) Use pipes when localizing archives

2017-12-28 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-2185:
-
Attachment: YARN-2185.001.patch

Findbugs does not like the coding style, so I am fixing it with this patch.

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch, YARN-2185.001.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7688) Miscellaneous Improvements To ProcfsBasedProcessTree

2017-12-28 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305839#comment-16305839
 ] 

Miklos Szegedi commented on YARN-7688:
--

Thank you for the patch [~belugabehr]. I have a few comments.
{code}
245   Queue pInfoQueue = new ArrayDeque();
246   pInfoQueue.addAll(me.getChildren());
{code}
What is the point of creating an ArrayDeque without an initial capacity, when 
the future content is known already?
{code}
433 isAvailable = true;
434 incJiffies += p.getDtime();
{code}
What is the point of this reordering?
{code}
727   if (StringUtils.EMPTY.equals(ret)) {
{code}
Could you use just isEmpty()?
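A small illustrative sketch of these last two points (hypothetical names, not 
the ProcfsBasedProcessTree code): size the ArrayDeque from the known children 
and test for emptiness directly.
{code}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only.
public class ProcQueueSketch {
  public static void main(String[] args) {
    List<String> children = Arrays.asList("pid-100", "pid-101");

    // Give the deque an initial capacity since the contents are already known.
    Queue<String> pInfoQueue = new ArrayDeque<>(children.size());
    pInfoQueue.addAll(children);

    String ret = "";
    if (ret.isEmpty()) {            // instead of StringUtils.EMPTY.equals(ret)
      System.out.println("empty result; queued children: " + pInfoQueue);
    }
  }
}
{code}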




> Miscellaneous Improvements To ProcfsBasedProcessTree
> 
>
> Key: YARN-7688
> URL: https://issues.apache.org/jira/browse/YARN-7688
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Priority: Minor
> Attachments: YARN-7688.1.patch
>
>
> * Use ArrayDeque for performance instead of LinkedList
> * Use more Apache Commons routines to replace existing implementations
> * Remove superfluous code guards around DEBUG statements
> * Remove superfluous annotations in the tests
> * Other small improvements



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2017-12-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302137#comment-16302137
 ] 

Miklos Szegedi commented on YARN-7590:
--

Thank you for the patch [~eyang]. I have a few style issues:
The patch adds a new line to configuration.c that is not needed.
{code}
fprintf(LOGFILE, "Error checking file stats for %s.\n", nm_root);
{code}
It will be helpful to print out the actual error code for debugging.
{code}
fprintf(LOGFILE, "Permission mismatch for %s for uid: %d.\n", nm_root, 
caller_uid);
{code}
How about printing {{info.st_uid}} as well?
{code}
 if (check != 0 || strstr(container_log_dir, "/../") != 0) {
{code}
It is safer to check for "..", and this check should be in a separate if 
statement with a proper log message to help debugging.



> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7590.001.patch, YARN-7590.002.patch, 
> YARN-7590.003.patch, YARN-7590.004.patch, YARN-7590.005.patch
>
>
> There is minimum check for prefix path for container-executor.  If YARN is 
> compromised, attacker  can use container-executor to change system files 
> ownership:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> Spark user can rewrite /etc files to gain more access.  We can improve this 
> with additional check in container-executor:
> # Make sure the prefix path is owned by the same user as the caller to 
> container-executor.
> # Make sure the log directory prefix is owned by the same user as the caller.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-2185) Use pipes when localizing archives

2017-12-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302123#comment-16302123
 ] 

Miklos Szegedi edited comment on YARN-2185 at 12/23/17 12:10 AM:
-

Attaching my suggestion on how to solve this. The code streams HDFS as standard 
input to the tar and gzip commands. It handles Windows as well. As an addition, 
I create the temporary directory with permissions 700 instead of 755. I do not 
create any additional temporary directories for extraction; one is enough. A 
difference is that I use the jar command for zips as well, so that it handles 
Windows properly. I also added a switch to disable the modification time check 
by specifying -1 as the timestamp. Finally, I do a parallel copy for directory 
localization to leverage the distributed storage in HDFS.
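As a rough, simplified sketch of the streaming idea (placeholder paths, no 
Windows handling, not the attached patch itself): the archive bytes are piped 
from HDFS straight into an external tar process instead of being written to 
local disk first.
{code}
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Sketch only, not the attached patch: stream a remote archive straight into
// "tar -x -z" via its standard input, so it is never stored on local disk.
public class PipeLocalizerSketch {
  public static void main(String[] args) throws Exception {
    Path src = new Path("hdfs:///tmp/example.tar.gz"); // placeholder path
    String dst = "/tmp/localized";                     // placeholder, must exist

    FileSystem fs = src.getFileSystem(new Configuration());
    Process tar = new ProcessBuilder("tar", "-x", "-z", "-f", "-", "-C", dst)
        .redirectOutput(ProcessBuilder.Redirect.INHERIT)
        .redirectError(ProcessBuilder.Redirect.INHERIT)
        .start();

    try (InputStream in = fs.open(src);
         OutputStream out = tar.getOutputStream()) {
      IOUtils.copyBytes(in, out, 64 * 1024);           // stream, do not store
    }
    System.out.println("tar exited with " + tar.waitFor());
  }
}
{code}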


was (Author: miklos.szeg...@cloudera.com):
Attaching my suggestion how to solve this. The code streams HDFS as standard 
input to the tar and gzip commands. It handles Windows as well. As an addition 
I create temporary files with permissions 700 instead of 755. I do not create 
any additional temporary directories for extraction, one is enough. A 
difference is that I use jar command for zips as well, so that it handles 
Windows properly. Also I added an additional switch to be able to disable the 
modification time check specifying -1 as the timestamp. I also do parallel copy 
for directory localization to leverage the distributed storage in HDFS.

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-2185) Use pipes when localizing archives

2017-12-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302123#comment-16302123
 ] 

Miklos Szegedi edited comment on YARN-2185 at 12/23/17 12:08 AM:
-

Attaching my suggestion on how to solve this. The code streams HDFS as standard 
input to the tar and gzip commands. It handles Windows as well. As an addition, 
I create temporary files with permissions 700 instead of 755. I do not create 
any additional temporary directories for extraction; one is enough. A 
difference is that I use the jar command for zips as well, so that it handles 
Windows properly. I also added a switch to disable the modification time check 
by specifying -1 as the timestamp. Finally, I do a parallel copy for directory 
localization to leverage the distributed storage in HDFS.


was (Author: miklos.szeg...@cloudera.com):
Attaching my suggestion how to solve this. The code streams HDFS as standard 
input to the tar and gzip commands. It handles Windows as well. As an addition 
I create temporary files with permissions 700 instead of 755. I do not create 
any additional temporary directories for extraction, one is enough. A 
difference is that I use jar command for zips as well, so that it handles 
Windows properly. Also I added an additional switch to be able to disable the 
modification time check specifying -1 as the timestamp.

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2185) Use pipes when localizing archives

2017-12-22 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-2185:
-
Attachment: YARN-2185.000.patch

Attaching my suggestion on how to solve this. The code streams HDFS as standard 
input to the tar and gzip commands. It handles Windows as well. As an addition, 
I create temporary files with permissions 700 instead of 755. I do not create 
any additional temporary directories for extraction; one is enough. A 
difference is that I use the jar command for zips as well, so that it handles 
Windows properly. I also added a switch to disable the modification time check 
by specifying -1 as the timestamp.

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
> Attachments: YARN-2185.000.patch
>
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2017-12-21 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300387#comment-16300387
 ] 

Miklos Szegedi commented on YARN-7590:
--

Thank you for the patch, [~eyang]. I have a few minor comments left.
{code}
717 int caller_uid = 0;
{code}
Just in case, I would use an invalid initial value like -1.
{code}
712   int result = check_nm_local_dir(caller_uid, container_log_dir);
713   if (result != 0) {
714 container_log_dir = NULL;
715   }
...
1056int result = check_nm_local_dir(caller_uid, *log_root);
1057if (result != 0) {
1058  app_log_dir = NULL;
1059}
{code}
A useful error message, like the one below, is missing here. You may also want 
to mention the faulting directory.
{code}
fprintf(LOGFILE, "Permission mismatch for %s for uid: %d.\n", nm_root, 
caller_uid);
{code}
Even better, a log message in check_nm_local_dir in case of failure would help 
a lot to diagnose problems.
{code}
531 int check_nm_local_dir(int caller_uid, const char *nm_root) {
532   struct stat info;
533   stat(nm_root, &info);
534   if (caller_uid != info.st_uid) {
535 return 1;
536   }
537   return 0;
538 }
{code}
There is no error check on the stat call.
{code}
711   char *container_log_dir = get_app_log_directory(*log_dir_ptr, 
combined_name);
712   int result = check_nm_local_dir(caller_uid, container_log_dir);
713   if (result != 0) {
714 container_log_dir = NULL;
715   }
{code}
{{create_container_directories()}} needs to check {{log_dir_ptr}}, not 
{{container_log_dir}}, which does not exist yet.
One more note: if the check succeeds, we do an mkdirs() that walks up the stack 
and may create parent directories. It may be good to put the check into mkdirs 
as well (or only there), when we need to create a directory.


> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7590.001.patch, YARN-7590.002.patch, 
> YARN-7590.003.patch, YARN-7590.004.patch
>
>
> There is minimum check for prefix path for container-executor.  If YARN is 
> compromised, attacker  can use container-executor to change system files 
> ownership:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> Spark user can rewrite /etc files to gain more access.  We can improve this 
> with additional check in container-executor:
> # Make sure the prefix path is owned by the same user as the caller to 
> container-executor.
> # Make sure the log directory prefix is owned by the same user as the caller.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2185) Use pipes when localizing archives

2017-12-20 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299199#comment-16299199
 ] 

Miklos Szegedi commented on YARN-2185:
--

Based on our discussion offline I can spend a few cycles on this.

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-2185) Use pipes when localizing archives

2017-12-20 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi reassigned YARN-2185:


Assignee: Miklos Szegedi

> Use pipes when localizing archives
> --
>
> Key: YARN-2185
> URL: https://issues.apache.org/jira/browse/YARN-2185
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Miklos Szegedi
>
> Currently the nodemanager downloads an archive to a local file, unpacks it, 
> and then removes it.  It would be more efficient to stream the data as it's 
> being unpacked to avoid both the extra disk space requirements and the 
> additional disk activity from storing the archive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2017-12-20 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16298990#comment-16298990
 ] 

Miklos Szegedi commented on YARN-7590:
--

Thank you for the patch, [~eyang].
I see two more issues.
{{uid}} could just be a global variable, saving some code, but using locals is 
fine. However, we now have a caller uid, a yarn uid, and a run-as uid. Please 
rename the uid you created to caller_uid as it is passed along the functions.
Also, the patch does not address the scenario in the initial description. 
Please do the check in {{create_log_dirs}} as well.

> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7590.001.patch, YARN-7590.002.patch
>
>
> There is minimum check for prefix path for container-executor.  If YARN is 
> compromised, attacker  can use container-executor to change system files 
> ownership:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> Spark user can rewrite /etc files to gain more access.  We can improve this 
> with additional check in container-executor:
> # Make sure the prefix path is same as the one in yarn-site.xml, and 
> yarn-site.xml is owned by root, 644, and marked as final in property.
> # Make sure the user path is not a symlink, usercache is not a symlink.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-12-18 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16295326#comment-16295326
 ] 

Miklos Szegedi commented on YARN-7577:
--

The other unit test errors should be unrelated. I only changed this single test.

> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch, YARN-7577.001.patch, 
> YARN-7577.002.patch, YARN-7577.003.patch, YARN-7577.004.patch, 
> YARN-7577.005.patch, YARN-7577.006.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-12-12 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7577:
-
Attachment: YARN-7577.006.patch

Fixing checkstyle

> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch, YARN-7577.001.patch, 
> YARN-7577.002.patch, YARN-7577.003.patch, YARN-7577.004.patch, 
> YARN-7577.005.patch, YARN-7577.006.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2017-12-12 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288599#comment-16288599
 ] 

Miklos Szegedi commented on YARN-7590:
--

[~eyang], the first line of {{main()}} calls {{assert_valid_setup()}}, which 
calls {{setuid(0)}}. You need to sample the yarn uid with {{getuid()}} and 
store it before this call to avoid the following error:
{code}
515 uid 2002 gid 2002 euid 0 egid 2002
517 uid 0 gid 2002 euid 0 egid 2002
main : command provided 0
main : run as user is nobody
main : requested yarn user is foo
521 uid 0 gid 2002 euid 0 egid 2002
556 uid 0 gid 2002 euid 0 egid 2002
uid 0 gid 2002 euid 0 egid 2002
558 uid 0 gid 2002 euid 99 egid 99
Permission mismatch for /tmp/hadoop-foo/nm-local-dir for uid: 0.
{code}


> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7590.001.patch
>
>
> There is minimum check for prefix path for container-executor.  If YARN is 
> compromised, attacker  can use container-executor to change system files 
> ownership:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> Spark user can rewrite /etc files to gain more access.  We can improve this 
> with additional check in container-executor:
> # Make sure the prefix path is same as the one in yarn-site.xml, and 
> yarn-site.xml is owned by root, 644, and marked as final in property.
> # Make sure the user path is not a symlink, usercache is not a symlink.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-12-11 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7577:
-
Attachment: YARN-7577.005.patch

> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch, YARN-7577.001.patch, 
> YARN-7577.002.patch, YARN-7577.003.patch, YARN-7577.004.patch, 
> YARN-7577.005.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-12-11 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7577:
-
Attachment: YARN-7577.004.patch

> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch, YARN-7577.001.patch, 
> YARN-7577.002.patch, YARN-7577.003.patch, YARN-7577.004.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7064) Use cgroup to get container resource utilization

2017-12-11 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286461#comment-16286461
 ] 

Miklos Szegedi commented on YARN-7064:
--

The unit test error is YARN-7629. TestRaceWhenRelogin does not repro for me. I 
did not change any dependencies.

> Use cgroup to get container resource utilization
> 
>
> Key: YARN-7064
> URL: https://issues.apache.org/jira/browse/YARN-7064
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7064.000.patch, YARN-7064.001.patch, 
> YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch, 
> YARN-7064.005.patch, YARN-7064.007.patch, YARN-7064.008.patch, 
> YARN-7064.009.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants 
> to rebase patches against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2017-12-08 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284372#comment-16284372
 ] 

Miklos Szegedi commented on YARN-7590:
--

[~eyang], sorry about the delay. Due to the sensitivity of the issue, I intend 
to do some end-to-end tests, but I have not gotten to them yet.

> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>Assignee: Eric Yang
> Attachments: YARN-7590.001.patch
>
>
> There is minimum check for prefix path for container-executor.  If YARN is 
> compromised, attacker  can use container-executor to change system files 
> ownership:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> Spark user can rewrite /etc files to gain more access.  We can improve this 
> with additional check in container-executor:
> # Make sure the prefix path is same as the one in yarn-site.xml, and 
> yarn-site.xml is owned by root, 644, and marked as final in property.
> # Make sure the user path is not a symlink, usercache is not a symlink.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7064) Use cgroup to get container resource utilization

2017-12-08 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7064:
-
Attachment: YARN-7064.009.patch

Fixing checkstyle and unit tests

> Use cgroup to get container resource utilization
> 
>
> Key: YARN-7064
> URL: https://issues.apache.org/jira/browse/YARN-7064
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7064.000.patch, YARN-7064.001.patch, 
> YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch, 
> YARN-7064.005.patch, YARN-7064.007.patch, YARN-7064.008.patch, 
> YARN-7064.009.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants 
> to rebase patches against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-12-06 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280657#comment-16280657
 ] 

Miklos Szegedi commented on YARN-7577:
--

Thank you for the review [~rkanter]. I updated the patch.


> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch, YARN-7577.001.patch, 
> YARN-7577.002.patch, YARN-7577.003.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-12-06 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7577:
-
Attachment: YARN-7577.003.patch

> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch, YARN-7577.001.patch, 
> YARN-7577.002.patch, YARN-7577.003.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7064) Use cgroup to get container resource utilization

2017-12-06 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7064:
-
Attachment: YARN-7064.008.patch

> Use cgroup to get container resource utilization
> 
>
> Key: YARN-7064
> URL: https://issues.apache.org/jira/browse/YARN-7064
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7064.000.patch, YARN-7064.001.patch, 
> YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch, 
> YARN-7064.005.patch, YARN-7064.007.patch, YARN-7064.008.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants 
> to rebase patches against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2017-12-01 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275315#comment-16275315
 ] 

Miklos Szegedi commented on YARN-7590:
--

[~eyang], why don't we just call getuid() to get the uid?

> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Affects Versions: 2.0.1-alpha, 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0, 2.7.0, 
> 2.8.0, 2.8.1, 3.0.0-beta1
>Reporter: Eric Yang
>
> There is minimum check for prefix path for container-executor.  If YARN is 
> compromised, attacker  can use container-executor to change system files 
> ownership:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> Spark user can rewrite /etc files to gain more access.  We can improve this 
> with additional check in container-executor:
> # Make sure the prefix path is same as the one in yarn-site.xml, and 
> yarn-site.xml is owned by root, 644, and marked as final in property.
> # Make sure the user path is not a symlink, usercache is not a symlink.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7590) Improve container-executor validation check

2017-12-01 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275142#comment-16275142
 ] 

Miklos Szegedi commented on YARN-7590:
--

I have two more options:
3. Instead of getting a prefix path from container-executor.cfg and/or 
yarn-site.xml, you could check whether yarn has permissions to the desired path 
and all its parents. There is no need to check either of the config files in 
this case, so this would be the simplest change.
4. Disallow disruptive changes: check whether container-executor is about to 
chmod an existing directory with incompatible permissions, and disallow it.

I am in favor of 2. or 3.

There are multiple reasons why it is currently not a good idea to call out to 
yarn-site.xml from container-executor (Option 1):
1. XML parsing may add yet another library that increases the attack surface
2. You need to make sure (--checksetup) that the XML has the right permissions
3. CLASSPATH is not inherited, so it may pick up a different yarn-site.xml than 
what the node manager uses
4. Potentially breaking change: requiring yarn-site.xml parents writable only 
by root
5. Potentially breaking change: non-root users can no longer modify 
yarn-site.xml settings
I am all in favor of the simple configuration provided by option 1, but at this 
time I would suggest having a separate config line in container-executor.cfg 
(option 2) or going with option 3. A future compatibility-breaking JIRA can 
merge the two config files while implementing proper rights checks. 
container-executor could give a proper error message in case of option 2, so 
that the admin can update the directories in case of a failure.


> Improve container-executor validation check
> ---
>
> Key: YARN-7590
> URL: https://issues.apache.org/jira/browse/YARN-7590
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: security, yarn
>Reporter: Eric Yang
>
> There is minimum check for prefix path for container-executor.  If YARN is 
> compromised, attacker  can use container-executor to change system files 
> ownership:
> {code}
> /usr/local/hadoop/bin/container-executor spark yarn 0 etc /home/yarn/tokens 
> /home/spark / ls
> {code}
> This will change /etc to be owned by spark user:
> {code}
> # ls -ld /etc
> drwxr-s---. 110 spark hadoop 8192 Nov 21 20:00 /etc
> {code}
> Spark user can rewrite /etc files to gain more access.  We can improve this 
> with additional check in container-executor:
> # Make sure the prefix path is same as the one in yarn-site.xml, and 
> yarn-site.xml is owned by root, 644, and marked as final in property.
> # Make sure the user path is not a symlink, usercache is not a symlink.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-11-29 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271256#comment-16271256
 ] 

Miklos Szegedi commented on YARN-7577:
--

The unit test issues are unrelated.

> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch, YARN-7577.001.patch, 
> YARN-7577.002.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-11-28 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7577:
-
Attachment: YARN-7577.002.patch

> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch, YARN-7577.001.patch, 
> YARN-7577.002.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-11-28 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7577:
-
Attachment: YARN-7577.001.patch

> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch, YARN-7577.001.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-11-28 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7577:
-
Description: 
This happens if Fair Scheduler is the default. The test should run with both 
schedulers.
{code}
java.lang.AssertionError: 
Expected :-102
Actual   :-106
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
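
A possible way to cover both schedulers (sketch only; the class name and wiring 
are illustrative, not part of any patch):
{code}
// Run the same scenario under each scheduler instead of relying on the
// cluster default. JUnit 4 Parameterized runner.
@RunWith(Parameterized.class)
public class TestAMRestartWithEachScheduler {

  @Parameterized.Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][]{
        {CapacityScheduler.class}, {FairScheduler.class}});
  }

  private final Class<? extends ResourceScheduler> scheduler;

  public TestAMRestartWithEachScheduler(
      Class<? extends ResourceScheduler> scheduler) {
    this.scheduler = scheduler;
  }

  @Test
  public void testPreemptedAMRestartOnRMRestart() throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setClass(YarnConfiguration.RM_SCHEDULER, scheduler,
        ResourceScheduler.class);
    // ... existing test body, driven by this conf ...
  }
}
{code}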

  was:This happens, if Fair Scheduler is the default. The test should run with 
both schedulers


> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers
> {code}
> java.lang.AssertionError: 
> Expected :-102
> Actual   :-106
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testPreemptedAMRestartOnRMRestart(TestAMRestart.java:583)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-11-28 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7577:
-
Attachment: YARN-7577.000.patch

> Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
> --
>
> Key: YARN-7577
> URL: https://issues.apache.org/jira/browse/YARN-7577
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7577.000.patch
>
>
> This happens, if Fair Scheduler is the default. The test should run with both 
> schedulers



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7491) Make sure AM is not scheduled on an opportunistic container

2017-11-28 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269490#comment-16269490
 ] 

Miklos Szegedi commented on YARN-7491:
--

+1 Thank you for the contribution [~haibochen]. Committing this shortly.

> Make sure AM is not scheduled on an opportunistic container
> ---
>
> Key: YARN-7491
> URL: https://issues.apache.org/jira/browse/YARN-7491
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7491-YARN-1011.00.patch, 
> YARN-7491-YARN-1011.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-11-28 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7577:


 Summary: Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
 Key: YARN-7577
 URL: https://issues.apache.org/jira/browse/YARN-7577
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


This happens if Fair Scheduler is the default. The test should run with both 
schedulers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7491) Make sure AM is not scheduled on an opportunistic container

2017-11-28 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269285#comment-16269285
 ] 

Miklos Szegedi commented on YARN-7491:
--

We discussed this in person. The issue is that there is no unit test covering 
newAMResourceRequest, and it probably needs to be based on newResourceRequest to 
group the common code together. Also, please address the checkstyle issue.

> Make sure AM is not scheduled on an opportunistic container
> ---
>
> Key: YARN-7491
> URL: https://issues.apache.org/jira/browse/YARN-7491
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7491-YARN-1011.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7491) Make sure AM is not scheduled on an opportunistic container

2017-11-27 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267920#comment-16267920
 ] 

Miklos Szegedi commented on YARN-7491:
--

I reverted the AM request line in the patch to the following:
{code}
  amReqs = Collections.singletonList(BuilderUtils
  .newResourceRequest(RMAppAttemptImpl.AM_CONTAINER_PRIORITY,
  ResourceRequest.ANY, submissionContext.getResource(), 1));
{code}
TestAppManager still succeeded. To me, that means the change is not covered by 
the unit tests.
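
A minimal sketch of an assertion that could provide that coverage (accessor 
names are assumptions for illustration, not from the patch); it should fail 
with the reverted line above:
{code}
// Verify the AM request explicitly opts out of opportunistic allocation.
ResourceRequest amReq = amReqs.get(0);
Assert.assertEquals(ExecutionType.GUARANTEED,
    amReq.getExecutionTypeRequest().getExecutionType());
Assert.assertTrue("AM must opt out of opportunistic allocation",
    amReq.getExecutionTypeRequest().getEnforceExecutionType());
{code}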

> Make sure AM is not scheduled on an opportunistic container
> ---
>
> Key: YARN-7491
> URL: https://issues.apache.org/jira/browse/YARN-7491
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7491-YARN-1011.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7491) Make sure AM is not scheduled on an opportunistic container

2017-11-27 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267702#comment-16267702
 ] 

Miklos Szegedi commented on YARN-7491:
--

Thank you for the patch [~haibochen]. Even though the patch changes unit tests, 
so Jenkins does not complain, I still think we should have a unit test dedicated 
to this change.

> Make sure AM is not scheduled on an opportunistic container
> ---
>
> Key: YARN-7491
> URL: https://issues.apache.org/jira/browse/YARN-7491
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7491-YARN-1011.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7363) ContainerLocalizer don't have a valid log4j config in case of Linux container executor

2017-11-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263329#comment-16263329
 ] 

Miklos Szegedi commented on YARN-7363:
--

+1 (non-binding). Thank you for the patch [~yufeigu].

> ContainerLocalizer don't have a valid log4j config in case of Linux container 
> executor
> --
>
> Key: YARN-7363
> URL: https://issues.apache.org/jira/browse/YARN-7363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7363.001.patch, YARN-7363.002.patch, 
> YARN-7363.003.patch, YARN-7363.004.patch
>
>
> In case of the Linux container executor, ContainerLocalizer runs as a separate 
> process. It doesn't have access to a valid log4j.properties when the 
> application user is not in the "hadoop" group. The node manager's 
> log4j.properties is on its classpath, but it isn't readable by users outside 
> the hadoop group due to security concerns. In that case, ContainerLocalizer 
> doesn't have a valid log4j configuration and normally produces no log output.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6921) Allow resource request to opt out of oversubscription in Fair Scheduler

2017-11-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262916#comment-16262916
 ] 

Miklos Szegedi edited comment on YARN-6921 at 11/22/17 6:22 PM:


Thank you for the contribution, [~haibochen]. I will commit this shortly.


was (Author: miklos.szeg...@cloudera.com):
+1 Thank you for the contribution, [~haibochen]. I will commit this shortly.

> Allow resource request to opt out of oversubscription in Fair Scheduler
> ---
>
> Key: YARN-6921
> URL: https://issues.apache.org/jira/browse/YARN-6921
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6921-YARN-1011.00.patch, 
> YARN-6921-YARN-1011.01.patch
>
>
> Guaranteed container requests, enforce tag true or not, are by default 
> eligible for oversubscription, and thus can get OPPORTUNISTIC container 
> allocations. We should allow them to opt out if their enforce tag is set to 
> true.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6921) Allow resource request to opt out of oversubscription in Fair Scheduler

2017-11-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262916#comment-16262916
 ] 

Miklos Szegedi edited comment on YARN-6921 at 11/22/17 6:21 PM:


+1 Thank you for the contribution, [~haibochen]. I will commit this shortly.


was (Author: miklos.szeg...@cloudera.com):
Thank you for the contribution, [~haibochen]. I will commit this shortly.

> Allow resource request to opt out of oversubscription in Fair Scheduler
> ---
>
> Key: YARN-6921
> URL: https://issues.apache.org/jira/browse/YARN-6921
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6921-YARN-1011.00.patch, 
> YARN-6921-YARN-1011.01.patch
>
>
> Guaranteed container requests, enforce tag true or not, are by default 
> eligible for oversubscription, and thus can get OPPORTUNISTIC container 
> allocations. We should allow them to opt out if their enforce tag is set to 
> true.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6750) Add a configuration to cap how much a NM can be overallocated

2017-11-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262941#comment-16262941
 ] 

Miklos Szegedi commented on YARN-6750:
--

+1 Thank you for the contribution [~haibochen]. I prefer fixing all checkstyle 
issues, but this one is indeed minor. I will commit this shortly.

> Add a configuration to cap how much a NM can be overallocated
> -
>
> Key: YARN-6750
> URL: https://issues.apache.org/jira/browse/YARN-6750
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6750-YARN-1011.00.patch, 
> YARN-6750-YARN-1011.01.patch, YARN-6750-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6921) Allow resource request to opt out of oversubscription in Fair Scheduler

2017-11-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262916#comment-16262916
 ] 

Miklos Szegedi commented on YARN-6921:
--

Thank you for the contribution, [~haibochen]. I will commit this shortly.

> Allow resource request to opt out of oversubscription in Fair Scheduler
> ---
>
> Key: YARN-6921
> URL: https://issues.apache.org/jira/browse/YARN-6921
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6921-YARN-1011.00.patch, 
> YARN-6921-YARN-1011.01.patch
>
>
> Guaranteed container requests, enforce tag true or not, are by default 
> eligible for oversubscription, and thus can get OPPORTUNISTIC container 
> allocations. We should allow them to opt out if their enforce tag is set to 
> true.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7337) Expose per-node over-allocation info in Node Report

2017-11-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262892#comment-16262892
 ] 

Miklos Szegedi commented on YARN-7337:
--

+1. Thank you for the contribution [~haibochen]. I will commit this shortly.

> Expose per-node over-allocation info in Node Report
> ---
>
> Key: YARN-7337
> URL: https://issues.apache.org/jira/browse/YARN-7337
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7337-YARN-1011.00.patch, 
> YARN-7337-YARN-1011.01.patch, YARN-7337-YARN-1011.02.patch, 
> YARN-7337-YARN-1011.03.patch, YARN-7337-YARN-1011.04.patch, 
> YARN-7337-YARN-1011.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7337) Expose per-node over-allocation info in Node Report

2017-11-22 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259899#comment-16259899
 ] 

Miklos Szegedi edited comment on YARN-7337 at 11/22/17 4:33 PM:


Thank you for the patch [~haibochen]
{code}
153   @Public
154   @Unstable
155   public abstract Resource getGuaranteedResourceUsed();
156 
...
160 
161   /**
162* Get opportunistic Resource used on the node.
163* @return opportunistic Resource used on the 
node
164*/
165   @Public
166   @Stable
167   public abstract Resource getOpportunisticResourceUsed();
{code}
The stability annotation usage is inconsistent in case of these two functions.
{code}
45/**
46 * @return the amount of resources currently used by the node.
47 */
{code}
This should mention amount of guaranteed resources.
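
For illustration, one consistent variant (whether both accessors should be 
@Unstable or @Stable is a separate decision):
{code}
@Public
@Unstable
public abstract Resource getGuaranteedResourceUsed();

/**
 * Get opportunistic Resource used on the node.
 * @return opportunistic Resource used on the node
 */
@Public
@Unstable
public abstract Resource getOpportunisticResourceUsed();
{code}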


was (Author: miklos.szeg...@cloudera.com):
Thenk you for the patch [~haibochen]
{code}
153   @Public
154   @Unstable
155   public abstract Resource getGuaranteedResourceUsed();
156 
...
160 
161   /**
162* Get opportunistic Resource used on the node.
163* @return opportunistic Resource used on the 
node
164*/
165   @Public
166   @Stable
167   public abstract Resource getOpportunisticResourceUsed();
{code}
The stability annotation usage is inconsistent in case of these two functions.
{code}
45/**
46 * @return the amount of resources currently used by the node.
47 */
{code}
This should mention amount of guaranteed resources.

> Expose per-node over-allocation info in Node Report
> ---
>
> Key: YARN-7337
> URL: https://issues.apache.org/jira/browse/YARN-7337
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7337-YARN-1011.00.patch, 
> YARN-7337-YARN-1011.01.patch, YARN-7337-YARN-1011.02.patch, 
> YARN-7337-YARN-1011.03.patch, YARN-7337-YARN-1011.04.patch, 
> YARN-7337-YARN-1011.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5534) Allow user provided Docker volume mount list

2017-11-21 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261921#comment-16261921
 ] 

Miklos Szegedi commented on YARN-5534:
--

Thank you, [~shaneku...@gmail.com] for the patch.
{code}
165   private static final Pattern USER_MOUNT_PATTERN = Pattern.compile(
166   "(?<=^|,)([\\s/.a-zA-Z0-9_-]+):([\\s/.a-zA-Z0-9_-]+):([a-z]+)");
{code}
This pattern rejects characters that are valid in Linux file names in several 
languages. Please consider a pattern that excludes only the known reserved 
characters, as described here: 
https://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words
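
For illustration only, a sketch of such a pattern that blacklists just the 
separators of the mount syntax (comma, colon and NUL) instead of whitelisting a 
fixed character set:
{code}
// Accept any path characters except the mount-spec separators.
private static final Pattern USER_MOUNT_PATTERN = Pattern.compile(
    "(?<=^|,)([^,:\\x00]+):([^,:\\x00]+):([a-z]+)");
{code}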


> Allow user provided Docker volume mount list
> 
>
> Key: YARN-5534
> URL: https://issues.apache.org/jira/browse/YARN-5534
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: luhuichun
>Assignee: Shane Kumpf
> Attachments: YARN-5534.001.patch, YARN-5534.002.patch, 
> YARN-5534.003.patch, YARN-5534.004.patch, YARN-5534.005.patch, 
> YARN-5534.006.patch
>
>
> YARN-6623 added support in container-executor for admin supplied Docker 
> volume whitelists. This allows controlling which host directories can be 
> mounted into Docker containers launched by YARN. A read-only and read-write 
> whitelist was added. We now need the ability for users to supply the mounts 
> they require for their application, which will be validated against the admin 
> whitelist in container-executor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7554) TestCryptoStreamsWithOpensslAesCtrCryptoCodec fails on Debian 9

2017-11-21 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7554:


 Summary: TestCryptoStreamsWithOpensslAesCtrCryptoCodec fails on 
Debian 9
 Key: YARN-7554
 URL: https://issues.apache.org/jira/browse/YARN-7554
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


{code}
[ERROR] org.apache.hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec  
Time elapsed: 0.478 s  <<< FAILURE!
java.lang.AssertionError: Unable to instantiate codec 
org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, is the required version of 
OpenSSL installed?
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:621)
at 
org.apache.hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec.init(TestCryptoStreamsWithOpensslAesCtrCryptoCodec.java:43)
{code}
This happened due to the following openssl change:
https://github.com/openssl/openssl/commit/ff4b7fafb315df5f8374e9b50c302460e068f188



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7553) TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime flaky

2017-11-21 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7553:


 Summary: 
TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime flaky
 Key: YARN-7553
 URL: https://issues.apache.org/jira/browse/YARN-7553
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


{code}
[ERROR] 
testFiniteGroupResolutionTime(org.apache.hadoop.security.TestShellBasedUnixGroupsMapping)
  Time elapsed: 61.975 s  <<< FAILURE!
java.lang.AssertionError: 
Expected the logs to carry a message about command timeout but was: 2017-11-22 
00:10:57,523 WARN  security.ShellBasedUnixGroupsMapping 
(ShellBasedUnixGroupsMapping.java:getUnixGroups(181)) - unable to return groups 
for user foobarnonexistinguser
PartialGroupNameException The user name 'foobarnonexistinguser' is not found. 
at 
org.apache.hadoop.security.ShellBasedUnixGroupsMapping.resolvePartialGroupNames(ShellBasedUnixGroupsMapping.java:275)
at 
org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:178)
at 
org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:97)
at 
org.apache.hadoop.security.TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime(TestShellBasedUnixGroupsMapping.java:278)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7363) ContainerLocalizer don't have a valid log4j config in case of Linux container executor

2017-11-21 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261650#comment-16261650
 ] 

Miklos Szegedi commented on YARN-7363:
--

Thank you, [~yufeigu] for the patch.
{code}
488 // create the log dir
{code}
Optional. I think this one is not necessary.
{code}
286 // replace the "" with the real container log directory
{code}
Optional. This one would normally go to a javadoc before the function header.
{code}
289 Path containerLogPath = new Path("/tmp");
{code}
Instead of putting the logs into the temp path, which may be insecure, I would 
rather not log at all if the configured log path is not accessible. Even if you 
do fall back to /tmp, it would be better to include that fact in the error 
message on the exception path.
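A rough sketch of the alternative (the dirsHandler and containerLogDir names are 
assumptions for illustration, not from the patch):
{code}
Path containerLogPath;
try {
  // Resolve the configured container log directory; do not silently fall back.
  containerLogPath = dirsHandler.getLogPathForWrite(containerLogDir, false);
} catch (IOException e) {
  throw new IOException("Container log directory " + containerLogDir
      + " is not accessible; refusing to fall back to /tmp", e);
}
{code}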
{code}
296 for (String item : command) {
297   newCmds.add(ContainerLaunch.expandEnvironment(item, 
containerLogPath));
298 }
{code}
I think replaceWithContainerLogDir should do only what its name says and should 
not expand the environment variables of other parameters. The required 
environment variables might only be added after this function is called.
{code}
415 addLog4jSystemProperties("INFO", command);
{code}
It would probably be useful to make the log level configurable instead of 
hardcoding it.
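For example (sketch only; the configuration key below is hypothetical, not an 
existing constant):
{code}
// Read the localizer log level from the NM configuration, defaulting to INFO.
String localizerLogLevel = conf.get(
    "yarn.nodemanager.container-localizer.log.level", "INFO");
addLog4jSystemProperties(localizerLogLevel, command);
{code}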


> ContainerLocalizer don't have a valid log4j config in case of Linux container 
> executor
> --
>
> Key: YARN-7363
> URL: https://issues.apache.org/jira/browse/YARN-7363
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7363.001.patch, YARN-7363.002.patch
>
>
> In case of the Linux container executor, ContainerLocalizer runs as a separate 
> process. It doesn't have access to a valid log4j.properties when the 
> application user is not in the "hadoop" group. The node manager's 
> log4j.properties is on its classpath, but it isn't readable by users outside 
> the hadoop group due to security concerns. In that case, ContainerLocalizer 
> doesn't have a valid log4j configuration and normally produces no log output.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6750) Add a configuration to cap how much a NM can be overallocated

2017-11-20 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259942#comment-16259942
 ] 

Miklos Szegedi commented on YARN-6750:
--

Thank you for the patch [~haibochen].
{code}
99  
100   public SchedulerNode(RMNode node, boolean usePortForNodeName,
{code}
There is an extra space here.
All references to 4.0 in the patch should be replaced with 
YarnConfiguration.DEFAULT_PER_NODE_MAX_OVERALLOCATION_RATIO.
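For example (sketch only; the non-default key string is an assumption for 
illustration):
{code}
// Read the ratio from configuration with the named default instead of the
// 4.0 literal.
float maxOverallocationRatio = conf.getFloat(
    "yarn.nodemanager.overallocation.per-node-max-ratio",
    YarnConfiguration.DEFAULT_PER_NODE_MAX_OVERALLOCATION_RATIO);
{code}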

> Add a configuration to cap how much a NM can be overallocated
> -
>
> Key: YARN-6750
> URL: https://issues.apache.org/jira/browse/YARN-6750
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6750-YARN-1011.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6921) Allow resource request to opt out of oversubscription in Fair Scheduler

2017-11-20 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259934#comment-16259934
 ] 

Miklos Szegedi commented on YARN-6921:
--

+1 pending jenkins. Thank you for the patch [~haibochen]

> Allow resource request to opt out of oversubscription in Fair Scheduler
> ---
>
> Key: YARN-6921
> URL: https://issues.apache.org/jira/browse/YARN-6921
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-6921-YARN-1011.00.patch
>
>
> Guaranteed container requests, enforce tag true or not, are by default 
> eligible for oversubscription, and thus can get OPPORTUNISTIC container 
> allocations. We should allow them to opt out if their enforce tag is set to 
> true.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7337) Expose per-node over-allocation info in Node Report

2017-11-20 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259899#comment-16259899
 ] 

Miklos Szegedi edited comment on YARN-7337 at 11/20/17 9:50 PM:


Thank you for the patch [~haibochen]
{code}
153   @Public
154   @Unstable
155   public abstract Resource getGuaranteedResourceUsed();
156 
...
160 
161   /**
162* Get opportunistic Resource used on the node.
163* @return opportunistic Resource used on the 
node
164*/
165   @Public
166   @Stable
167   public abstract Resource getOpportunisticResourceUsed();
{code}
The stability annotation usage is inconsistent in case of these two functions.
{code}
45/**
46 * @return the amount of resources currently used by the node.
47 */
{code}
This should mention amount of guaranteed resources.


was (Author: miklos.szeg...@cloudera.com):
Thenk you for the patch [~haibochen]
{code}
153   @Public
154   @Unstable
155   public abstract Resource getGuaranteedResourceUsed();
156 
...
160 
161   /**
162* Get opportunistic Resource used on the node.
163* @return opportunistic Resource used on the 
node
164*/
165   @Public
166   @Stable
167   public abstract Resource getOpportunisticResourceUsed();
{code}
The stability annotation usage is inconsistent in case of these two functions.

> Expose per-node over-allocation info in Node Report
> ---
>
> Key: YARN-7337
> URL: https://issues.apache.org/jira/browse/YARN-7337
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7337-YARN-1011.00.patch, 
> YARN-7337-YARN-1011.01.patch, YARN-7337-YARN-1011.02.patch, 
> YARN-7337-YARN-1011.03.patch, YARN-7337-YARN-1011.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7337) Expose per-node over-allocation info in Node Report

2017-11-20 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259899#comment-16259899
 ] 

Miklos Szegedi commented on YARN-7337:
--

Thank you for the patch [~haibochen]
{code}
153   @Public
154   @Unstable
155   public abstract Resource getGuaranteedResourceUsed();
156 
...
160 
161   /**
162* Get opportunistic Resource used on the node.
163* @return opportunistic Resource used on the 
node
164*/
165   @Public
166   @Stable
167   public abstract Resource getOpportunisticResourceUsed();
{code}
The stability annotation usage is inconsistent in case of these two functions.

> Expose per-node over-allocation info in Node Report
> ---
>
> Key: YARN-7337
> URL: https://issues.apache.org/jira/browse/YARN-7337
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7337-YARN-1011.00.patch, 
> YARN-7337-YARN-1011.01.patch, YARN-7337-YARN-1011.02.patch, 
> YARN-7337-YARN-1011.03.patch, YARN-7337-YARN-1011.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-20 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259740#comment-16259740
 ] 

Miklos Szegedi commented on YARN-7506:
--

The startup time might prevent us from using a root Java process. The question 
is the CLI: what are the reasons it is better than a long-running root Java 
process listening on a Unix socket accessible only by yarn? The CLI does 
parameter checking, but doesn't the Docker daemon do that anyway? The CLI is 
also slower to start up, and it carries all the risks of the environment, the 
shell, etc.

> Overhaul the design of the Linux container-executor regarding Docker and 
> future runtimes
> 
>
> Key: YARN-7506
> URL: https://issues.apache.org/jira/browse/YARN-7506
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>  Labels: Docker, container-executor
> Attachments: YARN-Docker control options.pdf
>
>
> I raise this topic to discuss a potential improvement of the container 
> executor tool in node manager.
> container-executor has two main purposes. It executes Linux *system calls not 
> available from Java*, and it executes tasks *available to root that are not 
> available to the yarn user*. Historically container-executor did both by 
> doing impersonation. The yarn user is separated from root because it runs 
> network services, so *the yarn user should be restricted* by design. Because 
> of this it has its own config file container-executor.cfg writable by root 
> only that specifies what actions are allowed for the yarn user. However, the 
> requirements have changed with Docker and that raises the following questions:
> 1. The Docker feature of YARN requires root permissions to *access the Docker 
> socket* but it does not run any system calls, so could the Docker related 
> code in container-executor be *refactored into a separate Java process ran as 
> root*? Java would make the development much faster and more secure. 
> 2. The Docker feature only needs the Docker unix socket. It is not a good 
> idea to let the yarn user directly access the socket, since that would 
> elevate its privileges to root. However, the Java tool running as root 
> mentioned in the previous question could act as a *proxy on the Docker 
> socket* operating directly on the Docker REST API *eliminating the need to 
> use the Docker CLI*. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5673) [Umbrella] Re-write container-executor to improve security, extensibility, and portability

2017-11-17 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257394#comment-16257394
 ] 

Miklos Szegedi commented on YARN-5673:
--

[~eyang], thank you for the explanation. Indeed, these might need root 
privileges.


> [Umbrella] Re-write container-executor to improve security, extensibility, 
> and portability
> --
>
> Key: YARN-5673
> URL: https://issues.apache.org/jira/browse/YARN-5673
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: container-executor Re-write Design Document.pdf
>
>
> As YARN adds support for new features that require administrator 
> privileges(such as support for network throttling and docker), we’ve had to 
> add new capabilities to the container-executor. This has led to a recognition 
> that the current container-executor security features as well as the code 
> could be improved. The current code is fragile and it’s hard to add new 
> features without causing regressions. Some of the improvements that need to 
> be made are -
> *Security*
> Currently the container-executor has limited security features. It relies 
> primarily on the permissions set on the binary but does little additional 
> security beyond that. There are few outstanding issues today -
> - No audit log
> - No way to disable features - network throttling and docker support are 
> built in and there’s no way to turn them off at a container-executor level
> - Code can be improved - a lot of the code switches users back and forth in 
> an arbitrary manner
> - No input validation - the paths, and files provided at invocation are not 
> validated or required to be in some specific location
> - No signing functionality - there is no way to enforce that the binary was 
> invoked by the NM and not by any other process
> *Code Issues*
> The code layout and implementation themselves can be improved. Some issues 
> there are -
> - No support for log levels - everything is logged and this can’t be turned 
> on or off
> - Extremely long set of invocation parameters(specifically during container 
> launch) which makes turning features on or off complicated
> - Poor test coverage - it’s easy to introduce regressions today due to the 
> lack of a proper test setup
> - Duplicate functionality - there is some amount of code duplication
> - Hard to make improvements or add new features due to the issues raised above
> *Portability*
>  - The container-executor mixes platform dependent APIs with platform 
> independent APIs making it hard to run it on multiple platforms. Allowing it 
> to run on multiple platforms also improves the overall code structure .
> One option is to improve the existing container-executor, however it might be 
> easier to start from scratch. That allows existing functionality to be 
> supported until we are ready to switch to the new code.
> This umbrella JIRA is to capture all the work required for the new code. I'm 
> going to work on a design doc for the changes - any suggestions or 
> improvements are welcome.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-17 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7506:
-
Issue Type: Sub-task  (was: Wish)
Parent: YARN-5673

> Overhaul the design of the Linux container-executor regarding Docker and 
> future runtimes
> 
>
> Key: YARN-7506
> URL: https://issues.apache.org/jira/browse/YARN-7506
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Miklos Szegedi
>  Labels: Docker, container-executor
> Attachments: YARN-Docker control options.pdf
>
>
> I raise this topic to discuss a potential improvement of the container 
> executor tool in node manager.
> container-executor has two main purposes. It executes Linux *system calls not 
> available from Java*, and it executes tasks *available to root that are not 
> available to the yarn user*. Historically container-executor did both by 
> doing impersonation. The yarn user is separated from root because it runs 
> network services, so *the yarn user should be restricted* by design. Because 
> of this it has its own config file container-executor.cfg writable by root 
> only that specifies what actions are allowed for the yarn user. However, the 
> requirements have changed with Docker and that raises the following questions:
> 1. The Docker feature of YARN requires root permissions to *access the Docker 
> socket* but it does not run any system calls, so could the Docker related 
> code in container-executor be *refactored into a separate Java process ran as 
> root*? Java would make the development much faster and more secure. 
> 2. The Docker feature only needs the Docker unix socket. It is not a good 
> idea to let the yarn user directly access the socket, since that would 
> elevate its privileges to root. However, the Java tool running as root 
> mentioned in the previous question could act as a *proxy on the Docker 
> socket* operating directly on the Docker REST API *eliminating the need to 
> use the Docker CLI*. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-17 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7506:
-
Attachment: (was: YARN-Docker control options.pdf)

> Overhaul the design of the Linux container-executor regarding Docker and 
> future runtimes
> 
>
> Key: YARN-7506
> URL: https://issues.apache.org/jira/browse/YARN-7506
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: nodemanager
>Reporter: Miklos Szegedi
>  Labels: Docker, container-executor
> Attachments: YARN-Docker control options.pdf
>
>
> I raise this topic to discuss a potential improvement of the container 
> executor tool in node manager.
> container-executor has two main purposes. It executes Linux *system calls not 
> available from Java*, and it executes tasks *available to root that are not 
> available to the yarn user*. Historically container-executor did both by 
> doing impersonation. The yarn user is separated from root because it runs 
> network services, so *the yarn user should be restricted* by design. Because 
> of this it has its own config file container-executor.cfg writable by root 
> only that specifies what actions are allowed for the yarn user. However, the 
> requirements have changed with Docker and that raises the following questions:
> 1. The Docker feature of YARN requires root permissions to *access the Docker 
> socket* but it does not run any system calls, so could the Docker related 
> code in container-executor be *refactored into a separate Java process ran as 
> root*? Java would make the development much faster and more secure. 
> 2. The Docker feature only needs the Docker unix socket. It is not a good 
> idea to let the yarn user directly access the socket, since that would 
> elevate its privileges to root. However, the Java tool running as root 
> mentioned in the previous question could act as a *proxy on the Docker 
> socket* operating directly on the Docker REST API *eliminating the need to 
> use the Docker CLI*. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-17 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7506:
-
Attachment: YARN-Docker control options.pdf

> Overhaul the design of the Linux container-executor regarding Docker and 
> future runtimes
> 
>
> Key: YARN-7506
> URL: https://issues.apache.org/jira/browse/YARN-7506
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: nodemanager
>Reporter: Miklos Szegedi
>  Labels: Docker, container-executor
> Attachments: YARN-Docker control options.pdf
>
>
> I raise this topic to discuss a potential improvement of the container 
> executor tool in node manager.
> container-executor has two main purposes. It executes Linux *system calls not 
> available from Java*, and it executes tasks *available to root that are not 
> available to the yarn user*. Historically container-executor did both by 
> doing impersonation. The yarn user is separated from root because it runs 
> network services, so *the yarn user should be restricted* by design. Because 
> of this it has its own config file container-executor.cfg writable by root 
> only that specifies what actions are allowed for the yarn user. However, the 
> requirements have changed with Docker and that raises the following questions:
> 1. The Docker feature of YARN requires root permissions to *access the Docker 
> socket* but it does not run any system calls, so could the Docker related 
> code in container-executor be *refactored into a separate Java process ran as 
> root*? Java would make the development much faster and more secure. 
> 2. The Docker feature only needs the Docker unix socket. It is not a good 
> idea to let the yarn user directly access the socket, since that would 
> elevate its privileges to root. However, the Java tool running as root 
> mentioned in the previous question could act as a *proxy on the Docker 
> socket* operating directly on the Docker REST API *eliminating the need to 
> use the Docker CLI*. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-17 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257375#comment-16257375
 ] 

Miklos Szegedi commented on YARN-7506:
--

Thank you for the comments.
[~ebadger], the main reason a Java root process is more secure than 
container-executor is that it protects against exploitable buffer overflows, 
which is why I raised the suggestion. I was not sure why this approach had not 
been followed before, which is why I filed this jira. It is also easier for most 
Hadoop developers to work with, as you mentioned.
[~vinodkv], this jira already builds on the experience of YARN-6623; I would 
rather consider it a subtask of YARN-5673, if it is considered at all. Now that 
you mention it, a possible solution for YARN-5673 that takes this (YARN-7506) 
suggestion into account would be a root Java based container-executor framework 
that loads Java or native C modules. However, Docker has its own unique design 
with the CLI, the socket, and no native system call dependencies, so it could be 
handled separately.
bq. Side note: One more important consideration in the container-executor 
design was to not have long running root processes as it may increase the 
attack scope. Assuming that is still intact.
Suggestion 1. above does not require any long running root user process. 2. 
does, however the only surface would be the proxied docker socket and config 
file that is protected with file system permissions just like the 
container-executor executable.
[~eyang]
bq. Both docker and hadoop use "trusted" users...
I have to point to the rule of defense in depth: with defense in depth there is 
no trusted user. Every input is treated as hostile, and each component 
(container-executor in this case) has to do its own error checking.
bq. YARN user tap directly into docker.sock goes against our original 
philosophy of having both "trusted" user and root to perform validation.
Indeed. I agree.
bq. Root power may be used for validation logic when trusted user can not 
validate, such as symlink to local file system access that YARN-6623 solved.
Indeed, and I would also mention the volume whitelists and blacklists, which the 
yarn user cannot validate because of the defense in depth rule.
bq. We can consider to keep most of logic in Java as long as root privileges is 
not required.
I disagree here. Most of the functionality that YARN-6623 implemented requires 
that root does the validation, so if done in Java, it should be in a Java root 
process.
bq. The performance gain from tapping into docker socket is saving the cost of 
one fork but we would lose a lot of validations done by docker CLI.
The validations are indeed important, but validating command line options is 
much more difficult than validating easily parseable JSON, as the recent issues 
showed.
bq. If it can be helped, calling root cli is preferred than calling root owned 
network socket.
There is a solution for that. The Java node manager, running as yarn, could 
still use the CLI against a Unix socket writable by yarn, while a root Java 
process running in the background proxies and security-filters that socket onto 
the original Docker socket. (See the attached diagram.)
bq. I don't fully agree with YARN-5673 modules API design. The description is 
another plug-in architecture to enable more functionality with root power. I 
think this is a slippy slope to enable more risks in container-executor.
I agree, I also raised my concerns there.
bq. It is best to avoid running java as root. Java runtime includes a lot of 
third party code, which can be unpredictable with root power.
That is a risk. I would minimize the number of non-JDK dependencies if a Java 
root process is chosen. I still think it may be the more favorable option in 
this case.

I summarized the options in the attached diagram, which shows which one is the 
simplest.
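
To make the proxy option concrete, a minimal sketch of a root Java process that 
relays a yarn-writable socket onto the Docker socket. It assumes JDK Unix domain 
socket support (Java 16+) purely for illustration, omits the REST filtering and 
socket-file cleanup, and all paths and names are illustrative:
{code}
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class DockerSocketProxy {
  public static void main(String[] args) throws Exception {
    // yarn-writable front socket and the real root-owned Docker socket.
    UnixDomainSocketAddress yarnSide =
        UnixDomainSocketAddress.of("/run/yarn/docker-proxy.sock");
    UnixDomainSocketAddress dockerSide =
        UnixDomainSocketAddress.of("/var/run/docker.sock");
    try (ServerSocketChannel server =
             ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
      server.bind(yarnSide);
      while (true) {
        SocketChannel client = server.accept();
        new Thread(() -> relay(client, dockerSide)).start();
      }
    }
  }

  // Copy the request to Docker and the response back. A real implementation
  // would parse and filter the REST call here before forwarding it.
  private static void relay(SocketChannel client,
      UnixDomainSocketAddress dockerSide) {
    try (client; SocketChannel docker = SocketChannel.open(dockerSide)) {
      Thread back = new Thread(() -> pump(docker, client));
      back.start();
      pump(client, docker);
      back.join();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  // Blocking byte pump in one direction until EOF.
  private static void pump(SocketChannel in, SocketChannel out) {
    ByteBuffer buf = ByteBuffer.allocate(8192);
    try {
      while (in.read(buf) >= 0) {
        buf.flip();
        while (buf.hasRemaining()) {
          out.write(buf);
        }
        buf.clear();
      }
      out.shutdownOutput();
    } catch (Exception ignored) {
    }
  }
}
{code}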

> Overhaul the design of the Linux container-executor regarding Docker and 
> future runtimes
> 
>
> Key: YARN-7506
> URL: https://issues.apache.org/jira/browse/YARN-7506
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: nodemanager
>Reporter: Miklos Szegedi
>  Labels: Docker, container-executor
> Attachments: YARN-Docker control options.pdf
>
>
> I raise this topic to discuss a potential improvement of the container 
> executor tool in node manager.
> container-executor has two main purposes. It executes Linux *system calls not 
> available from Java*, and it executes tasks *available to root that are not 
> available to the yarn user*. Historically container-executor did both by 
> doing impersonation. The yarn user is separated from root because it runs 
> network services, so *the yarn user should be restricted* by design. Because 
> of this it has its own config file containe

[jira] [Updated] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-17 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7506:
-
Attachment: YARN-Docker control options.pdf

> Overhaul the design of the Linux container-executor regarding Docker and 
> future runtimes
> 
>
> Key: YARN-7506
> URL: https://issues.apache.org/jira/browse/YARN-7506
> Project: Hadoop YARN
>  Issue Type: Wish
>  Components: nodemanager
>Reporter: Miklos Szegedi
>  Labels: Docker, container-executor
> Attachments: YARN-Docker control options.pdf
>
>
> I raise this topic to discuss a potential improvement of the container 
> executor tool in node manager.
> container-executor has two main purposes. It executes Linux *system calls not 
> available from Java*, and it executes tasks *available to root that are not 
> available to the yarn user*. Historically container-executor did both by 
> doing impersonation. The yarn user is separated from root because it runs 
> network services, so *the yarn user should be restricted* by design. Because 
> of this it has its own config file container-executor.cfg writable by root 
> only that specifies what actions are allowed for the yarn user. However, the 
> requirements have changed with Docker and that raises the following questions:
> 1. The Docker feature of YARN requires root permissions to *access the Docker 
> socket* but it does not run any system calls, so could the Docker related 
> code in container-executor be *refactored into a separate Java process ran as 
> root*? Java would make the development much faster and more secure. 
> 2. The Docker feature only needs the Docker unix socket. It is not a good 
> idea to let the yarn user directly access the socket, since that would 
> elevate its privileges to root. However, the Java tool running as root 
> mentioned in the previous question could act as a *proxy on the Docker 
> socket* operating directly on the Docker REST API *eliminating the need to 
> use the Docker CLI*. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5673) [Umbrella] Re-write container-executor to improve security, extensibility, and portability

2017-11-17 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257268#comment-16257268
 ] 

Miklos Szegedi commented on YARN-5673:
--

[~eyang], thank you for your comment. I have a few questions.
Could you elaborate on why cleaning up a container (4.) requires it to be 
written in C? Similarly, I do not think step 5. mentioned above requires it 
either. I agree that only steps that need root privileges or system calls need 
to live here; everything else could go to the node manager or to a Java process 
run as root. I even have some concerns about whether the Docker code needs to 
reside in container-executor at all. See YARN-7506 for reference and discussion.
bq. We should try to contain root power by minimize the workflow and type that 
we support.
I absolutely agree. However, doesn't the tool already do this?

> [Umbrella] Re-write container-executor to improve security, extensibility, 
> and portability
> --
>
> Key: YARN-5673
> URL: https://issues.apache.org/jira/browse/YARN-5673
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: container-executor Re-write Design Document.pdf
>
>
> As YARN adds support for new features that require administrator 
> privileges(such as support for network throttling and docker), we’ve had to 
> add new capabilities to the container-executor. This has led to a recognition 
> that the current container-executor security features as well as the code 
> could be improved. The current code is fragile and it’s hard to add new 
> features without causing regressions. Some of the improvements that need to 
> be made are -
> *Security*
> Currently the container-executor has limited security features. It relies 
> primarily on the permissions set on the binary but does little additional 
> security beyond that. There are few outstanding issues today -
> - No audit log
> - No way to disable features - network throttling and docker support are 
> built in and there’s no way to turn them off at a container-executor level
> - Code can be improved - a lot of the code switches users back and forth in 
> an arbitrary manner
> - No input validation - the paths, and files provided at invocation are not 
> validated or required to be in some specific location
> - No signing functionality - there is no way to enforce that the binary was 
> invoked by the NM and not by any other process
> *Code Issues*
> The code layout and implementation themselves can be improved. Some issues 
> there are -
> - No support for log levels - everything is logged and this can’t be turned 
> on or off
> - Extremely long set of invocation parameters(specifically during container 
> launch) which makes turning features on or off complicated
> - Poor test coverage - it’s easy to introduce regressions today due to the 
> lack of a proper test setup
> - Duplicate functionality - there is some amount of code duplication
> - Hard to make improvements or add new features due to the issues raised above
> *Portability*
>  - The container-executor mixes platform dependent APIs with platform 
> independent APIs making it hard to run it on multiple platforms. Allowing it 
> to run on multiple platforms also improves the overall code structure .
> One option is to improve the existing container-executor, however it might be 
> easier to start from scratch. That allows existing functionality to be 
> supported until we are ready to switch to the new code.
> This umbrella JIRA is to capture all the work required for the new code. I'm 
> going to work on a design doc for the changes - any suggestions or 
> improvements are welcome.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7460) Exclude findbugs warnings on SchedulerNode.numGuaranteedContainers and numOpportunisticContainers

2017-11-16 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256192#comment-16256192
 ] 

Miklos Szegedi commented on YARN-7460:
--

+1 (binding). Thank you for the patch, [~haibochen].

> Exclude findbugs warnings on SchedulerNode.numGuaranteedContainers and 
> numOpportunisticContainers
> -
>
> Key: YARN-7460
> URL: https://issues.apache.org/jira/browse/YARN-7460
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7460-YARN-1011.00.patch
>
>
> The findbugs warnings as in 
> https://builds.apache.org/job/PreCommit-YARN-Build/18390/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html
>  are bogus. We should exclude them in 
> hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-15 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7506:


 Summary: Overhaul the design of the Linux container-executor 
regarding Docker and future runtimes
 Key: YARN-7506
 URL: https://issues.apache.org/jira/browse/YARN-7506
 Project: Hadoop YARN
  Issue Type: Wish
  Components: nodemanager
Reporter: Miklos Szegedi


I raise this topic to discuss a potential improvement of the container executor 
tool in node manager.
container-executor has two main purposes. It executes Linux *system calls not 
available from Java*, and it executes tasks *available to root that are not 
available to the yarn user*. Historically container-executor did both by doing 
impersonation. The yarn user is separated from root because it runs network 
services, so *the yarn user should be restricted* by design. Because of this it 
has its own config file container-executor.cfg writable by root only that 
specifies what actions are allowed for the yarn user. However, the requirements 
have changed with Docker and that raises the following questions:

1. The Docker feature of YARN requires root permissions to *access the Docker 
socket* but it does not run any system calls, so could the Docker related code 
in container-executor be *refactored into a separate Java process run as root*? 
Java would make the development much faster and more secure. 

2. The Docker feature only needs the Docker unix socket. It is not a good idea 
to let the yarn user directly access the socket, since that would elevate its 
privileges to root. However, the Java tool running as root mentioned in the 
previous question could act as a *proxy on the Docker socket* operating 
directly on the Docker REST API *eliminating the need to use the Docker CLI*. 
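
To make the second point more concrete, here is a minimal sketch (not part of any patch) of how a root-run Java helper could talk to the Docker REST API directly over the unix socket instead of invoking the Docker CLI. It assumes Java 16+ for unix domain socket support and the default /var/run/docker.sock path; the class name and request are illustrative only.
{code}
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: connect to the Docker daemon socket and issue one REST call.
public class DockerSocketProxySketch {
  private static final String DOCKER_SOCKET = "/var/run/docker.sock";

  public static void main(String[] args) throws IOException {
    try (SocketChannel channel = SocketChannel.open(StandardProtocolFamily.UNIX)) {
      channel.connect(UnixDomainSocketAddress.of(DOCKER_SOCKET));

      // GET /containers/json lists running containers, the REST equivalent of `docker ps`.
      String request = "GET /containers/json HTTP/1.1\r\n"
          + "Host: localhost\r\n"
          + "Connection: close\r\n\r\n";
      channel.write(ByteBuffer.wrap(request.getBytes(StandardCharsets.UTF_8)));

      // Read and print the raw HTTP response from the daemon.
      ByteBuffer buffer = ByteBuffer.allocate(8192);
      StringBuilder response = new StringBuilder();
      while (channel.read(buffer) > 0) {
        buffer.flip();
        response.append(StandardCharsets.UTF_8.decode(buffer));
        buffer.clear();
      }
      System.out.println(response);
    }
  }
}
{code}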



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5936) when cpu strict mode is closed, yarn couldn't assure scheduling fairness between containers

2017-11-13 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250021#comment-16250021
 ] 

Miklos Szegedi commented on YARN-5936:
--

Another option for the future is to use the cgroup pids subsystem on newer 
kernels. The main reason fairness is not enforced in non-strict mode is that 
it allows the container to run as many threads as it needs with the same cgroup 
and weight. You can limit the number of threads with the pids cgroup controller, so that 
the effective overall container weight becomes weight*pids_limit. The drawback 
of this approach is that it limits multitasking and the number of launcher 
processes. The possible ideal value of pids_limit is the /, so that we do not starve single-threaded 
containers.
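
As a rough, illustrative sketch of the idea (not a patch): write a pids.max value into the container's pids cgroup. The cgroup mount point, hierarchy name, container id and the sizing numbers below are all assumptions, not values used by the node manager; it also assumes Java 11+ for Files.writeString.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch: cap the number of tasks (threads/processes) of one container.
public class PidsLimitSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical cgroup location and container id.
    String containerId = "container_1234567890123_0001_01_000002";
    Path containerCgroup = Paths.get("/sys/fs/cgroup/pids/hadoop-yarn", containerId);

    // Hypothetical sizing: split a node-wide task budget evenly across the
    // containers the node can run, so a single-threaded container is not
    // starved by a heavily multi-threaded neighbour.
    long nodeTaskBudget = 4096;  // assumed node-wide limit
    long maxContainers = 32;     // assumed maximum concurrent containers
    long pidsLimit = Math.max(1, nodeTaskBudget / maxContainers);

    Files.createDirectories(containerCgroup);
    // pids.max limits how many tasks the kernel lets this cgroup create.
    Files.writeString(containerCgroup.resolve("pids.max"), Long.toString(pidsLimit));
  }
}
{code}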

> when cpu strict mode is closed, yarn couldn't assure scheduling fairness 
> between containers
> ---
>
> Key: YARN-5936
> URL: https://issues.apache.org/jira/browse/YARN-5936
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.1
>Reporter: zhengchenyu
>Priority: Critical
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> When using LinuxContainer, the setting that 
> "yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage" is 
> true could assure scheduling fairness with the cpu bandwidth of cgroup. But 
> the cpu bandwidth of cgroup would lead to bad performance in our experience. 
> Without cpu bandwidth of cgroup, cpu.share of cgroup is our only way to 
> assure scheduling fairness, but it is not completely effective. For example, 
> there are two containers that have the same vcores (meaning the same cpu.share): one 
> container is single-threaded, the other container is multi-threaded. The 
> multi-threaded one will get more CPU time, which is unreasonable!
> Here is my test case, I submit two distributedshell application. And two 
> commmand are below:
> {code}
> hadoop jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> -shell_script ./run.sh  -shell_args 10 -num_containers 1 -container_memory 
> 1024 -container_vcores 1 -master_memory 1024 -master_vcores 1 -priority 10
> hadoop jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> org.apache.hadoop.yarn.applications.distributedshell.Client -jar 
> share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar 
> -shell_script ./run.sh  -shell_args 1  -num_containers 1 -container_memory 
> 1024 -container_vcores 1 -master_memory 1024 -master_vcores 1 -priority 10
> {code}
> Here is the cpu time of the two containers:
> {code}
>   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
> 15448 yarn  20   0 9059592  28336   9180 S 998.7  0.1  24:09.30 java
> 15026 yarn  20   0 9050340  27480   9188 S 100.0  0.1   3:33.97 java
> 13767 yarn  20   0 1799816 381208  18528 S   4.6  1.2   0:30.55 java
>77 root  rt   0   0  0  0 S   0.3  0.0   0:00.74 
> migration/1   
> {code}
> We find the cpu time of the multi-threaded container is ten times that of the 
> single-threaded one, though the two containers have the same cpu.share.
> notes:
> run.sh
> {code} 
>   java -cp /home/yarn/loop.jar:$CLASSPATH loop.loop $1
> {code} 
> loop.java
> {code} 
> package loop;
> public class loop {
>   public static void main(String[] args) {
>   // TODO Auto-generated method stub
>   int loop = 1;
>   if(args.length>=1) {
>   System.out.println(args[0]);
>   loop = Integer.parseInt(args[0]);
>   }
>   for(int i=0;i<loop;i++) {
>   System.out.println("start thread " + i);
>   new Thread(new Runnable() {
>   @Override
>   public void run() {
>   // TODO Auto-generated method stub
>   int j=0;
>   while(true){j++;}
>   }
>   }).start();
>   }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7337) Expose per-node over-allocation info in Node Report

2017-11-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246786#comment-16246786
 ] 

Miklos Szegedi commented on YARN-7337:
--

Why don't we create getTotalNumContainers, getNumGuaranteedContainers and 
getNumOpportunisticContainers (and the matching ones for the resources) so that 
future users stay consistent? Total would return the sum of the other two for 
now, and the sum of all three if a third type is added, so that we do not run 
into a conflicting compatibility and consistency scenario in the future.
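
A tiny sketch of the getter shape I have in mind; the class and field names are only illustrative and do not come from the actual SchedulerNode/NodeReport code.
{code}
// Illustrative only: total stays the sum of all execution types, so adding a
// third type later does not silently change the meaning for existing callers.
public class ContainerCountsSketch {
  private int numGuaranteedContainers;
  private int numOpportunisticContainers;

  public int getNumGuaranteedContainers() {
    return numGuaranteedContainers;
  }

  public int getNumOpportunisticContainers() {
    return numOpportunisticContainers;
  }

  public int getTotalNumContainers() {
    return numGuaranteedContainers + numOpportunisticContainers;
  }
}
{code}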

> Expose per-node over-allocation info in Node Report
> ---
>
> Key: YARN-7337
> URL: https://issues.apache.org/jira/browse/YARN-7337
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7337-YARN-1011.00.patch, 
> YARN-7337-YARN-1011.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7460) Exclude findbugs warnings on SchedulerNode.numGuaranteedContainers and numOpportunisticContainers

2017-11-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246351#comment-16246351
 ] 

Miklos Szegedi commented on YARN-7460:
--

Thank you for the patch, [~haibochen]. The callers are synchronized, so there is 
indeed no race condition. However, why do we need the volatile in the first 
place?

> Exclude findbugs warnings on SchedulerNode.numGuaranteedContainers and 
> numOpportunisticContainers
> -
>
> Key: YARN-7460
> URL: https://issues.apache.org/jira/browse/YARN-7460
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-1011
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7460-YARN-1011.00.patch
>
>
> The findbugs warnings as in 
> https://builds.apache.org/job/PreCommit-YARN-Build/18390/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html
>  are bogus. We should exclude them in 
> hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7337) Expose per-node over-allocation info in Node Report

2017-11-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246308#comment-16246308
 ] 

Miklos Szegedi commented on YARN-7337:
--

Thank you for the patch, [~haibochen].
{code}
168     * Get the number of allocated guaranteed containers on the node.
{code}
I understand that we need to keep backward compatibility. However, naming a 
method getNumContainers but returning only the allocated guaranteed ones breaks 
compatibility as well. I think a better long-term approach would be to add 
getNumOpportunisticContainers and getNumGuaranteedContainers, and keep 
getNumContainers returning the sum of the other two.

> Expose per-node over-allocation info in Node Report
> ---
>
> Key: YARN-7337
> URL: https://issues.apache.org/jira/browse/YARN-7337
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7337-YARN-1011.00.patch, 
> YARN-7337-YARN-1011.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1015) FS should watch node resource utilization and allocate opportunistic containers if appropriate

2017-11-09 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246270#comment-16246270
 ] 

Miklos Szegedi commented on YARN-1015:
--

[~asuresh], do you have more comments on this patch?
[~haibochen], would you mind addressing the checkstyle issues? I understand 
that some of these are inherited due to refactoring. I think the code is more 
maintainable in the long term if we address them.

> FS should watch node resource utilization and allocate opportunistic 
> containers if appropriate
> --
>
> Key: YARN-1015
> URL: https://issues.apache.org/jira/browse/YARN-1015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Haibo Chen
> Attachments: YARN-1015-YARN-1011.00.patch, 
> YARN-1015-YARN-1011.01.patch, YARN-1015-YARN-1011.02.patch, 
> YARN-1015-YARN-1011.prelim.patch
>
>
> FS should look at resource utilization of nodes (provided by NM in 
> heartbeat) and allocate opportunistic containers if the resource utilization 
> of the node is below its allocation threshold.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1015) FS should watch node resource utilization and allocate opportunistic containers if appropriate

2017-11-03 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238188#comment-16238188
 ] 

Miklos Szegedi commented on YARN-1015:
--

Thank you for the patch [~haibochen].
{code}
387   \{code\}yarn.nodemanager.overallocation.memory-utilization-threshold \{code\}
388   and
389   \{code\}yarn.nodemanager.overallocation.cpu-utilization-threshold \{code\}
{code}
There are two unnecessary spaces in the code sections
{code}
854 Resource available = opportunistic ?
855     node.allowedResourceForOverAllocation() : node.getUnallocatedResource();
{code}
assignContainer is not synchronized. Could the allowed resource for 
overallocation change later while the function is still running?


> FS should watch node resource utilization and allocate opportunistic 
> containers if appropriate
> --
>
> Key: YARN-1015
> URL: https://issues.apache.org/jira/browse/YARN-1015
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-1015-YARN-1011.00.patch, 
> YARN-1015-YARN-1011.01.patch, YARN-1015-YARN-1011.prelim.patch
>
>
> FS should look at resource utilization of nodes (provided by NM in 
> heartbeat) and allocate opportunistic containers if the resource utilization 
> of the node is below its allocation threshold.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7289) Application lifetime does not work with FairScheduler

2017-11-01 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234942#comment-16234942
 ] 

Miklos Szegedi commented on YARN-7289:
--

[~rohithsharma], could you please look at the addendum? It addresses the 
configuration issue and applies the default to user lifetime in 
AbstractYarnScheduler.java

> Application lifetime does not work with FairScheduler
> -
>
> Key: YARN-7289
> URL: https://issues.apache.org/jira/browse/YARN-7289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Major
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7289.000.patch, YARN-7289.001.patch, 
> YARN-7289.002.patch, YARN-7289.003.patch, YARN-7289.004.patch, 
> YARN-7289.005.patch, YARN-7289.addendum.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7289) Application lifetime does not work with FairScheduler

2017-11-01 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7289:
-
Attachment: YARN-7289.addendum.000.patch

> Application lifetime does not work with FairScheduler
> -
>
> Key: YARN-7289
> URL: https://issues.apache.org/jira/browse/YARN-7289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Major
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-7289.000.patch, YARN-7289.001.patch, 
> YARN-7289.002.patch, YARN-7289.003.patch, YARN-7289.004.patch, 
> YARN-7289.005.patch, YARN-7289.addendum.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


