[jira] [Created] (YARN-5665) Documentation does not mention package name requirement for yarn.resourcemanager.scheduler.class

2016-09-23 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5665:


 Summary: Documentation does not mention package name requirement 
for yarn.resourcemanager.scheduler.class
 Key: YARN-5665
 URL: https://issues.apache.org/jira/browse/YARN-5665
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.0.0-alpha1
Reporter: Miklos Szegedi
Priority: Trivial


http://hadoop.apache.org/docs/r3.0.0-alpha1/hadoop-project-dist/hadoop-common/ClusterSetup.html
refers to FairScheduler when it documents the setting 
yarn.resourcemanager.scheduler.class. What it forgets to mention is that the 
user has to specify the fully qualified class name, such as 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler; 
otherwise the system throws java.lang.ClassNotFoundException: FairScheduler. It 
would be nice if the documentation gave the fully qualified class name, so that 
the user does not need to look it up.
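
For illustration, a minimal sketch (the value would normally go into 
yarn-site.xml; the programmatic form below just shows the required fully 
qualified name):
{code}
import org.apache.hadoop.conf.Configuration;

public class SchedulerConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Works: the fully qualified class name is resolvable.
    conf.set("yarn.resourcemanager.scheduler.class",
        "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler");
    // Fails at ResourceManager startup with
    // java.lang.ClassNotFoundException: FairScheduler
    // conf.set("yarn.resourcemanager.scheduler.class", "FairScheduler");
  }
}
{code}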



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5686) DefaultContainerExecutor random working dir algorithm skews results

2016-09-28 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5686:


 Summary: DefaultContainerExecutor random working dir algorithm skews results
 Key: YARN-5686
 URL: https://issues.apache.org/jira/browse/YARN-5686
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


{code}
long randomPosition = RandomUtils.nextLong() % totalAvailable;
...
while (randomPosition > availableOnDisk[dir]) {
  randomPosition -= availableOnDisk[dir++];
}
{code}

The code above selects a disk based on a random number weighted by the free 
space on each disk. For example, if I have two disks with 100 bytes free each, 
totalAvailable is 200 and randomPosition will be in the range 0..199. To be 
fair, 0..99 should select the first disk and 100..199 should select the second 
disk, so random number 100 should select the second disk, but that is not the 
case right now.

We need to use 
{code}
while (randomPosition >= availableOnDisk[dir])
{code}
instead of
{code}
while (randomPosition > availableOnDisk[dir])
{code}
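
A minimal standalone sketch of the corrected selection (hypothetical names, not 
the DefaultContainerExecutor code), showing that '>=' gives every free byte the 
same weight:
{code}
import java.util.concurrent.ThreadLocalRandom;

public class WeightedDiskPicker {
  // Returns the index of the disk that owns the randomPosition-th free byte.
  // Assumes at least one disk has free space.
  static int pickDisk(long[] availableOnDisk) {
    long totalAvailable = 0;
    for (long free : availableOnDisk) {
      totalAvailable += free;
    }
    long randomPosition = ThreadLocalRandom.current().nextLong(totalAvailable);
    int dir = 0;
    // '>=' maps positions 0..99 to disk 0 and 100..199 to disk 1 when both
    // disks have 100 bytes free; '>' would give disk 0 positions 0..100.
    while (randomPosition >= availableOnDisk[dir]) {
      randomPosition -= availableOnDisk[dir++];
    }
    return dir;
  }

  public static void main(String[] args) {
    System.out.println(pickDisk(new long[] {100, 100}));
  }
}
{code}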



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5725) Test uncaught exception in TestContainersMonitorResourceChange.testContainersResourceChange when setting IP and host

2016-10-12 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5725:


 Summary: Test uncaught exception in 
TestContainersMonitorResourceChange.testContainersResourceChange when setting 
IP and host
 Key: YARN-5725
 URL: https://issues.apache.org/jira/browse/YARN-5725
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor


The issue is logged as a warning, but it prevents the container monitor from continuing:
2016-10-12 14:38:23,280 WARN  [Container Monitor] monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(594)) - Uncaught exception in 
ContainersMonitorImpl while monitoring resource of 
container_123456_0001_01_01
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:455)
2016-10-12 14:38:23,281 WARN  [Container Monitor] monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(613)) - 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
 is interrupted. Exiting.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5726) Test exception in TestContainersMonitorResourceChange.testContainersResourceChange when trying to get NMTimelinePublisher

2016-10-12 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5726:


 Summary: Test exception in 
TestContainersMonitorResourceChange.testContainersResourceChange when trying to 
get NMTimelinePublisher
 Key: YARN-5726
 URL: https://issues.apache.org/jira/browse/YARN-5726
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Trivial


2016-10-12 14:38:39,970 WARN  [Container Monitor] monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(594)) - Uncaught exception in 
ContainersMonitorImpl while monitoring resource of 
container_123456_0001_01_01
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:587)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5736) YARN container executor config does not handle white space

2016-10-13 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5736:


 Summary: YARN container executor config does not handle white space
 Key: YARN-5736
 URL: https://issues.apache.org/jira/browse/YARN-5736
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Trivial


The container executor configuration reader does not handle white space or 
malformed key-value pairs in the config file correctly or gracefully.
As an example, take the following key-value line, which is part of the 
configuration (<< is used as a marker to show the extra trailing space):
yarn.nodemanager.linux-container-executor.group=yarn <<
It is a valid line, but when you run the check over the file:
[root@test]#./container-executor --checksetup
Can't get group information for yarn - Success.
[root@test]#
it appears to fail to find the yarn group, while it actually tries to find the 
"yarn " group, which fails. There is no trimming anywhere while processing the 
lines. A space before or after the = sign would also cause a failure.
A minor nit is that the failure is still logged as a Success.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5757) RM Cluster Node API documentation is not up to date

2016-10-19 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5757:


 Summary: RM Cluster Node API documentation is not up to date
 Key: YARN-5757
 URL: https://issues.apache.org/jira/browse/YARN-5757
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Trivial


For an example, please refer to this field, which has not existed since YARN-686:
healthStatus (string): The health status of the node - Healthy or Unhealthy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5776) Checkstyle: MonitoringThread.run method length is too long

2016-10-24 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5776:


 Summary: Checkstyle: MonitoringThread.run method length is too long
 Key: YARN-5776
 URL: https://issues.apache.org/jira/browse/YARN-5776
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Trivial


YARN-5725 introduced a checkstyle violation that should be resolved by 
refactoring the method.

Details:
ContainersMonitorImpl.java:395 MonitoringThread.run @Override:5: Method length 
is 233 lines (max allowed is 150).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5834) TestNodeStatusUpdater.testNMRMConnectionConf compares nodemanager wait time to the incorrect value

2016-11-03 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5834:


 Summary: TestNodeStatusUpdater.testNMRMConnectionConf compares 
nodemanager wait time to the incorrect value
 Key: YARN-5834
 URL: https://issues.apache.org/jira/browse/YARN-5834
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


The function in question is TestNodeStatusUpdater#testNMRMConnectionConf(). I 
believe the connectionWaitMs references below (marked with asterisks) were 
meant to be nmRmConnectionWaitMs.
{code}
conf.setLong(YarnConfiguration.NM_RESOURCEMANAGER_CONNECT_MAX_WAIT_MS,
nmRmConnectionWaitMs);
conf.setLong(YarnConfiguration.RESOURCEMANAGER_CONNECT_MAX_WAIT_MS,
connectionWaitMs);
...
  long t = System.currentTimeMillis();
  long duration = t - waitStartTime;
  boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) &&
  (duration < (*connectionWaitMs* + delta));

  if(!waitTimeValid) {
// throw exception if NM doesn't retry long enough
throw new Exception("NM should have tried re-connecting to RM during " +
  "period of at least " + *connectionWaitMs* + " ms, but " +
  "stopped retrying within " + (*connectionWaitMs* + delta) +
  " ms: " + e, e);
  }
{code}
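
For clarity, the marked references would presumably read like this after the 
fix (a sketch of the relevant fragment, not the actual patch):
{code}
  boolean waitTimeValid = (duration >= nmRmConnectionWaitMs) &&
      (duration < (nmRmConnectionWaitMs + delta));

  if (!waitTimeValid) {
    // throw exception if NM doesn't retry long enough
    throw new Exception("NM should have tried re-connecting to RM during " +
        "period of at least " + nmRmConnectionWaitMs + " ms, but " +
        "stopped retrying within " + (nmRmConnectionWaitMs + delta) +
        " ms: " + e, e);
  }
{code}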



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5849) Automatically create YARN control group for pre-mounted cgroups

2016-11-07 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5849:


 Summary: Automatically create YARN control group for pre-mounted 
cgroups
 Key: YARN-5849
 URL: https://issues.apache.org/jira/browse/YARN-5849
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0-alpha1, 2.7.3, 3.0.0-alpha2
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor


YARN can be launched with linux-container-executor.cgroups.mount set to false. 
It will then search for the cgroup mount paths set up by the administrator by 
parsing the /etc/mtab file. You can also specify 
resource.percentage-physical-cpu-limit to limit the CPU resources assigned to 
containers.
linux-container-executor.cgroups.hierarchy is the root of the settings of all 
YARN containers. If this is specified but the directory has not been created, 
YARN fails at startup:
Caused by: java.io.FileNotFoundException: 
/cgroups/cpu/hadoop-yarn/cpu.cfs_period_us (Permission denied)
org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.updateCgroup(CgroupsLCEResourcesHandler.java:263)

This JIRA is about automatically creating the YARN control group in the case 
above. It reduces the cost of administration.
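
A minimal sketch of the intended behavior (hypothetical helper, not the actual 
implementation): if the configured hierarchy does not exist under the 
discovered controller mount, create it before it is used.
{code}
import java.io.File;
import java.io.IOException;

class CGroupHierarchyInitializer {
  // e.g. controllerMount = "/cgroups/cpu", hierarchy = "hadoop-yarn"
  static File ensureYarnHierarchy(String controllerMount, String hierarchy)
      throws IOException {
    File yarnRoot = new File(controllerMount, hierarchy);
    if (!yarnRoot.exists() && !yarnRoot.mkdirs()) {
      throw new IOException("Cannot create cgroup hierarchy " + yarnRoot
          + "; check that the NodeManager user has write access");
    }
    return yarnRoot;
  }
}
{code}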




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5927) BaseContainerManagerTest::waitForNMContainerState timeout accounting is not accurate

2016-11-22 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5927:


 Summary: BaseContainerManagerTest::waitForNMContainerState timeout 
accounting is not accurate
 Key: YARN-5927
 URL: https://issues.apache.org/jira/browse/YARN-5927
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Trivial


See below: timeoutSecs is increased twice per iteration, and we also sleep 
right away before even checking the observed value.
{code}
do {
  Thread.sleep(2000);
 ...
  timeoutSecs += 2;
} while (!finalStates.contains(currentState)
&& timeoutSecs++ < timeOutMax);
{code}
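
A self-contained sketch of a more accurate loop (illustrative types only): 
check the state first, sleep only while it is not final, and count the elapsed 
time once per iteration.
{code}
import java.util.EnumSet;

class WaitLoopSketch {
  enum State { NEW, RUNNING, DONE }

  static void waitForState(EnumSet<State> finalStates, int timeOutMax)
      throws InterruptedException {
    int timeoutSecs = 0;
    State currentState = readState();
    // Check before sleeping and increment the counter exactly once.
    while (!finalStates.contains(currentState) && timeoutSecs < timeOutMax) {
      Thread.sleep(2000);
      timeoutSecs += 2;
      currentState = readState();
    }
  }

  static State readState() {
    return State.DONE; // placeholder for the real container-state lookup
  }
}
{code}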



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5986) Adding YARN configuration entries to ContainerLaunchContext

2016-12-08 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5986:


 Summary: Adding YARN configuration entries to 
ContainerLaunchContext
 Key: YARN-5986
 URL: https://issues.apache.org/jira/browse/YARN-5986
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


Currently ContainerLaunchContext is defined as:
{code}
message ContainerLaunchContextProto {
  repeated StringLocalResourceMapProto localResources = 1;
  optional bytes tokens = 2;
  repeated StringBytesMapProto service_data = 3;
  repeated StringStringMapProto environment = 4;
  repeated string command = 5;
  repeated ApplicationACLMapProto application_ACLs = 6;
  optional ContainerRetryContextProto container_retry_context = 7;
}
{code}
It would be nice to have an additional parameter "configuration" to support 
cases like YARN-5600, where we want to pass a parameter to YARN and not to the 
application or container:
{code}
  repeated StringStringMapProto configuration = 8;
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5987) NM configured command to collect heap dump of preempted container

2016-12-08 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-5987:


 Summary: NM configured command to collect heap dump of preempted container
 Key: YARN-5987
 URL: https://issues.apache.org/jira/browse/YARN-5987
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


The node manager can kill a container if it exceeds the assigned memory 
limits. It would be nice to have a configuration entry to set up a command that 
can collect additional debug information when needed. The collected information 
can be used for root cause analysis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-05 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6060:


 Summary: Linux container executor fails to run container on 
directories mounted as noexec
 Key: YARN-6060
 URL: https://issues.apache.org/jira/browse/YARN-6060
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, yarn
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


If node manager directories are mounted as noexec, LCE fails with the following 
error:
Launching container...
Couldn't execute the container launch file 
/tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
 - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-09 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi resolved YARN-6060.
--
Resolution: Won't Fix
  Assignee: (was: Miklos Szegedi)

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6077) /bin/bash path is hardcoded in node manager

2017-01-09 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6077:


 Summary: /bin/bash path is hardcoded in node manager
 Key: YARN-6077
 URL: https://issues.apache.org/jira/browse/YARN-6077
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


There should be a configuration entry similar to MRJobConfig.MAPRED_ADMIN_USER_SHELL.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6144) Fair share calculation error

2017-02-03 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6144:


 Summary: Fair share calculation error
 Key: YARN-6144
 URL: https://issues.apache.org/jira/browse/YARN-6144
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 3.0.0-alpha2
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Blocker


{{preemptContainers()}} calls {{trackContainerForPreemption()}} to collect the 
list of containers and resources that were preempted for an application. Later 
the list is reduced when {{containerCompleted()}} calls 
{{untrackContainerForPreemption()}}. The bug is that the resource variable 
{{preemptedResources}} is subtracted not only when the container was preempted 
but also when it completed successfully. This causes {{getResourceUsage()}} to 
return an incorrect value.
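
A minimal sketch of the intended accounting (hypothetical names, single memory 
dimension for brevity): only subtract from the preempted total when the 
completed container was actually tracked for preemption.
{code}
import java.util.HashSet;
import java.util.Set;

class PreemptionAccountingSketch {
  private final Set<String> containersTrackedForPreemption = new HashSet<>();
  private long preemptedMemoryMb = 0;

  void untrackContainerForPreemption(String containerId, long allocationMb) {
    // Only reduce the preempted total if this container was really tracked
    // for preemption; a normally completed container must leave it alone,
    // otherwise the resource usage reported later is too low.
    if (containersTrackedForPreemption.remove(containerId)) {
      preemptedMemoryMb -= allocationMb;
    }
  }
}
{code}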



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6158) FairScheduler: app usage can go to negative

2017-02-07 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6158:


 Summary: FairScheduler: app usage can go to negative
 Key: YARN-6158
 URL: https://issues.apache.org/jira/browse/YARN-6158
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 3.0.0-alpha2
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


FiCaSchedulerApp.containerCompleted checks whether the container being 
completed is in the active list:
{code}
  // Remove from the list of containers
  if (null == liveContainers.remove(containerId)) {
return false;
  }
{code}
The fair scheduler should do the same (see the sketch below); otherwise 
multiple close events for the same container leave the application with 
negative resource usage in {{queue.getMetrics().releaseResources}} and 
{{attemptResourceUsage.decUsed}}.
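
A sketch of the analogous guard in the fair scheduler's completion path (a 
fragment for illustration, mirroring the snippet above):
{code}
  // Ignore a container that is not (or no longer) in liveContainers so a
  // duplicate completion event cannot decrement the usage a second time.
  if (null == liveContainers.remove(containerId)) {
    return false;
  }
  // Only after this check release queue metrics and attempt resource usage.
{code}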



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6169) container-executor message on empty configuration file can be improved

2017-02-09 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6169:


 Summary: container-executor message on empty configuration file 
can be improved
 Key: YARN-6169
 URL: https://issues.apache.org/jira/browse/YARN-6169
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Miklos Szegedi
Priority: Trivial


If the configuration file is empty, we get the following error message:
{{Invalid configuration provided in /root/etc/hadoop/container-executor.cfg}}
This does not provide enough detail to figure out the issue at first glance. We 
should use something like 'Empty configuration file provided...'
{code}
  if (cfg->size == 0) {
fprintf(ERRORFILE, "Invalid configuration provided in %s\n", file_name);
exit(INVALID_CONFIG_FILE);
  }
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6171) ConcurrentModificationException in ApplicationMasterService.allocate

2017-02-10 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6171:


 Summary: ConcurrentModificationException in 
ApplicationMasterService.allocate
 Key: YARN-6171
 URL: https://issues.apache.org/jira/browse/YARN-6171
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha2
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


I have noticed an exception that occasionally closes the Application Master 
when the Fair Scheduler is used.
{code}
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.util.ConcurrentModificationException):
 java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
at java.util.HashMap$KeyIterator.next(HashMap.java:956)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:1005)
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:532)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
{code}
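
The exact fix may differ, but as a minimal sketch, one common way to avoid such 
a ConcurrentModificationException is to take a copy of the collection under the 
writers' lock before iterating it in allocate():
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AllocateSketch {
  private final Set<String> blacklistedNodes = new HashSet<>();

  List<String> snapshotBlacklist() {
    // Copy under the same lock used by the writers, so allocate() can
    // iterate the snapshot while the original set keeps changing.
    synchronized (blacklistedNodes) {
      return new ArrayList<>(blacklistedNodes);
    }
  }
}
{code}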



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6218) TestAMRMClient fails with fair scheduler

2017-02-22 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6218:


 Summary: TestAMRMClient fails with fair scheduler
 Key: YARN-6218
 URL: https://issues.apache.org/jira/browse/YARN-6218
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor


We ran into this issue on v2. Allocation does not happen in the specified 
amount of time.

Error Message
expected:<2> but was:<0>
Stacktrace
java.lang.AssertionError: expected:<2> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientMatchStorage(TestAMRMClient.java:495)




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6230) Failing unit test TestFairScheduler.testMoveWouldViolateMaxResourcesConstraints

2017-02-23 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6230:


 Summary: Failing unit test 
TestFairScheduler.testMoveWouldViolateMaxResourcesConstraints
 Key: YARN-6230
 URL: https://issues.apache.org/jira/browse/YARN-6230
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


I have run into this in one of the job runs:
{code}
Tests run: 92, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.683 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
testMoveWouldViolateMaxResourcesConstraints(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
  Time elapsed: 0.115 sec  <<< ERROR!
java.lang.Exception: Unexpected exception, 
expected but 
was
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testMoveWouldViolateMaxResourcesConstraints(TestFairScheduler.java:4533)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6302) Fail the node, if Linux Container Executor is not configured properly

2017-03-07 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6302:


 Summary: Fail the node, if Linux Container Executor is not 
configured properly
 Key: YARN-6302
 URL: https://issues.apache.org/jira/browse/YARN-6302
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor


We have a cluster that has one node with a misconfigured Linux Container 
Executor. Every time an AM or regular container is launched on that node, it 
fails. The node still has resources available, so it keeps failing apps until 
the administrator notices the issue and decommissions the node. AM blacklisting 
only helps if the application is already running.

As a possible improvement, when the LCE is used on the cluster and an NM gets 
certain errors back from the LCE, like error 24 (configuration not found), we 
should not try to allocate anything on the node anymore, or we should shut down 
the node entirely. That kind of problem normally does not fix itself, and it 
means that nothing can really run on that node.

{code}
Application application_1488920587909_0010 failed 2 times due to AM Container 
for appattempt_1488920587909_0010_02 exited with exitCode: -1000
Failing this attempt.Diagnostics: Application application_1488920587909_0010 
initialization failed (exitCode=24) with output:
For more detailed output, check the application tracking page: 
http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then 
click on links to logs of each attempt.
. Failing the application.
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6361) FSLeafQueue.fetchAppsWithDemand CPU usage is high with big queues

2017-03-16 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6361:


 Summary: FSLeafQueue.fetchAppsWithDemand CPU usage is high with 
big queues
 Key: YARN-6361
 URL: https://issues.apache.org/jira/browse/YARN-6361
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. 
Most of the time is spent in FairShareComparator.compare. We could improve this 
by doing the calculations outside the sort in O(n) and letting the O(n*log(n)) 
sort compare the precomputed, fixed values instead.
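
A minimal standalone sketch of the idea (illustrative types, not the 
FairScheduler code): evaluate the expensive per-application key once, then sort 
by the precomputed values.
{code}
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class FetchAppsSketch {
  interface App {
    double computeShareKey(); // the expensive fair-share calculation
  }

  static void sortByPrecomputedKey(List<App> apps) {
    // O(n): evaluate the expensive key once per application.
    Map<App, Double> keys = new IdentityHashMap<>();
    for (App app : apps) {
      keys.put(app, app.computeShareKey());
    }
    // O(n*log(n)): the comparator only looks up fixed values.
    apps.sort(Comparator.comparingDouble(keys::get));
  }
}
{code}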



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6368) Decommissioning an NM results in a -1 exit code

2017-03-20 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6368:


 Summary: Decommissioning an NM results in a -1 exit code
 Key: YARN-6368
 URL: https://issues.apache.org/jira/browse/YARN-6368
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor


In NodeManager.java we should exit normally in case the RM shuts down the node:
{code}
} finally {
  if (shouldExitOnShutdownEvent
  && !ShutdownHookManager.get().isShutdownInProgress()) {
ExitUtil.terminate(-1);
  }
}
{code}
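
A sketch of the suggested change (rmRequestedShutdown is a hypothetical flag 
that would be set when the shutdown was requested by the RM):
{code}
} finally {
  if (shouldExitOnShutdownEvent
      && !ShutdownHookManager.get().isShutdownInProgress()) {
    // An RM-initiated decommission is a normal shutdown; only unexpected
    // shutdown events should produce a non-zero exit code.
    ExitUtil.terminate(rmRequestedShutdown ? 0 : -1);
  }
}
{code}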



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6412) aux-services classpath not documented

2017-03-29 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6412:


 Summary: aux-services classpath not documented
 Key: YARN-6412
 URL: https://issues.apache.org/jira/browse/YARN-6412
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


YARN-4577 introduced two new configuration entries 
yarn.nodemanager.aux-services.%s.classpath and 
yarn.nodemanager.aux-services.%s.system-classes. These are not documented in 
hadoop-yarn-common/.../yarn-default.xml



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6416) SIGNAL_CMD argument number is wrong

2017-03-30 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6416:


 Summary: SIGNAL_CMD argument number is wrong
 Key: YARN-6416
 URL: https://issues.apache.org/jira/browse/YARN-6416
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


The yarn application signal command has two arguments, so I think the number 
below should be 2.
{code}
opts.getOption(SIGNAL_CMD).setArgs(3);
{code}
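
The suggested change would presumably be:
{code}
opts.getOption(SIGNAL_CMD).setArgs(2);
{code}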



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6417) TestContainerManager.testContainerLaunchAndStop disabled

2017-03-30 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6417:


 Summary: TestContainerManager.testContainerLaunchAndStop disabled
 Key: YARN-6417
 URL: https://issues.apache.org/jira/browse/YARN-6417
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Miklos Szegedi
Priority: Minor


TestContainerManager.testContainerLaunchAndStop was disabled in YARN-1897 but 
it is passing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6432) FS preemption should reserve a node before considering containers on it for preemption

2017-04-03 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6432:


 Summary: FS preemption should reserve a node before considering 
containers on it for preemption
 Key: YARN-6432
 URL: https://issues.apache.org/jira/browse/YARN-6432
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6433) Only accessible cgroup mount directories should be selected for a controller

2017-04-03 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6433:


 Summary: Only accessible cgroup mount directories should be 
selected for a controller
 Key: YARN-6433
 URL: https://issues.apache.org/jira/browse/YARN-6433
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0-alpha3
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


I have an Ubuntu 16 box that returns the following error with pre-mounted 
cgroups on the latest trunk:
{code}
2017-04-03 19:42:18,511 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
 Cgroups not accessible /run/lxcfs/controllers/cpu,cpuacct
{code}
The version is:
{code}
$ uname -a
Linux mybox 4.4.0-24-generic #43-Ubuntu SMP Wed Jun 8 19:27:37 UTC 2016 x86_64 
x86_64 x86_64 GNU/Linux
{code}
The following cpu cgroup filesystems are mounted:
{code}
$ grep cpuacct /etc/mtab
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup 
rw,nosuid,nodev,noexec,relatime,cpu,cpuacct,nsroot=/ 0 0
cpu,cpuacct /run/lxcfs/controllers/cpu,cpuacct cgroup 
rw,relatime,cpu,cpuacct,nsroot=/ 0 0
{code}
/sys/fs/cgroup is accessible to my yarn user, so it should be selected instead 
of /run/lxcfs/controllers.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6438) Code can be improved in ContainersMonitorImpl.java

2017-04-04 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6438:


 Summary: Code can be improved in ContainersMonitorImpl.java
 Key: YARN-6438
 URL: https://issues.apache.org/jira/browse/YARN-6438
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Miklos Szegedi
Priority: Minor


I noticed two code blocks that can be improved in ContainersMonitorImpl.java: 
cpuUsagePercentPerCoreByAllContainers and cpuUsageTotalCoresByAllContainers 
track the same value, and CHANGE_MONITORING_CONTAINER_RESOURCE is checked twice 
along with two calls to changeContainerResource.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6442) Inaccurate javadoc in NodeManagerHardwareUtils.getContainerMemoryMB

2017-04-04 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6442:


 Summary: Inaccurate javadoc in 
NodeManagerHardwareUtils.getContainerMemoryMB
 Key: YARN-6442
 URL: https://issues.apache.org/jira/browse/YARN-6442
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


NodeManagerHardwareUtils.getContainerMemoryMB has the following javadoc:
{code}
"If the OS has a
   * ResourceCalculatorPlugin implemented, the calculation is 0.8 * (RAM - 2 *
   * JVM-memory) i.e. use 80% of the memory after accounting for memory used by
   * the DataNode and the NodeManager. If the number is less than 1GB, log a
   * warning message."
{code}
I think the accurate expression is 0.8*(RAM-2*JVM)-systemreserved. I also do 
not see the 1GB cap in the code.
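
As a sketch, the calculation the javadoc should presumably describe (names are 
illustrative, values in MB):
{code}
class ContainerMemorySketch {
  // 0.8 * (RAM - 2 * JVM heap) minus the system reserved memory.
  static long containerMemoryMb(long physicalMemoryMb, long jvmHeapMb,
      long systemReservedMb) {
    return (long) (0.8 * (physicalMemoryMb - 2 * jvmHeapMb)) - systemReservedMb;
  }
}
{code}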



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6456) Isolation of Docker containers In LinuxContainerExecutor

2017-04-07 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6456:


 Summary: Isolation of Docker containers In LinuxContainerExecutor
 Key: YARN-6456
 URL: https://issues.apache.org/jira/browse/YARN-6456
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Miklos Szegedi


One reason to use Docker containers is to be able to isolate different 
workloads, even if they run as the same user.
I have noticed some issues in the current design:
1. DockerLinuxContainerRuntime mounts containerLocalDirs 
{{nm-local-dir/usercache/user/appcache/application_1491598755372_0011/}} and 
userLocalDirs {{nm-local-dir/usercache/user/}}, so a container can see and 
modify the files of another container. I think the application file cache 
directory should be enough for the container to run in most cases.
2. The whole cgroups directory is mounted. Would the container's own directory 
be enough?
3. There is no way to enforce exclusive use of Docker for all containers. There 
should be an option so that it is the admin, not the user, who requires the use 
of Docker.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6472) Possible Java sandbox improvements

2017-04-12 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6472:


 Summary: Possible Java sandbox improvements
 Key: YARN-6472
 URL: https://issues.apache.org/jira/browse/YARN-6472
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Greg Phillips


I set the sandbox to enforcing mode. Unfortunately I was able to break out of 
the sandbox by running native code with the following command:
{code}
cmd = "$JAVA_HOME/bin/java %s -Xmx825955249 
org.apache.hadoop.yarn.applications.helloworld.HelloWorld `touch 
../../helloworld`" + \
  " 1>/AppMaster.stdout 2>/AppMaster.stderr"

$ ls .../nm-local-dir/usercache/root/appcache/
helloworld
{code}
Also, if I am not using sandboxes, could we create the nm-sandbox-policies 
directory (empty) lazily?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6474) CGroupsHandlerImpl.java has a few checkstyle issues left to be fixed after YARN-5301

2017-04-12 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6474:


 Summary: CGroupsHandlerImpl.java has a few checkstyle issues left 
to be fixed after YARN-5301
 Key: YARN-6474
 URL: https://issues.apache.org/jira/browse/YARN-6474
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


The main issue is a throw inside a finally block.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6475) Fix some long function checkstyle issues

2017-04-12 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6475:


 Summary: Fix some long function checkstyle issues
 Key: YARN-6475
 URL: https://issues.apache.org/jira/browse/YARN-6475
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Trivial


I am talking about these two:
{code}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java:441:
  @Override:3: Method length is 176 lines (max allowed is 150). [MethodLength]
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java:159:
  @Override:3: Method length is 158 lines (max allowed is 150). [MethodLength]
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6500) Do not mount inaccessible cgroups directories in CgroupsLCEResourcesHandler

2017-04-19 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6500:


 Summary: Do not mount inaccessible cgroups directories in 
CgroupsLCEResourcesHandler
 Key: YARN-6500
 URL: https://issues.apache.org/jira/browse/YARN-6500
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


Port the YARN-6433 findControllerInMtab change from CGroupsHandlerImpl to 
CgroupsLCEResourcesHandler.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6525) Linux container executor should not propagate application errors

2017-04-25 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6525:


 Summary: Linux container executor should not propagate application 
errors
 Key: YARN-6525
 URL: https://issues.apache.org/jira/browse/YARN-6525
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha2
Reporter: Miklos Szegedi


wait_and_get_exit_code currently returns the application error code as the LCE 
error code. This may overlap with LCE errors. Instead, the LCE should return a 
fixed "application failed" error code. The application error should be printed 
into the logs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6732) Could not find artifact org.apache.hadoop:hadoop-azure-datalake:jar

2017-06-22 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6732:


 Summary: Could not find artifact 
org.apache.hadoop:hadoop-azure-datalake:jar
 Key: YARN-6732
 URL: https://issues.apache.org/jira/browse/YARN-6732
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


I get the following build error when resolving dependencies:
{code}
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 04:03 min
[INFO] Finished at: 2017-06-22T18:34:55+00:00
[INFO] Final Memory: 69M/167M
[INFO] 
[ERROR] Failed to execute goal on project hadoop-tools-dist: Could not resolve 
dependencies for project 
org.apache.hadoop:hadoop-tools-dist:jar:2.9.0-SNAPSHOT: Could not find artifact 
org.apache.hadoop:hadoop-azure-datalake:jar:2.9.0-SNAPSHOT in 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-tools-dist
The command '/bin/sh -c mvn dependency:resolve' returned a non-zero code: 1
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6757) Refactor the setting yarn.nodemanager.linux-container-executor.cgroups.mount-path

2017-06-30 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6757:


 Summary: Refactor the setting 
yarn.nodemanager.linux-container-executor.cgroups.mount-path
 Key: YARN-6757
 URL: https://issues.apache.org/jira/browse/YARN-6757
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0-alpha4
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor


We should add the ability to specify a custom cgroup path. This is how the 
documentation of linux-container-executor.cgroups.mount-path would look:
{code}
Requested cgroup mount path. YARN has built-in functionality to discover
the system cgroup mount paths, so use this setting only if the discovery
does not work.

This path must exist before the NodeManager is launched.
The location can vary depending on the Linux distribution in use.
Common locations include /sys/fs/cgroup and /cgroup.

If cgroups are not mounted, set
yarn.nodemanager.linux-container-executor.cgroups.mount
to true. In this case this setting specifies where the LCE should attempt
to mount cgroups if they are not found.

If cgroups are accessible through lxcfs or some other file system,
then set this path and
yarn.nodemanager.linux-container-executor.cgroups.mount to false.
YARN tries to use this path first, before any cgroup mount point discovery.
If it cannot find this directory, it falls back to searching for cgroup
mount points in the system.
Only used when the LCE resources handler is set to
CgroupsLCEResourcesHandler.
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6895) Preemption reservation may cause regular reservation leaks

2017-07-27 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6895:


 Summary: Preemption reservation may cause regular reservation leaks
 Key: YARN-6895
 URL: https://issues.apache.org/jira/browse/YARN-6895
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Blocker


We found a limitation in the implementation of YARN-6432. If the container 
released is smaller than the preemption request, a node reservation is created 
that is never deleted.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6913) Some cgroup settings are not documented in yarn-default.xml

2017-07-31 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6913:


 Summary: Some cgroup settings are not documented in 
yarn-default.xml
 Key: YARN-6913
 URL: https://issues.apache.org/jira/browse/YARN-6913
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Miklos Szegedi
Priority: Minor


yarn.nodemanager.resource.memory.cgroups.swappiness
yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage
yarn.nodemanager.resource.network.outbound-bandwidth-mbit
yarn.nodemanager.resource.network.outbound-bandwidth-yarn-mbit




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6925) FSSchedulerNode could be simplified extracting preemption fields into a class

2017-08-01 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6925:


 Summary: FSSchedulerNode could be simplified extracting preemption 
fields into a class
 Key: YARN-6925
 URL: https://issues.apache.org/jira/browse/YARN-6925
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Miklos Szegedi
Assignee: Yufei Gu
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6926) FSSchedulerNode reservation conflict

2017-08-01 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6926:


 Summary: FSSchedulerNode reservation conflict
 Key: YARN-6926
 URL: https://issues.apache.org/jira/browse/YARN-6926
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Miklos Szegedi
Assignee: Yufei Gu


FSSchedulerNode reserves space for preemptor apps, but regular reservations 
can still be placed on the node if there is not enough free space. This causes 
double accounting and double reservation.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6943) Update Yarn to YARN in documentation

2017-08-03 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6943:


 Summary: Update Yarn to YARN in documentation
 Key: YARN-6943
 URL: https://issues.apache.org/jira/browse/YARN-6943
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Priority: Minor


Based on the discussion with [~templedf] in YARN-6757, the official casing is 
YARN, not Yarn, so we should update all the md files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6968) Hard coded reference to an absolute pathname in org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(Contai

2017-08-08 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6968:


 Summary: Hard coded reference to an absolute pathname in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(ContainerRuntimeContext)
 Key: YARN-6968
 URL: https://issues.apache.org/jira/browse/YARN-6968
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


This could be done after YARN-6757 is checked in.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6997) org.apache.hadoop.yarn.client.SCMAdmin wrong package name

2017-08-11 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6997:


 Summary: org.apache.hadoop.yarn.client.SCMAdmin wrong package name
 Key: YARN-6997
 URL: https://issues.apache.org/jira/browse/YARN-6997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Miklos Szegedi
Priority: Critical


It should be org.apache.hadoop.yarn.client.cli.SCMAdmin to follow the current 
naming standard. This may cause appcompat issues in the future.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6998) Dead code in SCMAdmin

2017-08-11 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6998:


 Summary: Dead code in SCMAdmin
 Key: YARN-6998
 URL: https://issues.apache.org/jira/browse/YARN-6998
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Reporter: Miklos Szegedi
Priority: Minor


printHelp is always called with an empty string ("").



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7001) If shared cache upload is terminated in the middle, the temp file will never be deleted

2017-08-11 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7001:


 Summary: If shared cache upload is terminated in the middle, the 
temp file will never be deleted
 Key: YARN-7001
 URL: https://issues.apache.org/jira/browse/YARN-7001
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


A deleteTempFile(tempPath) call is missing in the failure path below:
{code}
  tempPath = new Path(directoryPath, getTemporaryFileName(actualPath));
  if (!uploadFile(actualPath, tempPath)) {
LOG.warn("Could not copy the file to the shared cache at " + tempPath);
return false;
  }
{code}
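
A sketch of the missing cleanup in the failure path (mirroring the fragment 
above):
{code}
  tempPath = new Path(directoryPath, getTemporaryFileName(actualPath));
  if (!uploadFile(actualPath, tempPath)) {
    LOG.warn("Could not copy the file to the shared cache at " + tempPath);
    // Clean up the partially uploaded temporary file before bailing out.
    deleteTempFile(tempPath);
    return false;
  }
{code}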




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7009) TestNMClient.testNMClientNoCleanupOnStop is flaky by design

2017-08-14 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7009:


 Summary: TestNMClient.testNMClientNoCleanupOnStop is flaky by 
design
 Key: YARN-7009
 URL: https://issues.apache.org/jira/browse/YARN-7009
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


The sleeps that wait for a transition to reinit and then back to running are 
not long enough; they can miss the reinit event.
{code}
java.lang.AssertionError: Exception is not expected: 
org.apache.hadoop.yarn.exceptions.YarnException: Cannot perform RE_INIT on 
[container_1502735389852_0001_01_01]. Current state is [REINITIALIZING, 
isReInitializing=true].
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.preReInitializeOrLocalizeCheck(ContainerManagerImpl.java:1772)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1697)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1668)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.reInitializeContainer(ContainerManagementProtocolPBServiceImpl.java:214)
at 
org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:237)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)


at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testReInitializeContainer(TestNMClient.java:567)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:405)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:214)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Cannot perform 
RE_INIT on [container_1502735389852_0001_01_01]. Current state is 
[REINITIALIZING, isReInitializing=true].
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.preReInitializeOrLocalizeCheck(ContainerManagerImpl.java:1772)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1697)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.reInitializeContainer(ContainerManagerImpl.java:1668)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.reInitializeContainer(ContainerManagementProtocolPBServiceImpl.java:214)
at 
org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:237)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccess

[jira] [Created] (YARN-7034) DefaultLinuxContainerRuntime and DockerLinuxContainerRuntime sends client environment variables to container-executor

2017-08-16 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7034:


 Summary: DefaultLinuxContainerRuntime and 
DockerLinuxContainerRuntime sends client environment variables to 
container-executor
 Key: YARN-7034
 URL: https://issues.apache.org/jira/browse/YARN-7034
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Miklos Szegedi
Priority: Critical


This behavior is unnecessary, since nothing from the environment is used right 
now. One option is to whitelist these variables before passing them on. Are 
there any known use cases that justify this?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7064) Use cgroup to get container resource utilization

2017-08-21 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7064:


 Summary: Use cgroup to get container resource utilization
 Key: YARN-7064
 URL: https://issues.apache.org/jira/browse/YARN-7064
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


This is an addendum to YARN-6668; that JIRA always wants patches to be rebased 
against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7099) ResourceHandlerModule.parseConfiguredCGroupPath only works for privileged yarn users.

2017-08-24 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7099:


 Summary: ResourceHandlerModule.parseConfiguredCGroupPath only 
works for privileged yarn users.
 Key: YARN-7099
 URL: https://issues.apache.org/jira/browse/YARN-7099
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor


canWrite() fails when the node manager is not running as a privileged user, so 
the configured cgroup path is rejected; a probing sketch follows the quoted code.
{code}
if (candidate.isDirectory() && candidate.canWrite()) {
  pathSubsystemMappings.put(candidate.getAbsolutePath(), cgroupList);
} else {
  LOG.warn("The following cgroup is not a directory or it is not"
  + " writable" + candidate.getAbsolutePath());
}
{code}
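
One possible direction, sketched below, is to probe effective write access by 
creating a child cgroup instead of relying on File.canWrite(); the helper name 
is made up and this is not necessarily the fix that landed:
{code:java}
import java.io.File;

public final class CGroupWriteProbe {
  /**
   * Probes effective write access by creating (and removing) a child cgroup,
   * which is what the node manager actually needs to do, instead of relying
   * on File.canWrite() on a root-owned directory.
   */
  public static boolean canCreateChildCGroup(File cgroupDir) {
    if (!cgroupDir.isDirectory()) {
      return false;
    }
    File probe = new File(cgroupDir, "yarn-write-probe");
    try {
      if (probe.mkdir()) {
        probe.delete(); // removing an empty child cgroup is allowed
        return true;
      }
      return false;
    } catch (SecurityException e) {
      return false;
    }
  }
}
{code}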




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7145) Identify potential flaky unit tests

2017-08-31 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7145:


 Summary: Identify potential flaky unit tests
 Key: YARN-7145
 URL: https://issues.apache.org/jira/browse/YARN-7145
 Project: Hadoop YARN
  Issue Type: Test
  Components: nodemanager, resourcemanager
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor


I intend to add a 200 millisecond sleep into AsyncDispatcher and run the test 
job to identify the tests that are potentially flaky.
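
A sketch of the injected delay; the EventHandler interface here is a local 
stand-in for the real YARN one, and the 200 ms value is the one proposed above:
{code:java}
public final class DelayingHandler<E> {
  public interface EventHandler<E> {
    void handle(E event);
  }

  private final EventHandler<E> delegate;

  public DelayingHandler(EventHandler<E> delegate) {
    this.delegate = delegate;
  }

  public void handle(E event) {
    try {
      Thread.sleep(200); // artificially widen race windows so flaky tests fail
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
    delegate.handle(event);
  }
}
{code}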



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7181) CPUTimeTracker.updateElapsedJiffies can report negative usage

2017-09-08 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7181:


 Summary: CPUTimeTracker.updateElapsedJiffies can report negative 
usage
 Key: YARN-7181
 URL: https://issues.apache.org/jira/browse/YARN-7181
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


This happens when the process has exited and elapsedJiffies drops back to 0, so 
the delta against the previous sample becomes negative. A guard sketch follows 
the quoted code.
{code}
  public void updateElapsedJiffies(BigInteger elapsedJiffies, long newTime) {
cumulativeCpuTime = elapsedJiffies.multiply(jiffyLengthInMillis);
sampleTime = newTime;
 }
{code}
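
A sketch of a guard against the counter reset, assuming the tracker remembers 
the previous cumulative value; the class and field names are illustrative:
{code:java}
import java.math.BigInteger;

public final class JiffyGuard {
  private BigInteger lastCumulativeJiffies = BigInteger.ZERO;

  /** Returns the jiffies elapsed since the last sample, never negative. */
  public BigInteger update(BigInteger elapsedJiffies) {
    BigInteger delta = elapsedJiffies.subtract(lastCumulativeJiffies);
    if (delta.signum() < 0) {
      // The process exited and the counter restarted from 0: ignore the sample.
      delta = BigInteger.ZERO;
    }
    lastCumulativeJiffies = elapsedJiffies;
    return delta;
  }
}
{code}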




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7232) Consider /sys/fs/cgroup as the default CGroup mount path

2017-09-20 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7232:


 Summary: Consider /sys/fs/cgroup as the default CGroup mount path
 Key: YARN-7232
 URL: https://issues.apache.org/jira/browse/YARN-7232
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Miklos Szegedi


YARN-6968 fixed the findbugs issue caused by the hard coded /sys/fs/cgroup mount 
path for Docker containers, but it removed the default value at the same time. 
This jira is a follow-up to make sure the admin does not have to set the value 
every time.
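
A sketch of the fallback, assuming the existing mount-path property is reused; 
the constant names are illustrative:
{code:java}
import org.apache.hadoop.conf.Configuration;

public final class CGroupMountPathDefault {
  // Assumed property name and default; the final names are up to this jira.
  static final String MOUNT_PATH_KEY =
      "yarn.nodemanager.linux-container-executor.cgroups.mount-path";
  static final String DEFAULT_MOUNT_PATH = "/sys/fs/cgroup";

  /** Falls back to /sys/fs/cgroup when the admin did not set anything. */
  public static String resolveMountPath(Configuration conf) {
    return conf.get(MOUNT_PATH_KEY, DEFAULT_MOUNT_PATH);
  }
}
{code}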



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7233) Make the cgroup mount into Docker containers configurable

2017-09-20 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7233:


 Summary: Make the cgroup mount into Docker containers configurable
 Key: YARN-7233
 URL: https://issues.apache.org/jira/browse/YARN-7233
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Miklos Szegedi


Not all containers need this mount. There should be an option to opt for lxcfs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7239) Possible launch/cleanup race condition in ContainersLauncher

2017-09-21 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7239:


 Summary: Possible launch/cleanup race condition in 
ContainersLauncher
 Key: YARN-7239
 URL: https://issues.apache.org/jira/browse/YARN-7239
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


ContainersLauncher.handle() submits the launch job and only then adds it to the 
running collection, so a concurrent cleanup can miss it and return early. The 
two statements should be in reverse order in all 3 instances (see the sketch 
after the quoted cleanup code):
{code}
containerLauncher.submit(launch);
running.put(containerId, launch);
{code}
The cleanup code that the above code races with:
{code}
ContainerLaunch runningContainer = running.get(containerId);
if (runningContainer == null) {
  // Container not launched. So nothing needs to be done.
  LOG.info("Container " + containerId + " not running, nothing to 
signal.");
  return;
}
...
{code}
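
A minimal sketch of the proposed ordering, reusing the names from the quoted 
fragments; this is an illustration, not a verbatim patch:
{code:java}
// Register the container first so that a concurrent cleanup always finds it,
// then hand it to the executor. If submit() fails, the entry must be removed
// again so the map does not leak.
running.put(containerId, launch);
containerLauncher.submit(launch);
{code}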




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7289) TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor times out

2017-10-04 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7289:


 Summary: 
TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor times out
 Key: YARN-7289
 URL: https://issues.apache.org/jira/browse/YARN-7289
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7293) ContainerLaunch.cleanupContainer may miss a starting node

2017-10-05 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7293:


 Summary: ContainerLaunch.cleanupContainer may miss a starting node
 Key: YARN-7293
 URL: https://issues.apache.org/jira/browse/YARN-7293
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


The relevant part of YARN-7009 needs to be backported



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7294) TestSignalContainer#testSignalRequestDeliveryToNM fails intermittently with Fair scheduler

2017-10-06 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7294:


 Summary: TestSignalContainer#testSignalRequestDeliveryToNM fails 
intermittently with Fair scheduler
 Key: YARN-7294
 URL: https://issues.apache.org/jira/browse/YARN-7294
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


This is flaky because the Fair Scheduler needs an update after allocation, plus 
additional node heartbeats, before all the requests are fulfilled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-4943) Add support to collect actual resource usage from cgroups

2017-10-16 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi resolved YARN-4943.
--
Resolution: Duplicate

> Add support to collect actual resource usage from cgroups
> -
>
> Key: YARN-4943
> URL: https://issues.apache.org/jira/browse/YARN-4943
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Vasudev
>
> We should add support to collect actual resource usage from Cgroups(if 
> they're enabled) - it's more accurate and it can give you more detailed 
> information.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7349) TestSignalContainer.testSignalRequestDeliveryToNM fails with attempt state is not correct

2017-10-17 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7349:


 Summary: TestSignalContainer.testSignalRequestDeliveryToNM fails 
with attempt state is not correct
 Key: YARN-7349
 URL: https://issues.apache.org/jira/browse/YARN-7349
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


java.lang.AssertionError: Attempt state is not correct (timeout). 
Expected :ALLOCATED
Actual   :SCHEDULED

at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:358)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:317)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:298)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:955)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestSignalContainer.testSignalRequestDeliveryToNM(TestSignalContainer.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7350) TestSignalContainer should check both FS and CS

2017-10-17 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7350:


 Summary: TestSignalContainer should check both FS and CS
 Key: YARN-7350
 URL: https://issues.apache.org/jira/browse/YARN-7350
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7354) Fair scheduler should support application lifetime monitor

2017-10-18 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7354:


 Summary: Fair scheduler should support application lifetime monitor
 Key: YARN-7354
 URL: https://issues.apache.org/jira/browse/YARN-7354
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


For details, see the Fair Scheduler specific code in 
TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor added by YARN-7289.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7387) org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer fails intermittently

2017-10-24 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7387:


 Summary: 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 fails intermittently
 Key: YARN-7387
 URL: https://issues.apache.org/jira/browse/YARN-7387
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


{code}
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 52.481 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
testDecreaseAfterIncreaseWithAllocationExpiration(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer)
  Time elapsed: 13.292 sec  <<< FAILURE!
java.lang.AssertionError: expected:<3072> but was:<4096>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer.testDecreaseAfterIncreaseWithAllocationExpiration(TestIncreaseAllocationExpirer.java:459)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7506) Overhaul the design of the Linux container-executor regarding Docker and future runtimes

2017-11-15 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7506:


 Summary: Overhaul the design of the Linux container-executor 
regarding Docker and future runtimes
 Key: YARN-7506
 URL: https://issues.apache.org/jira/browse/YARN-7506
 Project: Hadoop YARN
  Issue Type: Wish
  Components: nodemanager
Reporter: Miklos Szegedi


I raise this topic to discuss a potential improvement of the container executor 
tool in node manager.
container-executor has two main purposes. It executes Linux *system calls not 
available from Java*, and it executes tasks *available to root that are not 
available to the yarn user*. Historically container-executor did both by doing 
impersonation. The yarn user is separated from root because it runs network 
services, so *the yarn user should be restricted* by design. Because of this it 
has its own config file, container-executor.cfg, writable only by root, which 
specifies what actions are allowed for the yarn user. However, the requirements 
have changed with Docker, and that raises the following questions:

1. The Docker feature of YARN requires root permissions to *access the Docker 
socket* but it does not run any system calls, so could the Docker related code 
in container-executor be *refactored into a separate Java process run as root*? 
Java would make the development much faster and more secure.

2. The Docker feature only needs the Docker unix socket. It is not a good idea 
to let the yarn user directly access the socket, since that would elevate its 
privileges to root. However, the Java tool running as root mentioned in the 
previous question could act as a *proxy on the Docker socket* operating 
directly on the Docker REST API *eliminating the need to use the Docker CLI*. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7553) TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime flaky

2017-11-21 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7553:


 Summary: 
TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime flaky
 Key: YARN-7553
 URL: https://issues.apache.org/jira/browse/YARN-7553
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


{code}
[ERROR] 
testFiniteGroupResolutionTime(org.apache.hadoop.security.TestShellBasedUnixGroupsMapping)
  Time elapsed: 61.975 s  <<< FAILURE!
java.lang.AssertionError: 
Expected the logs to carry a message about command timeout but was: 2017-11-22 
00:10:57,523 WARN  security.ShellBasedUnixGroupsMapping 
(ShellBasedUnixGroupsMapping.java:getUnixGroups(181)) - unable to return groups 
for user foobarnonexistinguser
PartialGroupNameException The user name 'foobarnonexistinguser' is not found. 
at 
org.apache.hadoop.security.ShellBasedUnixGroupsMapping.resolvePartialGroupNames(ShellBasedUnixGroupsMapping.java:275)
at 
org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:178)
at 
org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:97)
at 
org.apache.hadoop.security.TestShellBasedUnixGroupsMapping.testFiniteGroupResolutionTime(TestShellBasedUnixGroupsMapping.java:278)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7554) TestCryptoStreamsWithOpensslAesCtrCryptoCodec fails on Debian 9

2017-11-21 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7554:


 Summary: TestCryptoStreamsWithOpensslAesCtrCryptoCodec fails on 
Debian 9
 Key: YARN-7554
 URL: https://issues.apache.org/jira/browse/YARN-7554
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


{code}
[ERROR] org.apache.hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec  
Time elapsed: 0.478 s  <<< FAILURE!
java.lang.AssertionError: Unable to instantiate codec 
org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, is the required version of 
OpenSSL installed?
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:621)
at 
org.apache.hadoop.crypto.TestCryptoStreamsWithOpensslAesCtrCryptoCodec.init(TestCryptoStreamsWithOpensslAesCtrCryptoCodec.java:43)
{code}
This happened due to the following openssl change:
https://github.com/openssl/openssl/commit/ff4b7fafb315df5f8374e9b50c302460e068f188



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7577) Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart

2017-11-28 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7577:


 Summary: Unit Fail: TestAMRestart#testPreemptedAMRestartOnRMRestart
 Key: YARN-7577
 URL: https://issues.apache.org/jira/browse/YARN-7577
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


This happens if Fair Scheduler is the default. The test should run with both 
schedulers.
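
A sketch of running the same body against both schedulers with a JUnit 
parameterized runner; the class name and wiring are illustrative only:
{code:java}
import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestAMRestartWithBothSchedulers {
  @Parameters
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        {"org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"},
        {"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"}
    });
  }

  private final String schedulerClass;

  public TestAMRestartWithBothSchedulers(String schedulerClass) {
    this.schedulerClass = schedulerClass;
  }

  @Test
  public void testPreemptedAMRestartOnRMRestart() {
    // The real test body would set yarn.resourcemanager.scheduler.class to
    // schedulerClass before starting the MockRM.
  }
}
{code}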



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7712) Add ability to ignore timestamps in localized files

2018-01-08 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7712:


 Summary: Add ability to ignore timestamps in localized files
 Key: YARN-7712
 URL: https://issues.apache.org/jira/browse/YARN-7712
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


YARN currently requires and checks the timestamp of localized files and fails 
if the file on HDFS does not match the one requested. This jira adds the 
ability to ignore the timestamp at the request of the client.
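
A sketch of the check with a hypothetical ignoreTimestamp flag; both the flag 
and the helper are illustrative, and the real plumbing would come from the 
client's resource request once this jira is implemented:
{code:java}
import java.io.IOException;

public final class TimestampCheck {
  /** Fails localization on a timestamp mismatch unless the client opted out. */
  public static void verify(long requestedTimestamp, long actualTimestamp,
      boolean ignoreTimestamp) throws IOException {
    if (!ignoreTimestamp && requestedTimestamp != actualTimestamp) {
      throw new IOException("Resource changed on the source filesystem: expected "
          + requestedTimestamp + " but found " + actualTimestamp);
    }
  }
}
{code}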



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7713) Add parallel copying of directories into

2018-01-08 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7713:


 Summary: Add parallel copying of directories into
 Key: YARN-7713
 URL: https://issues.apache.org/jira/browse/YARN-7713
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


YARN currently copies directories sequentially when localizing. This could be 
improved to copy the entries in parallel, since the source blocks are normally 
on different nodes.
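
A minimal, non-recursive local-filesystem sketch of the idea; the real change 
would use the Hadoop FileSystem API in the localizer:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public final class ParallelDirCopy {
  /** Copies the top-level entries of src into dst using a small thread pool. */
  public static void copyDir(Path src, Path dst, int threads)
      throws IOException, InterruptedException {
    Files.createDirectories(dst);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<?>> results = new ArrayList<>();
    try (Stream<Path> entries = Files.list(src)) {
      entries.forEach(entry -> results.add(pool.submit(() -> {
        // each top-level entry is copied on its own thread
        Files.copy(entry, dst.resolve(entry.getFileName()));
        return null;
      })));
    }
    pool.shutdown();
    for (Future<?> result : results) {
      try {
        result.get(); // surface any copy failure
      } catch (ExecutionException e) {
        throw new IOException(e.getCause());
      }
    }
  }
}
{code}
Joining the futures at the end means a single failed copy still fails the whole 
localization, which matches the current sequential behavior.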



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7734) YARN-5418 breaks TestContainerLogsPage.testContainerLogPageAccess

2018-01-10 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7734:


 Summary: YARN-5418 breaks 
TestContainerLogsPage.testContainerLogPageAccess
 Key: YARN-7734
 URL: https://issues.apache.org/jira/browse/YARN-7734
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Xuan Gong


It adds a call to LogAggregationFileControllerFactory, but the mocked context in 
the unit test does not have the configuration filled in, so the constructor 
throws a NullPointerException.
{code}
[ERROR] Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.492 s 
<<< FAILURE! - in 
org.apache.hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage
[ERROR] 
testContainerLogPageAccess(org.apache.hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage)
  Time elapsed: 0.208 s  <<< ERROR!
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.(LogAggregationFileControllerFactory.java:68)
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.ContainerLogsPage$ContainersLogsBlock.(ContainerLogsPage.java:100)
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage.testContainerLogPageAccess(TestContainerLogsPage.java:268)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7758) Add an additional check to the validity of container and application ids passed to container-executor

2018-01-16 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7758:


 Summary: Add an additional check to the validity of container and 
application ids passed to container-executor
 Key: YARN-7758
 URL: https://issues.apache.org/jira/browse/YARN-7758
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Yufei Gu


I would make sure that they only contain the characters a-z, 0-9, underscore (_) 
and dash (-).
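
A sketch of the whitelist expressed as a Java regex for illustration; the real 
check would be added to the native container-executor:
{code:java}
import java.util.regex.Pattern;

public final class IdValidator {
  // Only lowercase letters, digits, underscore and dash are accepted.
  private static final Pattern VALID_ID = Pattern.compile("^[a-z0-9_-]+$");

  public static boolean isValid(String id) {
    return id != null && VALID_ID.matcher(id).matches();
  }
}
{code}
With such a check, ids containing a '/' or '..' would be rejected before they 
reach any path construction in container-executor.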



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7775) Unit test fail: Testing resolve_config_path in 2.7

2018-01-18 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7775:


 Summary: Unit test fail: Testing resolve_config_path in 2.7
 Key: YARN-7775
 URL: https://issues.apache.org/jira/browse/YARN-7775
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


I see this in the latest branch-2.7 when running test-container-executor:

Testing resolve_config_path

FAIL: failed to resolve config_name on an absolute path name: /bin/ls



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7803) TestZKFailoverController occasionally fails in trunk

2018-01-23 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7803:


 Summary: TestZKFailoverController occasionally fails in trunk
 Key: YARN-7803
 URL: https://issues.apache.org/jira/browse/YARN-7803
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Miklos Szegedi


{code}
[ERROR] testGracefulFailoverMultipleZKfcs(org.apache.hadoop.ha.TestZKFailoverController)  Time elapsed: 70.35 s <<< ERROR!
org.apache.hadoop.ha.ServiceFailedException: Unable to become active. Local node did not get an opportunity to do so from ZooKeeper, or the local node took too long to transition to active.
at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:692)
at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:60)
at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:609)
at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:606)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:606)
at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
at org.apache.hadoop.ha.TestZKFailoverController.testGracefulFailoverMultipleZKfcs(TestZKFailoverController.java:586)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:53)
at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7876) Workaround ZipInputStream limitation for YARN-2185

2018-02-01 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-7876:


 Summary: Workaround ZipInputStream limitation for YARN-2185
 Key: YARN-7876
 URL: https://issues.apache.org/jira/browse/YARN-7876
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


YARN-2185 added the ability to localize jar files as a stream instead of 
copying them to local disk and then extracting. ZipInputStream does not read to 
the end of the file, so let's read the remainder out explicitly. This helps when 
an additional TeeInputStream is attached to the input.
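
A sketch of the drain step, assuming the underlying stream is the one wrapped 
by the TeeInputStream:
{code:java}
import java.io.IOException;
import java.io.InputStream;

public final class StreamDrain {
  /**
   * Reads the remainder of the stream and discards it, so that a wrapping
   * TeeInputStream copies the complete file and not just the zip entries.
   */
  public static void drain(InputStream in) throws IOException {
    byte[] buffer = new byte[64 * 1024];
    while (in.read(buffer) != -1) {
      // discard; the side effect of pulling the bytes through is what matters
    }
  }
}
{code}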



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8039) Clean up log dir configuration in TestLinuxContainerExecutorWithMocks.testStartLocalizer

2018-03-16 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-8039:


 Summary: Clean up log dir configuration in 
TestLinuxContainerExecutorWithMocks.testStartLocalizer
 Key: YARN-8039
 URL: https://issues.apache.org/jira/browse/YARN-8039
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8090) Race conditions in FadvisedChunkedFile

2018-03-29 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-8090:


 Summary: Race conditions in FadvisedChunkedFile
 Key: YARN-8090
 URL: https://issues.apache.org/jira/browse/YARN-8090
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


{code:java}
11:04:33.605 AM WARN FadvisedChunkedFile 
Failed to manage OS cache for 
/var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
Method)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at 
org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76)
at 
org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303)
at 
org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163)
at 
org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
at 
org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570)
at 
org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at 
org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
at 
org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
at 
org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8244) ContainersLauncher.ContainerLaunch can throw ConcurrentModificationException

2018-05-03 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-8244:


 Summary: ContainersLauncher.ContainerLaunch can throw 
ConcurrentModificationException
 Key: YARN-8244
 URL: https://issues.apache.org/jira/browse/YARN-8244
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


{code:java}
2018-05-03 17:31:35,028 WARN [ContainersLauncher #1] launcher.ContainerLaunch 
(ContainerLaunch.java:call(329)) - Failed to launch container.
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$EntryIterator.next(HashMap.java:1471)
at java.util.HashMap$EntryIterator.next(HashMap.java:1469)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch$ShellScriptBuilder.orderEnvByDependencies(ContainerLaunch.java:1311)
at 
org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.writeLaunchEnv(ContainerExecutor.java:388)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:290)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2{code}
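
One hedged way to avoid iterating a map that is mutated concurrently is to 
order a defensive copy; this is an illustration only, not necessarily the fix 
that landed, and it assumes writers synchronize on the same map instance:
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public final class EnvSnapshot {
  /**
   * Returns a defensive copy to iterate over. This only helps if every writer
   * synchronizes on the same map instance; otherwise the copy itself could
   * still observe a concurrent modification.
   */
  public static Map<String, String> snapshot(Map<String, String> env) {
    synchronized (env) {
      return new LinkedHashMap<>(env);
    }
  }
}
{code}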



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8262) get_executable in container-executor should provide meaningful error codes

2018-05-08 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-8262:


 Summary: get_executable in container-executor should provide 
meaningful error codes
 Key: YARN-8262
 URL: https://issues.apache.org/jira/browse/YARN-8262
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


Currently it calls exit(-1), which makes it difficult to debug without stderr.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-1014) Configure OOM Killer to kill OPPORTUNISTIC containers first

2018-06-06 Thread Miklos Szegedi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi resolved YARN-1014.
--
Resolution: Won't Fix

> Configure OOM Killer to kill OPPORTUNISTIC containers first
> ---
>
> Key: YARN-1014
> URL: https://issues.apache.org/jira/browse/YARN-1014
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Arun C Murthy
>Priority: Major
> Attachments: YARN-1014.00.patch, YARN-1014.01.patch, 
> YARN-1014.02.patch
>
>
> YARN-2882 introduces the notion of OPPORTUNISTIC containers. These containers 
> should be killed first should the system run out of memory. 
> -
> Previous description:
> Once RM allocates 'speculative containers' we need to get LCE to schedule 
> them at lower priorities via cgroups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8437) Build oom-listener on older versions

2018-06-19 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-8437:


 Summary: Build oom-listener on older versions
 Key: YARN-8437
 URL: https://issues.apache.org/jira/browse/YARN-8437
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi


oom-listener was introduced in YARN-4599. We have seen some build issues on 
CentOS 6.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8452) FairScheduler.update can take long time if yarn.scheduler.fair.sizebasedweight is on

2018-06-22 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-8452:


 Summary: FairScheduler.update can take long time if 
yarn.scheduler.fair.sizebasedweight is on
 Key: YARN-8452
 URL: https://issues.apache.org/jira/browse/YARN-8452
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Miklos Szegedi
Assignee: Szilard Nemeth


Basically we recalculate the weight on every update, even if the inputs did not 
change. This causes high CPU usage if the cluster has lots of apps.
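
A sketch of caching the computed weight so it is only recalculated when the 
demand actually changes; the field names and the exact formula are illustrative:
{code:java}
public final class CachedSizeBasedWeight {
  private long lastDemandMemory = -1;
  private double cachedWeight;

  /** Recomputes the size-based weight only when the demand changed. */
  public double weight(long demandMemory) {
    if (demandMemory != lastDemandMemory) {
      // Same shape as the size-based weight: grows with the memory demand.
      cachedWeight = Math.log1p(demandMemory) / Math.log(2);
      lastDemandMemory = demandMemory;
    }
    return cachedWeight;
  }
}
{code}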



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-8470) Fair scheduler exception with SLS

2018-06-27 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-8470:


 Summary: Fair scheduler exception with SLS
 Key: YARN-8470
 URL: https://issues.apache.org/jira/browse/YARN-8470
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


I ran into the following exception with SLS:
2018-06-26 13:34:04,358 ERROR resourcemanager.ResourceManager: Received 
RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, 
FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10050) NodeManagerCGroupsMemory.md does not show up in the official documentation

2019-12-19 Thread Miklos Szegedi (Jira)
Miklos Szegedi created YARN-10050:
-

 Summary: NodeManagerCGroupsMemory.md does not show up in the 
official documentation
 Key: YARN-10050
 URL: https://issues.apache.org/jira/browse/YARN-10050
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


I looked at this doc:

[https://github.com/apache/hadoop/blob/9636fe4114eed9035cdc80108a026c657cd196d9/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/NodeManagerCGroupsMemory.md]

It does not show up here:

[https://hadoop.apache.org/docs/stable/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org