Re: Mapreduce application failed on Distributed scheduler on YARN.

2017-06-16 Thread Arun Suresh
Hello Jasson

We probably have to update the documentation, but when using distributed
scheduling, the MapReduce job must be submitted with the following extra
configuration:

-Dyarn.resourcemanager.scheduler.address=127.0.0.1:8049

Essentially, the AM should talk to the DistributedScheduler interceptor
running on the local NM.
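
For example, a wordcount submission might look like the following (a sketch,
not a verified command: the examples jar path and input/output directories are
illustrative, and 8049 assumes the default AMRMProxy port, which must match
yarn.nodemanager.amrmproxy.address on the local NM):

```shell
# Point the AM's scheduler address at the local NM's AMRMProxy, where the
# DistributedScheduler interceptor runs, instead of the RM scheduler port.
hadoop jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount \
  -Dyarn.resourcemanager.scheduler.address=127.0.0.1:8049 \
  /input /output
```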

Let me know if that works.

Cheers
-Arun


On Wed, Jun 14, 2017 at 12:18 AM, Konstantinos Karanasos <kkarana...@gmail.com> wrote:

> Hi Wei,
>
> Can you please share your yarn-site.xml and the command you used for
> running wordcount?
>
> Thanks,
> Konstantinos
>
> On Tue, Jun 13, 2017 at 11:44 Jasson Chenwei wrote:
>
> > hi all
> >
> > I have set up the distributed scheduler, a new feature in Hadoop 3.0. My
> > Hadoop version is hadoop-3.0.0-alpha3. I have enabled opportunistic
> > containers and the distributed scheduler in yarn-site.xml following the
> > guide, but the wordcount application master failed to launch as follows:
> >
> > 2017-06-13 12:34:11,036 INFO [main] org.eclipse.jetty.server.Server: Started @5116ms
> > 2017-06-13 12:34:11,036 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app mapreduce started at 45559
> > 2017-06-13 12:34:11,039 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: JOB_CREATE job_1497050650910_0004
> > 2017-06-13 12:34:11,041 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 3000 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
> > 2017-06-13 12:34:11,042 INFO [Socket Reader #1 for port 36026] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 36026
> > 2017-06-13 12:34:11,045 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> > 2017-06-13 12:34:11,045 INFO [IPC Server listener on 36026] org.apache.hadoop.ipc.Server: IPC Server listener on 36026: starting
> > 2017-06-13 12:34:11,083 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
> > 2017-06-13 12:34:11,083 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
> > 2017-06-13 12:34:11,083 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
> > 2017-06-13 12:34:11,090 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 0% of the mappers will be scheduled using OPPORTUNISTIC containers
> > 2017-06-13 12:34:11,132 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8030
> > 2017-06-13 12:34:11,193 WARN [main] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1497050650910_0004_02
> > 2017-06-13 12:34:11,203 ERROR [main] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: Exception while registering
> > org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1497050650910_0004_02
> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> >     at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> >     at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
> >     at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
> >     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:109)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:498)
> >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:398)
> >     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
> >     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
> >     at ...

[jira] [Created] (YARN-6717) [Umbrella] API related cleanup for Hadoop 3

2017-06-16 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-6717:


 Summary: [Umbrella] API related cleanup for Hadoop 3
 Key: YARN-6717
 URL: https://issues.apache.org/jira/browse/YARN-6717
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Ray Chiang
Assignee: Ray Chiang


Creating this umbrella JIRA to track the various API-related issues that need 
to be adjusted or documented before the Hadoop 3 release.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: trunk+JDK8 on Linux/ppc64le

2017-06-16 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/347/

[Jun 15, 2017 1:49:34 PM] (stevel) HADOOP-14506. Add create() contract test 
that verifies ancestor dir
[Jun 15, 2017 5:40:59 PM] (xiao) HADOOP-14523. 
OpensslAesCtrCryptoCodec.finalize() holds excessive
[Jun 15, 2017 5:59:24 PM] (lei) HADOOP-14394. Provide Builder pattern for 
DistributedFileSystem.create.
[Jun 15, 2017 6:04:50 PM] (lei) HDFS-11682. 
TestBalancer.testBalancerWithStripedFile is flaky. (lei)
[Jun 15, 2017 6:16:16 PM] (liuml07) HADOOP-14494. 
ITestJets3tNativeS3FileSystemContract tests NPEs in
[Jun 15, 2017 9:46:55 PM] (wang) HDFS-10480. Add an admin command to list 
currently open files.
[Jun 16, 2017 4:17:10 AM] (aajisaka) HADOOP-14289. Move log4j APIs over to 
slf4j in hadoop-common.
[Jun 16, 2017 8:45:22 AM] (stevel) HADOOP-14486 
TestSFTPFileSystem#testGetAccessTime test failure using




-1 overall


The following subsystems voted -1:
compile mvninstall unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc javac


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 
   hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer 
   hadoop.hdfs.server.namenode.TestReconstructStripedBlocks 
   hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.server.nodemanager.recovery.TestNMLeveldbStateStoreService 
   hadoop.yarn.server.nodemanager.TestNodeManagerShutdown 
   hadoop.yarn.server.timeline.TestRollingLevelDB 
   hadoop.yarn.server.timeline.TestTimelineDataManager 
   hadoop.yarn.server.timeline.TestLeveldbTimelineStore 
   hadoop.yarn.server.timeline.recovery.TestLeveldbTimelineStateStore 
   hadoop.yarn.server.timeline.TestRollingLevelDBTimelineStore 
   hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer 
   hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector 
   hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer 
   hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling 
   hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore 
   hadoop.yarn.server.TestMiniYarnClusterNodeUtilization 
   hadoop.yarn.server.TestContainerManagerSecurity 
   hadoop.yarn.client.api.impl.TestNMClient 
   hadoop.yarn.server.timeline.TestLevelDBCacheTimelineStore 
   hadoop.yarn.server.timeline.TestOverrideTimelineStoreYarnClient 
   hadoop.yarn.server.timeline.TestEntityGroupFSTimelineStore 
   hadoop.yarn.applications.distributedshell.TestDistributedShell 
   hadoop.mapred.TestShuffleHandler 
   hadoop.mapreduce.v2.hs.TestHistoryServerLeveldbStateStoreService 

Timed out junit tests :

   org.apache.hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean 
   org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache 
   org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands 
   org.apache.hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA 
   org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA 
   org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA 
   org.apache.hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels 
   org.apache.hadoop.yarn.client.api.impl.TestOpportunisticContainerAllocation 

   mvninstall:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/347/artifact/out/patch-mvninstall-root.txt
  [500K]

   compile:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/347/artifact/out/patch-compile-root.txt
  [20K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/347/artifact/out/patch-compile-root.txt
  [20K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/347/artifact/out/patch-compile-root.txt
  [20K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/347/artifact/out/patch-unit-hadoop-assemblies.txt
  [4.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/347/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [496K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/347/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [56K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-ppc/347/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt

[jira] [Created] (YARN-6716) Native services support for specifying component start order

2017-06-16 Thread Billie Rinaldi (JIRA)
Billie Rinaldi created YARN-6716:


 Summary: Native services support for specifying component start order
 Key: YARN-6716
 URL: https://issues.apache.org/jira/browse/YARN-6716
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: yarn-native-services
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi


Some native services apps have components that should be started after other 
components. The readiness_check and dependencies features of the native 
services API are currently unimplemented, and we could use these to implement a 
basic start order feature. When component B has a dependency on component A, 
the AM could delay making a container request for component B until component 
A's readiness check has passed (for all instances of component A).
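
That gating logic can be sketched as follows. This is an illustration only,
with simplified stand-in names, not the native-services AM code: component B's
container requests are held back until every running instance of each of its
dependencies has passed its readiness check.

```java
import java.util.*;

// Illustrative model (not the native-services AM implementation).
public class StartOrderDemo {
    // component -> readiness-check result for each running instance
    static Map<String, List<Boolean>> readiness = new HashMap<>();
    // component -> the components it depends on
    static Map<String, List<String>> dependencies = new HashMap<>();

    static boolean allInstancesReady(String component) {
        List<Boolean> checks = readiness.getOrDefault(component, Collections.emptyList());
        // No instances yet counts as "not ready"
        return !checks.isEmpty() && checks.stream().allMatch(Boolean::booleanValue);
    }

    // The AM would consult this before issuing container requests for comp.
    static boolean mayRequestContainers(String comp) {
        return dependencies.getOrDefault(comp, Collections.emptyList())
                           .stream().allMatch(StartOrderDemo::allInstancesReady);
    }

    public static void main(String[] args) {
        dependencies.put("B", Arrays.asList("A"));
        readiness.put("A", Arrays.asList(true, false)); // one A instance not ready
        System.out.println(mayRequestContainers("B"));  // false: hold B's requests
        readiness.put("A", Arrays.asList(true, true));  // all A instances ready
        System.out.println(mayRequestContainers("B"));  // true: request B now
    }
}
```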






[jira] [Created] (YARN-6715) NodeHealthScriptRunner does not handle non-zero exit codes properly

2017-06-16 Thread Peter Bacsko (JIRA)
Peter Bacsko created YARN-6715:
--

 Summary: NodeHealthScriptRunner does not handle non-zero exit codes properly
 Key: YARN-6715
 URL: https://issues.apache.org/jira/browse/YARN-6715
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Peter Bacsko


There is a bug in NodeHealthScriptRunner. The {{FAILED_WITH_EXIT_CODE}} case is 
incorrect:

{noformat}
void reportHealthStatus(HealthCheckerExitStatus status) {
  long now = System.currentTimeMillis();
  switch (status) {
  case SUCCESS:
setHealthStatus(true, "", now);
break;
  case TIMED_OUT:
setHealthStatus(false, NODE_HEALTH_SCRIPT_TIMED_OUT_MSG);
break;
  case FAILED_WITH_EXCEPTION:
setHealthStatus(false, exceptionStackTrace);
break;
  case FAILED_WITH_EXIT_CODE:
setHealthStatus(true, "", now);
break;
  case FAILED:
setHealthStatus(false, shexec.getOutput());
break;
  }
}
{noformat}

This case also lacks unit test coverage.
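
A minimal fix would mirror the {{FAILED}} branch. The sketch below is a
stripped-down, self-contained model of the intended behavior, not the actual
Hadoop patch; the class and the two-argument setHealthStatus here are
simplified stand-ins:

```java
// Illustration only: FAILED_WITH_EXIT_CODE must mark the node unhealthy and
// surface the script output, exactly like FAILED, instead of reporting success.
public class HealthStatusDemo {
    enum HealthCheckerExitStatus {
        SUCCESS, TIMED_OUT, FAILED_WITH_EXCEPTION, FAILED_WITH_EXIT_CODE, FAILED
    }

    static boolean healthy;
    static String report;

    static void setHealthStatus(boolean isHealthy, String output) {
        healthy = isHealthy;
        report = output;
    }

    static void reportHealthStatus(HealthCheckerExitStatus status, String scriptOutput) {
        switch (status) {
        case SUCCESS:
            setHealthStatus(true, "");
            break;
        case FAILED_WITH_EXIT_CODE:
            // Corrected: a non-zero exit code is a failure, so the node is
            // unhealthy and the script's output becomes the health report.
            setHealthStatus(false, scriptOutput);
            break;
        default:
            // TIMED_OUT, FAILED_WITH_EXCEPTION, FAILED also mark unhealthy.
            setHealthStatus(false, scriptOutput);
            break;
        }
    }

    public static void main(String[] args) {
        reportHealthStatus(HealthCheckerExitStatus.FAILED_WITH_EXIT_CODE, "ERROR disk full");
        System.out.println(healthy + " / " + report); // prints "false / ERROR disk full"
    }
}
```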






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-06-16 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/436/

[Jun 15, 2017 1:49:34 PM] (stevel) HADOOP-14506. Add create() contract test 
that verifies ancestor dir
[Jun 15, 2017 5:40:59 PM] (xiao) HADOOP-14523. 
OpensslAesCtrCryptoCodec.finalize() holds excessive
[Jun 15, 2017 5:59:24 PM] (lei) HADOOP-14394. Provide Builder pattern for 
DistributedFileSystem.create.
[Jun 15, 2017 6:04:50 PM] (lei) HDFS-11682. 
TestBalancer.testBalancerWithStripedFile is flaky. (lei)
[Jun 15, 2017 6:16:16 PM] (liuml07) HADOOP-14494. 
ITestJets3tNativeS3FileSystemContract tests NPEs in
[Jun 15, 2017 9:46:55 PM] (wang) HDFS-10480. Add an admin command to list 
currently open files.
[Jun 16, 2017 4:17:10 AM] (aajisaka) HADOOP-14289. Move log4j APIs over to 
slf4j in hadoop-common.




-1 overall


The following subsystems voted -1:
findbugs unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   module:hadoop-common-project/hadoop-minikdc 
   Possible null pointer dereference in 
org.apache.hadoop.minikdc.MiniKdc.delete(File) due to return value of called 
method Dereferenced at 
MiniKdc.java:org.apache.hadoop.minikdc.MiniKdc.delete(File) due to return value 
of called method Dereferenced at MiniKdc.java:[line 368] 

FindBugs :

   module:hadoop-common-project/hadoop-auth 
   
org.apache.hadoop.security.authentication.server.MultiSchemeAuthenticationHandler.authenticate(HttpServletRequest,
 HttpServletResponse) makes inefficient use of keySet iterator instead of 
entrySet iterator At MultiSchemeAuthenticationHandler.java:of keySet iterator 
instead of entrySet iterator At MultiSchemeAuthenticationHandler.java:[line 
192] 

FindBugs :

   module:hadoop-common-project/hadoop-common 
   org.apache.hadoop.crypto.CipherSuite.setUnknownValue(int) 
unconditionally sets the field unknownValue At CipherSuite.java:unknownValue At 
CipherSuite.java:[line 44] 
   org.apache.hadoop.crypto.CryptoProtocolVersion.setUnknownValue(int) 
unconditionally sets the field unknownValue At 
CryptoProtocolVersion.java:unknownValue At CryptoProtocolVersion.java:[line 67] 
   Possible null pointer dereference in 
org.apache.hadoop.fs.FileUtil.fullyDeleteOnExit(File) due to return value of 
called method Dereferenced at 
FileUtil.java:org.apache.hadoop.fs.FileUtil.fullyDeleteOnExit(File) due to 
return value of called method Dereferenced at FileUtil.java:[line 118] 
   Possible null pointer dereference in 
org.apache.hadoop.fs.RawLocalFileSystem.handleEmptyDstDirectoryOnWindows(Path, 
File, Path, File) due to return value of called method Dereferenced at 
RawLocalFileSystem.java:org.apache.hadoop.fs.RawLocalFileSystem.handleEmptyDstDirectoryOnWindows(Path,
 File, Path, File) due to return value of called method Dereferenced at 
RawLocalFileSystem.java:[line 387] 
   Return value of org.apache.hadoop.fs.permission.FsAction.or(FsAction) 
ignored, but method has no side effect At FTPFileSystem.java:but method has no 
side effect At FTPFileSystem.java:[line 421] 
   Useless condition:lazyPersist == true at this point At 
CommandWithDestination.java:[line 502] 
   org.apache.hadoop.io.DoubleWritable.compareTo(DoubleWritable) 
incorrectly handles double value At DoubleWritable.java: At 
DoubleWritable.java:[line 78] 
   org.apache.hadoop.io.DoubleWritable$Comparator.compare(byte[], int, int, 
byte[], int, int) incorrectly handles double value At DoubleWritable.java:int) 
incorrectly handles double value At DoubleWritable.java:[line 97] 
   org.apache.hadoop.io.FloatWritable.compareTo(FloatWritable) incorrectly 
handles float value At FloatWritable.java: At FloatWritable.java:[line 71] 
   org.apache.hadoop.io.FloatWritable$Comparator.compare(byte[], int, int, 
byte[], int, int) incorrectly handles float value At FloatWritable.java:int) 
incorrectly handles float value At FloatWritable.java:[line 89] 
   Possible null pointer dereference in 
org.apache.hadoop.io.IOUtils.listDirectory(File, FilenameFilter) due to return 
value of called method Dereferenced at 
IOUtils.java:org.apache.hadoop.io.IOUtils.listDirectory(File, FilenameFilter) 
due to return value of called method Dereferenced at IOUtils.java:[line 351] 
   org.apache.hadoop.io.erasurecode.ECSchema.toString() makes inefficient 
use of keySet iterator instead of entrySet iterator At ECSchema.java:keySet 
iterator instead of entrySet iterator At ECSchema.java:[line 193] 
   Possible bad parsing of shift operation in 
org.apache.hadoop.io.file.tfile.Utils$Version.hashCode() At 
Utils.java:operation in 
org.apache.hadoop.io.file.tfile.Utils$Version.hashCode() At Utils.java:[line 
398]