[jira] [Resolved] (YARN-9767) PartitionQueueMetrics Issues

2020-06-01 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YARN-9767.

Resolution: Fixed

> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9767.001.patch
>
>
> The intent of this Jira is to capture, separately and for ease of tracking, the 
> issues/observations encountered during YARN-6492 development.
> Observations:
> Please refer to 
> https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027
> 1. Since partition info is extracted from both the request and the node, there 
> is a problem. For example:
>  
> Node N is mapped to label X (non-exclusive). Queue A is configured with the ANY 
> node label. App A requests resources from queue A and its containers end up 
> running on node N. During the AbstractCSQueue#allocateResource call, the node 
> partition (via SchedulerNode) is used for the calculation. Say the allocate 
> call is fired for 3 containers of 1 GB each; the outcome is then
> a. PartitionDefault * queue A -> pending MB is 3 GB
> b. PartitionX * queue A -> pending MB is -3 GB
>  
> This is because the app request was fired without any label specification, so 
> metric #a was derived from the request partition. After allocation completes, 
> pending resources are decreased, but that path uses the node partition info, 
> hence metric #b. 
>  
> Given this situation, we need to put some thought into computing these 
> metrics correctly.
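The double-accounting above can be sketched as a standalone toy model (illustrative names only, not the real QueueMetrics API):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of how pending resources can go negative when the increment
// keys on the request's partition but the decrement keys on the node's
// partition. "queueA", the key format, and the method names are assumptions
// for illustration, not YARN's actual implementation.
public class PendingMetricsSketch {
    static Map<String, Long> pendingMB = new HashMap<>();

    static void incrPending(String partition, long mb) {
        pendingMB.merge("queueA|" + partition, mb, Long::sum);
    }

    static void decrPending(String partition, long mb) {
        pendingMB.merge("queueA|" + partition, -mb, Long::sum);
    }

    public static void main(String[] args) {
        // App request carries no label -> counted against the default partition.
        incrPending("PartitionDefault", 3 * 1024);
        // Containers land on node N (label X); the decrement path keys on the
        // node partition instead of the request partition.
        decrPending("PartitionX", 3 * 1024);
        // PartitionDefault stays at +3072 MB while PartitionX shows -3072 MB.
        System.out.println(pendingMB);
    }
}
```

Keying the increment and decrement on the same partition (whichever one is chosen) would keep both counters consistent.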
>  
> 2. Though the intent of this Jira is Partition Queue Metrics, we would 
> like to retain the existing QueueMetrics for backward compatibility (as you 
> can see from the Jira's discussion).
> With this patch and the YARN-9596 patch, QueueMetrics (per queue) would be 
> overwritten either with values for a specific partition or with default 
> partition values, or vice versa. For example, after a queue (say 
> queue A) has been initialised with min and max capacities, and also with a node 
> label's min and max capacities, QueueMetrics (availableMB) for queue A returns 
> values based on the node label's capacity config.
> I've been working on these observations to provide a fix and have attached 
> .005.WIP.patch. Its focus is to ensure availableMB and availableVcores 
> are correct (please refer to observation #2 above). Added more 
> asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure the fix 
> for #2 works properly.
> One more thing to note: user metrics for availableMB and availableVcores 
> at the root queue did not exist before either; that behaviour is retained. User 
> metrics for availableMB and availableVcores are available only at the child 
> queue level, and also per partition.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-10303) One YARN REST API example in the YARN documentation is wrong

2020-06-01 Thread bright.zhou (Jira)
bright.zhou created YARN-10303:
--

 Summary: One YARN REST API example in the YARN documentation is wrong
 Key: YARN-10303
 URL: https://issues.apache.org/jira/browse/YARN-10303
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.2.1, 3.1.1
Reporter: bright.zhou
 Attachments: image-2020-06-02-10-27-35-020.png

The {{deSelects}} value in the example should be {{resourceRequests}}.

!image-2020-06-02-10-27-35-020.png!
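For reference, the corrected query would look like the following sketch ("rm-host:8088" is a placeholder ResourceManager address):

```java
import java.net.URI;

// Builds the Cluster Applications query with the corrected deSelects value.
// The host/port are placeholders; only the query parameter matters here.
public class DeSelectsExample {
    public static void main(String[] args) {
        URI uri = URI.create(
            "http://rm-host:8088/ws/v1/cluster/apps?deSelects=resourceRequests");
        System.out.println(uri.getQuery());  // deSelects=resourceRequests
    }
}
```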






Re: [NOTICE] Removal of protobuf classes from Hadoop Token's public APIs' signature

2020-06-01 Thread Akira Ajisaka
> Please check https://issues.apache.org/jira/browse/HADOOP-17046
> This Jira proposes to keep the existing ProtobufRpcEngine as-is (without
shading and with the protobuf-2.5.0 implementation) to support downstream
implementations.

Thank you, Vinay. I checked the PR and it mostly looks good.
How do we proceed?

I suppose Hadoop 3.3.0 is blocked by this issue. Is that correct?

Thanks,
Akira

On Tue, May 19, 2020 at 2:06 AM Eric Yang  wrote:

> ProtobufHelper should not be a public API.  Hadoop uses protobuf
> serialization to improve RPC performance, with many drawbacks.  The
> generalized object usually requires another indirection to map to a usable
> Java object; this makes Hadoop code messy, but that is a topic for
> another day.  The main challenge with the UGI class is that it makes the
> system difficult to secure.
>
> In Google's world, gRPC is built on top of protobuf + the HTTP/2 binary
> protocol, and secured by JWT tokens.  This means that before
> deserializing a protobuf object from the wire, the call must deserialize a
> JSON token to determine whether the call is authenticated, and only then
> deserialize the application objects.  Hence, protobuf RPC no longer has a
> clear performance advantage over JSON, because JWT token deserialization
> happens on every gRPC call to ensure the request is secured properly.
>
> In the Hadoop world, we are not using JWT tokens for authentication; we have
> pluggable token implementations: SPNEGO, delegation tokens, or some kind
> of SASL.  The UGI class should not expose a protobuf token as a public
> interface, otherwise a downstream application can forge the protobuf token
> and it becomes a privilege escalation issue.  In my opinion, the UGI class
> must be as private as possible to prevent forgery.  Downstream applications
> are discouraged from using UGI.doAs for impersonation, to reduce privilege
> escalation.  Instead, a downstream application should run as an unprivileged
> Unix daemon rather than as root.  This ensures that a vulnerability in one
> application does not spill over into security problems for another application.
> Some people will disagree with this statement because existing applications
> are already written to take advantage of UGI.doAs, such as Hive loading
> external tables.  Fortunately, Hive provides an option to run without doAs.
>
> Protobuf is not a suitable candidate for security token transport because it
> is a strongly typed transport.  If multiple tokens are transported within the
> UGI protobuf, small differences in ASCII, UTF-8, or UTF-16 can cause
> conversion ambiguities that might create security holes or headaches with type
> casting.  I am +1 on removing protobuf from the Hadoop Token API.  Hadoop Token
> as a byte array, defaulting to a JSON serializer, is probably the simpler
> solution to keep the system robust without repeating past mistakes.
>
> regards,
> Eric
>
> On Sun, May 17, 2020 at 11:56 PM Vinayakumar B 
> wrote:
>
> > Hi Wei-Chiu and Steve,
> >
> > Thanks for sharing your insights.
> >
> > I have also tried to compile and run Ozone pointing to
> > trunk (3.4.0-SNAPSHOT), which has the shaded and upgraded protobuf.
> >
> > Other than the usage of internal protobuf APIs, which breaks
> > compilation, I found another major problem: the Hadoop RPC
> > implementations in downstream projects are based on non-shaded protobuf
> > classes.
> >
> > 'ProtobufRpcEngine' takes arguments and tries to typecast them to the
> > protobuf 'Message', which it expects to be of version 3.7 and from the shaded
> > package (i.e. o.a.h.thirdparty.*).
> >
> > So, unless downstreams upgrade their protobuf classes to
> > 'hadoop-thirdparty', this issue will continue to occur, even after solving
> > the compilation issues due to internal usage of private APIs with protobuf
> > signatures.
> >
> > I found a possible workaround for this problem.
> > Please check https://issues.apache.org/jira/browse/HADOOP-17046
> >   This Jira proposes to keep the existing ProtobufRpcEngine as-is (without
> > shading and with the protobuf-2.5.0 implementation) to support downstream
> > implementations.
> >   Use the new ProtobufRpcEngine2 for the shaded protobuf classes within
> > Hadoop, and later for projects that wish to upgrade their protobufs to 3.x.
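The typecast failure described here can be illustrated with a standalone sketch; the nested interfaces below are stand-ins for the non-shaded com.google.protobuf.Message and the relocated o.a.h.thirdparty Message (a toy model, not the actual classes):

```java
// Simulates the shaded vs. non-shaded protobuf Message mismatch. The two
// interfaces are deliberately unrelated types, just as the relocated
// (shaded) Message and the original one are unrelated to the JVM.
public class ShadingMismatch {
    interface LegacyMessage {}   // stand-in for com.google.protobuf.Message
    interface ShadedMessage {}   // stand-in for the o.a.h.thirdparty Message

    // A downstream request object built against non-shaded protobuf.
    static class DownstreamRequest implements LegacyMessage {}

    public static void main(String[] args) {
        Object arg = new DownstreamRequest();
        // A shaded RPC engine effectively performs this check before casting;
        // a non-shaded message fails it even though both sides consider
        // themselves "a protobuf Message".
        System.out.println(arg instanceof ShadedMessage);  // false
    }
}
```

This is why keeping the old engine for non-shaded callers while adding a second engine for shaded callers sidesteps the mismatch.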
> >
> > For Ozone compilation:
> >   I have submitted two PRs to prepare for the Hadoop 3.3+
> > upgrade. These PRs remove the dependency on Hadoop for those internal APIs
> > and implement their own copy in Ozone with non-shaded protobuf.
> > HDDS-3603: https://github.com/apache/hadoop-ozone/pull/932
> > HDDS-3604: https://github.com/apache/hadoop-ozone/pull/933
> >
> > Also, I ran some tests on Ozone after applying these PRs and
> > HADOOP-17046 with 3.4.0; the tests seem to pass.
> >
> > Please help review these PRs.
> >
> > Thanks,
> > -Vinay
> >
> >
> > On Wed, Apr 29, 2020 at 5:02 PM Steve Loughran
>  > >
> > wrote:
> >
> > > Okay.
> > >
> > > I am not going to be a purist and say 

[jira] [Created] (YARN-10302) Support custom packing algorithm for FairScheduler

2020-06-01 Thread William W. Graham Jr (Jira)
William W. Graham Jr created YARN-10302:
---

 Summary: Support custom packing algorithm for FairScheduler
 Key: YARN-10302
 URL: https://issues.apache.org/jira/browse/YARN-10302
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: William W. Graham Jr


The {{FairScheduler}} class allocates containers to nodes based on the node 
with the most available memory [0]. Create the ability to instead configure a 
custom packing algorithm with different logic. For instance, for effective auto 
scaling, a bin-packing algorithm might be a better choice.

0 - 
https://github.com/apache/hadoop/blob/56b7571131b0af03b32bf1c5673c32634652df21/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L1034-L1043
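A custom packing policy could be as simple as a pluggable selection function. The sketch below is plain Java, not the actual FairScheduler interfaces (NodeInfo and the method names are invented for illustration); it shows best-fit bin packing: among nodes that can host the request, pick the one with the least free memory.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of a pluggable packing policy. The real FairScheduler sorts nodes
// by most available memory; this policy does the opposite to pack tightly.
public class PackingSketch {
    record NodeInfo(String name, long availableMB) {}

    // Best-fit bin packing: among nodes that fit the request, choose the one
    // that would be left with the least free memory.
    static Optional<NodeInfo> bestFit(List<NodeInfo> nodes, long requestMB) {
        return nodes.stream()
                .filter(n -> n.availableMB() >= requestMB)
                .min(Comparator.comparingLong(NodeInfo::availableMB));
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = List.of(
                new NodeInfo("n1", 8192),
                new NodeInfo("n2", 2048),
                new NodeInfo("n3", 1024));
        // A 2 GB request packs onto n2, leaving n1 empty and eligible for
        // scale-down (the "most available" policy would pick n1 instead).
        System.out.println(bestFit(nodes, 2048).get().name());  // n2
    }
}
```

Packing work onto fewer nodes is what lets an autoscaler drain and release the empty ones.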






[jira] [Created] (YARN-10301) "DIGEST-MD5: digest response format violation. Mismatched response." when network partition occurs

2020-06-01 Thread YCozy (Jira)
YCozy created YARN-10301:


 Summary: "DIGEST-MD5: digest response format violation. Mismatched 
response." when network partition occurs
 Key: YARN-10301
 URL: https://issues.apache.org/jira/browse/YARN-10301
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: YCozy


We observed the "Mismatched response." error in the RM's log when an NM gets 
network-partitioned after an RM failover. Here's how it happens:

 

Initially, we have a sleeper YARN service running in a cluster with two RMs (an 
active RM1 and a standby RM2) and one NM. At some point, we perform an RM 
failover from RM1 to RM2.

RM1's log:

 
{noformat}
2020-06-01 16:29:20,387 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to 
standby state{noformat}
RM2's log:

 

 
{noformat}
2020-06-01 16:29:27,818 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to 
active state{noformat}
 

After the RM failover, the NM encounters a network partition and fails to 
register with RM2. In other words, there's no "NodeManager from node *** 
registered" in RM2's log.

 

This does not affect the sleeper YARN service. The sleeper service successfully 
recovers after the RM failover. We can see in RM2's log:

 
{noformat}
2020-06-01 16:30:06,703 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
appattempt_6_0001_01 State change from LAUNCHED to RUNNING on event = 
REGISTERED{noformat}
 

Then, we stop the sleeper service. In RM2's log, we can see that:

 
{noformat}
2020-06-01 16:30:12,157 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
application_6_0001 unregistered successfully.
...
2020-06-01 16:31:09,861 INFO org.apache.hadoop.yarn.service.webapp.ApiServer: 
Successfully stopped service sleeper1{noformat}
And in AM's log, we can see that:

 

 
{noformat}
2020-06-01 16:30:12,651 [shutdown-hook-0] INFO  service.ServiceMaster - 
SHUTDOWN_MSG:{noformat}
 

Some time later, we observe the "Mismatched response" in RM2's log:
{noformat}
2020-06-01 16:43:20,699 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.
  at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:376)
  at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:623)
  at org.apache.hadoop.ipc.Client$Connection.access$2400(Client.java:414)
  at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:827)
  at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:823)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:823)
  at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:1667)
  at org.apache.hadoop.ipc.Client.call(Client.java:1483)
  at org.apache.hadoop.ipc.Client.call(Client.java:1436)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
  at com.sun.proxy.$Proxy102.stopContainers(Unknown Source)
  at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.stopContainers(ContainerManagementProtocolPBClientImpl.java:147)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
  at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
  at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
  at com.sun.proxy.$Proxy103.stopContainers(Unknown Source)
  at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.cleanup(AMLauncher.java:153)
  at

[jira] [Created] (YARN-10300) appMasterHost not set in RM ApplicationSummary when AM fails before first heartbeat

2020-06-01 Thread Eric Badger (Jira)
Eric Badger created YARN-10300:
--

 Summary: appMasterHost not set in RM ApplicationSummary when AM 
fails before first heartbeat
 Key: YARN-10300
 URL: https://issues.apache.org/jira/browse/YARN-10300
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Badger
Assignee: Eric Badger


{noformat}
2020-05-23 14:09:10,086 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1586003420099_12444961,name=job_name,user=username,queue=queuename,state=FAILED,trackingUrl=https://cluster:port/applicationhistory/app/application_1586003420099_12444961,appMasterHost=N/A,startTime=1590241207309,finishTime=1590242950085,finalStatus=FAILED,memorySeconds=13750,vcoreSeconds=67,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=,applicationType=MAPREDUCE
{noformat}

{{appMasterHost=N/A}} should contain the AM hostname instead of N/A.






[jira] [Resolved] (YARN-10290) Resourcemanager recover failed when fair scheduler queue acl changed

2020-06-01 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YARN-10290.
--
Resolution: Duplicate

This issue was fixed by YARN-7913.
That change fixes a number of issues around restores that fail.

The change was not backported to Hadoop 2.x.

> Resourcemanager recover failed when fair scheduler queue acl changed
> 
>
> Key: YARN-10290
> URL: https://issues.apache.org/jira/browse/YARN-10290
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: yehuanhuan
>Priority: Blocker
>
> ResourceManager recovery fails when a fair scheduler queue ACL has changed. 
> Because the queue ACL changed, recovering the application (addApplication() in 
> FairScheduler) is rejected. Recovering the application attempt 
> (addApplicationAttempt() in FairScheduler) then finds the application is null. 
> This leaves both RMs in standby. To reproduce:
>  
> # A user runs a long-running application.
> # Change the queue ACL (aclSubmitApps) so that the user no longer has permission.
> # Restart the RM.
> {code:java}
> 2020-05-25 16:04:06,191 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Updating 
> application application_1590393162216_0005 with final state: FAILED
> 2020-05-25 16:04:06,192 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
> load/recover state
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:663)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1246)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1072)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1036)
> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:789)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:105)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:845)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:102)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:897)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:850)
> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:723)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:322)
> at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:427)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1173)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:584)
> at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:980)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1021)
> at
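A recovery path that tolerates a rejected application can be sketched as follows. This is an illustrative toy model, not the actual YARN-7913 change; the map and method names are invented for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the recovery path: if addApplication() was rejected (e.g.
// because a queue ACL changed), the attempt-recovery step must tolerate a
// missing application instead of dereferencing null and killing recovery.
public class RecoveryGuardSketch {
    static Map<String, Object> applications = new HashMap<>();

    static boolean addApplicationAttempt(String appId) {
        Object app = applications.get(appId);
        if (app == null) {
            // Without this guard, recovery throws NullPointerException and
            // the RM fails to transition to active.
            System.out.println("Application " + appId
                + " not found; skipping attempt recovery");
            return false;
        }
        // ... normal attempt recovery would proceed here ...
        return true;
    }

    public static void main(String[] args) {
        // The rejected application is absent from the map, so recovery skips
        // its attempt instead of crashing.
        addApplicationAttempt("application_1590393162216_0005");
    }
}
```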

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2020-06-01 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/160/

[May 31, 2020 11:40:59 AM] (Ayush Saxena) HDFS-10792. 
RedundantEditLogInputStream should log caught exceptions. Contributed by 
Wei-Chiu Chuang.


[jira] [Resolved] (YARN-10289) spark on yarn exception

2020-06-01 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-10289.
---
Resolution: Invalid

> spark on yarn exception 
> 
>
> Key: YARN-10289
> URL: https://issues.apache.org/jira/browse/YARN-10289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.0.3
> Environment: hadoop 3.0.0
>Reporter: huang xin
>Priority: Major
>
> I executed Spark on YARN and got an issue like this:
> stderr96Error: Could not find or load main class 
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> prelaunch.out70Setting up env variables2_03? 
> Setting up job resources
> Launching container
> stderr96Error: Could not find or load main class 
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> stdout0(_1590115508504_0033_02_01Ωcontainer-localizer-syslog1842020-05-24
>  15:39:20,867 INFO [main] 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer:
>  Disk Validator: yarn.nodemanager.disk-validator is loaded.
> prelaunch.out70Setting up env variables
> Setting up job resources
> Launching container
> stderr333ERROR StatusLogger No log4j2 configuration file found. Using default 
> configuration: logging only errors to the console. Set system property 
> 'org.apache.logging.log4j.simplelog.StatusLogger.level' to TRACE to show 
> Log4j2 internal initialization logging.
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> prelaunch.out70Setting up env variables1_05? 
> Setting up job resources
> Launching container
> stderr96Error: Could not find or load main class 
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> prelaunch.out70Setting up env variables1_04? 
> Setting up job resources
> Launching container
> stderr96Error: Could not find or load main class 
> org.apache.spark.executor.CoarseGrainedExecutorBackend
> stdout0 
>  VERSION*(_1590115508504_0033_01_0none??data:BCFile.indexnone?






Apache Hadoop qbt Report: branch2.10+JDK7 on Linux/x86

2020-06-01 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86/703/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   module:hadoop-common-project/hadoop-minikdc 
   Possible null pointer dereference in 
org.apache.hadoop.minikdc.MiniKdc.delete(File) due to return value of called 
method Dereferenced at 
MiniKdc.java:org.apache.hadoop.minikdc.MiniKdc.delete(File) due to return value 
of called method Dereferenced at MiniKdc.java:[line 515] 

FindBugs :

   module:hadoop-common-project/hadoop-auth 
   
org.apache.hadoop.security.authentication.server.MultiSchemeAuthenticationHandler.authenticate(HttpServletRequest,
 HttpServletResponse) makes inefficient use of keySet iterator instead of 
entrySet iterator At MultiSchemeAuthenticationHandler.java:of keySet iterator 
instead of entrySet iterator At MultiSchemeAuthenticationHandler.java:[line 
192] 

FindBugs :

   module:hadoop-common-project/hadoop-common 
   org.apache.hadoop.crypto.CipherSuite.setUnknownValue(int) 
unconditionally sets the field unknownValue At CipherSuite.java:unknownValue At 
CipherSuite.java:[line 44] 
   org.apache.hadoop.crypto.CryptoProtocolVersion.setUnknownValue(int) 
unconditionally sets the field unknownValue At 
CryptoProtocolVersion.java:unknownValue At CryptoProtocolVersion.java:[line 67] 
   Possible null pointer dereference in 
org.apache.hadoop.fs.FileUtil.fullyDeleteOnExit(File) due to return value of 
called method Dereferenced at 
FileUtil.java:org.apache.hadoop.fs.FileUtil.fullyDeleteOnExit(File) due to 
return value of called method Dereferenced at FileUtil.java:[line 118] 
   Possible null pointer dereference in 
org.apache.hadoop.fs.RawLocalFileSystem.handleEmptyDstDirectoryOnWindows(Path, 
File, Path, File) due to return value of called method Dereferenced at 
RawLocalFileSystem.java:org.apache.hadoop.fs.RawLocalFileSystem.handleEmptyDstDirectoryOnWindows(Path,
 File, Path, File) due to return value of called method Dereferenced at 
RawLocalFileSystem.java:[line 383] 
   Useless condition:lazyPersist == true at this point At 
CommandWithDestination.java:[line 502] 
   org.apache.hadoop.io.DoubleWritable.compareTo(DoubleWritable) 
incorrectly handles double value At DoubleWritable.java: At 
DoubleWritable.java:[line 78] 
   org.apache.hadoop.io.DoubleWritable$Comparator.compare(byte[], int, int, 
byte[], int, int) incorrectly handles double value At DoubleWritable.java:int) 
incorrectly handles double value At DoubleWritable.java:[line 97] 
   org.apache.hadoop.io.FloatWritable.compareTo(FloatWritable) incorrectly 
handles float value At FloatWritable.java: At FloatWritable.java:[line 71] 
   org.apache.hadoop.io.FloatWritable$Comparator.compare(byte[], int, int, 
byte[], int, int) incorrectly handles float value At FloatWritable.java:int) 
incorrectly handles float value At FloatWritable.java:[line 89] 
   Possible null pointer dereference in 
org.apache.hadoop.io.IOUtils.listDirectory(File, FilenameFilter) due to return 
value of called method Dereferenced at 
IOUtils.java:org.apache.hadoop.io.IOUtils.listDirectory(File, FilenameFilter) 
due to return value of called method Dereferenced at IOUtils.java:[line 389] 
   Possible bad parsing of shift operation in 
org.apache.hadoop.io.file.tfile.Utils$Version.hashCode() At 
Utils.java:operation in 
org.apache.hadoop.io.file.tfile.Utils$Version.hashCode() At Utils.java:[line 
398] 
   
org.apache.hadoop.metrics2.lib.DefaultMetricsFactory.setInstance(MutableMetricsFactory)
 unconditionally sets the field mmfImpl At DefaultMetricsFactory.java:mmfImpl 
At DefaultMetricsFactory.java:[line 49] 
   
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.setMiniClusterMode(boolean) 
unconditionally sets the field miniClusterMode At 
DefaultMetricsSystem.java:miniClusterMode At DefaultMetricsSystem.java:[line 
92] 
   Useless object stored in variable seqOs of method 
org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.addOrUpdateToken(AbstractDelegationTokenIdentifier,
 AbstractDelegationTokenSecretManager$DelegationTokenInformation, boolean) At 
ZKDelegationTokenSecretManager.java:seqOs of method 

[jira] [Created] (YARN-10299) Timeline Service v1.5 using LevelDB as backend storage will crash when data scale reaches 100GB

2020-06-01 Thread aimahou (Jira)
aimahou created YARN-10299:
--

 Summary: Timeline Service v1.5 using LevelDB as backend storage will 
crash when data scale reaches 100GB
 Key: YARN-10299
 URL: https://issues.apache.org/jira/browse/YARN-10299
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineservice
Affects Versions: 3.1.1
Reporter: aimahou


h2. Issue:

Timeline Service v1.5 using LevelDB as backend storage will crash when the data 
scale reaches 100 GB.
h2. Specific exception:

2020-04-24 16:06:59,914 INFO  applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore (ApplicationHistoryManagerOnTimelineStore.java:generateApplicationReport(691)) - No application attempt found for application_1587696012637_1143. Use a placeholder for its latest attempt id.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: The entity for application attempt appattempt_1587696012637_1143_01 doesn't exist in the timeline store
 at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplicationAttempt(ApplicationHistoryManagerOnTimelineStore.java:183)
 at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.generateApplicationReport(ApplicationHistoryManagerOnTimelineStore.java:677)
 at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerOnTimelineStore.getApplications(ApplicationHistoryManagerOnTimelineStore.java:128)
 at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getApplications(ApplicationHistoryClientService.java:195)
 at org.apache.hadoop.yarn.server.webapp.AppsBlock.getApplicationReport(AppsBlock.java:129)
 at org.apache.hadoop.yarn.server.webapp.AppsBlock.fetchData(AppsBlock.java:114)
 at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:137)
 at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
 at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
 at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
 at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
 at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
 at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
 at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
 at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
 at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:206)
 at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:165)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
 at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
 at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
 at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
 at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
 at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
 at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
 at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
 at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
 at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
 at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
 at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
 at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
 at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
 at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
 at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
 at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
 at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304)
 at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
 at

[jira] [Created] (YARN-10298) TimeLine entity information only stored in one region when use apache HBase as backend storage

2020-06-01 Thread aimahou (Jira)
aimahou created YARN-10298:
--

 Summary: TimeLine entity information only stored in one region 
when use apache HBase as backend storage
 Key: YARN-10298
 URL: https://issues.apache.org/jira/browse/YARN-10298
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: ATSv2, timelineservice
Affects Versions: 3.1.1
Reporter: aimahou


h2. Issue

Timeline entity information is only stored in one region when Apache HBase is used as the backend storage.
h2. Probable cause

We found in the source code that when the HBase timeline writer stores timeline entity info, the rowKey is composed of clusterId, userId, flowName, flowRunId and appId. Because these components are simply concatenated, row keys are sorted in dictionary order, so timeline entities may be stored in only one region or a few adjacent regions.
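As a hedged illustration of the cause described above (the key strings and helper below are made up for demonstration and are not the actual ATSv2 key encoding), row keys that share the clusterId/userId/flowName/flowRunId prefix are contiguous under byte-wise dictionary order, which is exactly how HBase orders rows, so they fall into one region's key range:

```java
import java.util.Arrays;

public class RowKeyOrderSketch {

  // Return the keys in HBase's row order (dictionary order for ASCII bytes).
  static String[] sortedRowKeys(String[] keys) {
    String[] copy = keys.clone();
    Arrays.sort(copy);
    return copy;
  }

  public static void main(String[] args) {
    String[] keys = {
        "cluster1!user1!flowA!1!app_0003",
        "cluster1!user1!flowA!1!app_0001",
        "cluster1!user1!flowA!1!app_0002",
    };
    // All three keys share the prefix "cluster1!user1!flowA!1!", so in sorted
    // order they are contiguous and land in a single region's key range.
    for (String k : sortedRowKeys(keys)) {
      System.out.println(k);
    }
  }
}
```

Any application writing many entities for the same flow therefore concentrates its writes on whichever region server owns that contiguous range.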
h2. Related code snippet

HBaseTimelineWriterImpl.java

public TimelineWriteResponse write(TimelineCollectorContext context,
    TimelineEntities data, UserGroupInformation callerUgi)
    throws IOException {

  ...

  boolean isApplication = ApplicationEntity.isApplicationEntity(te);
  byte[] rowKey;
  if (isApplication) {
    ApplicationRowKey applicationRowKey =
        new ApplicationRowKey(clusterId, userId, flowName, flowRunId, appId);
    rowKey = applicationRowKey.getRowKey();
    store(rowKey, te, flowVersion, Tables.APPLICATION_TABLE);
  } else {
    EntityRowKey entityRowKey =
        new EntityRowKey(clusterId, userId, flowName, flowRunId, appId,
            te.getType(), te.getIdPrefix(), te.getId());
    rowKey = entityRowKey.getRowKey();
    store(rowKey, te, flowVersion, Tables.ENTITY_TABLE);
  }

  if (!isApplication && SubApplicationEntity.isSubApplicationEntity(te)) {
    SubApplicationRowKey subApplicationRowKey =
        new SubApplicationRowKey(subApplicationUser, clusterId,
            te.getType(), te.getIdPrefix(), te.getId(), userId);
    rowKey = subApplicationRowKey.getRowKey();
    store(rowKey, te, flowVersion, Tables.SUBAPPLICATION_TABLE);
  }

  ...

}
h2. Suggestion

We can use the hash code of the original rowKey as part of the rowKey when storing and reading timeline entity data, so that writes are distributed across regions.
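One way the suggestion could be realized is a salted row key, where a hash-derived bucket byte is prepended to the original key. This is only a minimal sketch; the names (`saltRowKey`, `NUM_BUCKETS`) and the one-byte-prefix layout are assumptions for illustration, not part of the actual `HBaseTimelineWriterImpl` code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeySaltSketch {

  // Assumed number of salt buckets; would match the table's pre-split regions.
  static final int NUM_BUCKETS = 16;

  // Prepend a one-byte bucket id derived from the original key's hash, so
  // keys that were lexicographically adjacent scatter across bucket prefixes.
  static byte[] saltRowKey(byte[] originalKey) {
    int bucket = Math.floorMod(Arrays.hashCode(originalKey), NUM_BUCKETS);
    byte[] salted = new byte[originalKey.length + 1];
    salted[0] = (byte) bucket;
    System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
    return salted;
  }

  public static void main(String[] args) {
    byte[] key =
        "cluster1!user1!flowA!1!app_0001".getBytes(StandardCharsets.UTF_8);
    byte[] salted = saltRowKey(key);
    System.out.println("bucket=" + salted[0] + " saltedLength=" + salted.length);
  }
}
```

The trade-off is on the read path: because the salt destroys global ordering, a scan over a key range would have to fan out across all bucket prefixes and merge the results, so the bucket count and hash function must be fixed and known to both the writer and the reader.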



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org