[jira] [Created] (YARN-9994) rumen2sls.sh cannot find class RumenToSLSConverter

2019-11-27 Thread Shen Yinjie (Jira)
Shen Yinjie created YARN-9994:
-

 Summary: rumen2sls.sh cannot find class RumenToSLSConverter
 Key: YARN-9994
 URL: https://issues.apache.org/jira/browse/YARN-9994
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler-load-simulator
Affects Versions: 3.2.1, 3.1.0
Reporter: Shen Yinjie


Running rumen2sls.sh fails with {code:java}Error: Could not find or load main class 
org.apache.hadoop.yarn.sls.RumenToSLSConverter{code}.
rumen2sls.sh should add hadoop-sls to the classpath.
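
One way to confirm the diagnosis is to check whether the class is visible on
the classpath that the script assembles. The following is a minimal sketch and
not part of the SLS tooling; ClasspathProbe and isOnClasspath are names made up
for illustration.

```java
final class ClasspathProbe {
  // Hypothetical helper: returns true when the named class can be located by
  // the current class loader. The class is not initialized by the lookup.
  static boolean isOnClasspath(String className) {
    try {
      Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Default to the class named in the error message above.
    String target = args.length > 0
        ? args[0]
        : "org.apache.hadoop.yarn.sls.RumenToSLSConverter";
    System.out.println(target + " on classpath: " + isOnClasspath(target));
  }
}
```

Run with the same classpath that rumen2sls.sh builds, this would be expected to
report false until hadoop-sls is added.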



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8688) Duplicate queue names in fair scheduler allocation file

2019-06-30 Thread Shen Yinjie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie resolved YARN-8688.
---
Resolution: Duplicate
  Assignee: Shen Yinjie

> Duplicate queue names in fair scheduler allocation file
> 
>
> Key: YARN-8688
> URL: https://issues.apache.org/jira/browse/YARN-8688
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.2, 3.1.0
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
>
> When duplicate queue names are configured in the fair scheduler allocation 
> file, the RM does not detect the error, even after the RM is restarted.






[jira] [Created] (YARN-9586) [QA] Need more doc for yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used

2019-05-28 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-9586:
-

 Summary: [QA] Need more doc for 
yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used
 Key: YARN-9586
 URL: https://issues.apache.org/jira/browse/YARN-9586
 Project: Hadoop YARN
  Issue Type: Wish
  Components: federation
Reporter: Shen Yinjie


We picked LoadBasedRouterPolicy for YARN federation, but it is unclear what 
value to set for yarn.federation.policy-manager-params. Is there a sample 
configuration or a more detailed description of this property?






[jira] [Created] (YARN-9577) YARN router should expose SubClusters information through RouterWebServices

2019-05-22 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-9577:
-

 Summary: YARN router should expose SubClusters information through 
RouterWebServices
 Key: YARN-9577
 URL: https://issues.apache.org/jira/browse/YARN-9577
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: router
Reporter: Shen Yinjie


When YARN federation is enabled, it would be very helpful to be able to access 
information about all subclusters through an API. This could be implemented in 
RouterWebServices.






[jira] [Created] (YARN-9425) Make initialDelay configurable for FederationStateStoreService#scheduledExecutorService

2019-03-29 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-9425:
-

 Summary: Make initialDelay configurable for 
FederationStateStoreService#scheduledExecutorService
 Key: YARN-9425
 URL: https://issues.apache.org/jira/browse/YARN-9425
 Project: Hadoop YARN
  Issue Type: Bug
  Components: federation
Reporter: Shen Yinjie


When YARN federation is enabled, subcluster info in the Router Web UI cannot be 
loaded immediately, and clients cannot find any active subclusters for the 
first 5 minutes by default. This interval is configured by 
"yarn.federation.state-store.heartbeat-interval-secs".
IMO, we should separate 'initialDelay' and 'delay' for 
FederationStateStoreService#scheduledExecutorService.
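
The effect of separating the two values can be sketched with a plain
ScheduledExecutorService. This is an illustration only, assuming the heartbeat
is scheduled with scheduleWithFixedDelay; HeartbeatScheduleDemo and
firstBeatWithin are hypothetical names, and the millisecond values are
shortened stand-ins for the real interval.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class HeartbeatScheduleDemo {
  // Schedules a stand-in heartbeat with a separate initial delay and repeat
  // interval, then reports whether the first beat fired within timeoutMs.
  static boolean firstBeatWithin(long initialDelayMs, long intervalMs,
      long timeoutMs) {
    CountDownLatch firstBeat = new CountDownLatch(1);
    ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();
    try {
      executor.scheduleWithFixedDelay(firstBeat::countDown,
          initialDelayMs, intervalMs, TimeUnit.MILLISECONDS);
      return firstBeat.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // Initial delay equal to the interval: nothing is sent at startup.
    System.out.println("shared delay:   " + firstBeatWithin(2_000, 2_000, 500));
    // Initial delay of zero: the first heartbeat goes out right away.
    System.out.println("separate delay: " + firstBeatWithin(0, 2_000, 500));
  }
}
```

With a configurable initial delay, the first heartbeat no longer has to wait
for a full interval, while later heartbeats keep the configured spacing.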








[jira] [Created] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()

2019-03-28 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-9424:
-

 Summary: Change getDeclaredMethods to getMethods in 
FederationClientInterceptor#invokeConcurrent()
 Key: YARN-9424
 URL: https://issues.apache.org/jira/browse/YARN-9424
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Shen Yinjie


In YARN-8699, FederationClientInterceptor#invokeConcurrent uses 
getDeclaredMethods(), which does not return methods inherited from 
ApplicationBaseProtocol (ApplicationClientProtocol extends 
ApplicationBaseProtocol), for example getApplications. As a result, running 
"yarn application -list" against the YARN router throws an exception.
The fix is to change getDeclaredMethods to getMethods.
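
The difference between the two reflection calls can be shown with a small
self-contained sketch. BaseProtocol and ClientProtocol below are hypothetical
stand-ins for ApplicationBaseProtocol and ApplicationClientProtocol, not the
real Hadoop interfaces.

```java
import java.util.Arrays;

final class MethodLookupDemo {
  // Hypothetical stand-in for ApplicationBaseProtocol.
  interface BaseProtocol {
    String getApplications();
  }

  // Hypothetical stand-in for ApplicationClientProtocol.
  interface ClientProtocol extends BaseProtocol {
    String submitApplication();
  }

  // True when the type itself declares a method with this name.
  static boolean declares(Class<?> type, String name) {
    return Arrays.stream(type.getDeclaredMethods())
        .anyMatch(m -> m.getName().equals(name));
  }

  // True when the type exposes a public method with this name, including
  // methods inherited from superinterfaces.
  static boolean exposes(Class<?> type, String name) {
    return Arrays.stream(type.getMethods())
        .anyMatch(m -> m.getName().equals(name));
  }

  public static void main(String[] args) {
    // getDeclaredMethods() misses the inherited method: prints false
    System.out.println(declares(ClientProtocol.class, "getApplications"));
    // getMethods() includes it: prints true
    System.out.println(exposes(ClientProtocol.class, "getApplications"));
  }
}
```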






[jira] [Created] (YARN-8979) Spark on yarn job failed with yarn federation enabled

2018-11-06 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-8979:
-

 Summary: Spark on yarn job failed with yarn federation enabled
 Key: YARN-8979
 URL: https://issues.apache.org/jira/browse/YARN-8979
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Shen Yinjie


When I ran a Spark job on YARN with YARN federation enabled, the job failed and 
threw an exception as follows:






[jira] [Created] (YARN-8688) Duplicate queue names in fair scheduler allocation file

2018-08-20 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-8688:
-

 Summary: Duplicate queue names in fair scheduler allocation file
 Key: YARN-8688
 URL: https://issues.apache.org/jira/browse/YARN-8688
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.1.0, 2.8.2
Reporter: Shen Yinjie


When duplicate queue names are configured in the fair scheduler allocation 
file, the RM does not detect the error, even after the RM is restarted.
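
A check of the kind requested here could look roughly like the following. This
is a standalone sketch, not the RM's actual allocation-file loader: it assumes
queues are declared as nested <queue name="..."> elements, that all top-level
queues sit under an implicit "root" parent, and that two queues count as
duplicates when they share the same full path. DuplicateQueueCheck and
findDuplicates are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class DuplicateQueueCheck {
  // Returns the full paths of queues that are declared more than once.
  static List<String> findDuplicates(String allocationXml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(
              allocationXml.getBytes(StandardCharsets.UTF_8)));
      List<String> duplicates = new ArrayList<>();
      collect(doc.getDocumentElement(), "root", new HashSet<>(), duplicates);
      return duplicates;
    } catch (Exception e) {
      throw new IllegalArgumentException("invalid allocation XML", e);
    }
  }

  // Walks nested <queue> elements and records any path seen twice.
  private static void collect(Element parent, String parentPath,
      Set<String> seen, List<String> duplicates) {
    NodeList children = parent.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      if (child instanceof Element && "queue".equals(child.getNodeName())) {
        Element queue = (Element) child;
        String path = parentPath + "." + queue.getAttribute("name");
        if (!seen.add(path)) {
          duplicates.add(path);
        }
        collect(queue, path, seen, duplicates);
      }
    }
  }

  public static void main(String[] args) {
    String xml = "<allocations>"
        + "<queue name=\"dev\"/>"
        + "<queue name=\"prod\"><queue name=\"etl\"/></queue>"
        + "<queue name=\"dev\"/>"
        + "</allocations>";
    System.out.println(findDuplicates(xml)); // prints [root.dev]
  }
}
```

Failing fast on a non-empty result when the file is loaded would surface the
misconfiguration instead of silently accepting it.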






[jira] [Created] (YARN-8602) make capacity-scheduler.xml file configurable in yarn-site?

2018-07-29 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-8602:
-

 Summary: make capacity-scheduler.xml file configurable in 
yarn-site?
 Key: YARN-8602
 URL: https://issues.apache.org/jira/browse/YARN-8602
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Shen Yinjie


Like Fair Scheduler?






[jira] [Created] (YARN-8539) TimelineWebService#getUser from HttpServletRequest may be null

2018-07-15 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-8539:
-

 Summary: TimelineWebService#getUser from HttpServletRequest may be 
null
 Key: YARN-8539
 URL: https://issues.apache.org/jira/browse/YARN-8539
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineservice
Reporter: Shen Yinjie


When we integrate tez-ui with the timeline server and set yarn.acl.enabled=true, 
tez-ui invokes the timeline REST interface (ws/v1/timeline/TEZ_DAG_ID) to 
get all DAGs, but tez-ui shows "no records available".

After some digging, I found that when tez-ui invokes 
".../ws/v1/timeline/TEZ_DAG_ID", 
TimelineWebService#getUser(HttpServletRequest req) returns callerUGI = null.

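In TimelineACLsManager#checkAccess():
{code:java}
..
  // Access is granted only when the caller is known (non-null) and is an
  // admin, the owner, or allowed by the domain ACL.
  if (callerUGI != null
      && (adminAclsManager.isAdmin(callerUGI) ||
          callerUGI.getShortUserName().equals(owner) ||
          domainACL.isUserAllowed(callerUGI))) {
    return true;
  }
  // A null callerUGI always ends up here.
  return false;
}
{code}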
As a result, tez-ui gets nothing because the request cannot pass checkAccess().

For comparison, I looked at the similar code in RMWebServices:

{code}
protected Boolean hasAccess(RMApp app, HttpServletRequest hsr) {
  // Check for the authorization.
  UserGroupInformation callerUGI = getCallerUserGroupInformation(hsr, true);
..
  if (callerUGI != null
      && !(this.rm.getApplicationACLsManager().checkAccess(callerUGI,
          ApplicationAccessType.VIEW_APP, app.getUser(),
          app.getApplicationId())
          || this.rm.getQueueACLsManager().checkAccess(callerUGI,
              QueueACL.ADMINISTER_QUEUE, app, hsr.getRemoteAddr(),
              forwardedAddresses))) {
    return false;
  }
  return true;
}
{code}

 

When callerUGI is null, hasAccess() returns true.

So I made a similar fix for TimelineWebServices.
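
The behavioral difference between the two snippets can be reduced to a small
sketch. It is not the actual patch: the caller is modeled as a plain nullable
String instead of a UserGroupInformation, the domain ACL is left out, and
NullCallerAccessDemo with its two methods is a hypothetical name.

```java
import java.util.Collections;
import java.util.Set;

final class NullCallerAccessDemo {
  // Mirrors the shape of the checkAccess() excerpt: a null caller is always
  // rejected.
  static boolean strictCheck(String caller, String owner, Set<String> admins) {
    return caller != null
        && (admins.contains(caller) || caller.equals(owner));
  }

  // Mirrors the shape of the hasAccess() excerpt: only a known caller can be
  // rejected, so a null caller passes.
  static boolean lenientCheck(String caller, String owner, Set<String> admins) {
    if (caller != null
        && !(admins.contains(caller) || caller.equals(owner))) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    Set<String> admins = Collections.singleton("admin");
    System.out.println(strictCheck(null, "alice", admins));   // prints false
    System.out.println(lenientCheck(null, "alice", admins));  // prints true
    System.out.println(lenientCheck("bob", "alice", admins)); // prints false
  }
}
```

Whether a null caller should be allowed through is a policy decision; the
sketch only shows why the same request is handled differently by the two
checks.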






[jira] [Created] (YARN-8180) YARN Federation has not implemented blacklist sub-cluster for AM routing

2018-04-19 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-8180:
-

 Summary: YARN Federation has not implemented blacklist sub-cluster 
for AM routing
 Key: YARN-8180
 URL: https://issues.apache.org/jira/browse/YARN-8180
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Shen Yinjie


The property "yarn.federation.blacklist-subclusters" is defined in the YARN 
federation doc, but it has not been implemented in code.

In FederationClientInterceptor#submitApplication():
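{code:java}
// The blacklist starts out empty; it is not populated from
// yarn.federation.blacklist-subclusters.
List<SubClusterId> blacklist = new ArrayList<SubClusterId>();

for (int i = 0; i < numSubmitRetries; ++i) {

  SubClusterId subClusterId = policyFacade.getHomeSubcluster(
      request.getApplicationSubmissionContext(), blacklist);
{code}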
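
One way the property could seed the initial blacklist is sketched below. It
assumes the value is a comma-separated list of sub-cluster ids, which the
report does not state, and it uses plain strings instead of SubClusterId.
BlacklistSeedDemo and parseBlacklist are hypothetical names, not existing
Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;

final class BlacklistSeedDemo {
  // Splits an assumed comma-separated property value into sub-cluster ids,
  // ignoring surrounding whitespace and empty entries.
  static List<String> parseBlacklist(String propertyValue) {
    List<String> blacklist = new ArrayList<>();
    if (propertyValue == null) {
      return blacklist;
    }
    for (String id : propertyValue.split(",")) {
      String trimmed = id.trim();
      if (!trimmed.isEmpty()) {
        blacklist.add(trimmed);
      }
    }
    return blacklist;
  }

  public static void main(String[] args) {
    System.out.println(parseBlacklist(" sc1, sc2 ,,")); // prints [sc1, sc2]
    System.out.println(parseBlacklist(null));           // prints []
  }
}
```

The resulting list would replace the empty list created at the start of
submitApplication(), so that configured sub-clusters are excluded from the
first routing attempt.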
 

 






[jira] [Created] (YARN-7975) Add an optional arg to yarn cluster -list-node-labels to list all nodes collection partitioned by labels

2018-02-26 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-7975:
-

 Summary: Add an optional arg to yarn cluster -list-node-labels to 
list all nodes collection partitioned by labels
 Key: YARN-7975
 URL: https://issues.apache.org/jira/browse/YARN-7975
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Shen Yinjie


"yarn cluster -lnl" prints all node label info, but that is not enough. We 
should also be able to list the nodes partitioned by label, especially in a 
large cluster.

So I propose adding an optional argument "-nodes" to "yarn cluster -lnl" to 
achieve this.

e.g.

[yarn@docker1 ~]$ yarn cluster -lnl -nodes
Node Labels Num: 3
              Labels                                               Nodes
 

[jira] [Resolved] (YARN-7425) Failed to renew delegation token when RM user's TGT is expired

2017-11-08 Thread Shen Yinjie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie resolved YARN-7425.
---
Resolution: Won't Fix

> Failed to renew delegation token when RM user's TGT is expired
> ---
>
> Key: YARN-7425
> URL: https://issues.apache.org/jira/browse/YARN-7425
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.2
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Critical
> Attachments: rm_log.png
>
>
> We have a secure Hadoop cluster with NameNode federation.
> Job submission fails after the Kerberos TGT max lifetime (24h by default) 
> expires; the client log shows "failed to renew token: 
> HDFS_DELEGATION_TOKEN...".
> The RM log shows that the RM's TGT has expired but relogin() is not 
> triggered; the RM just retries and fails...
> (see the screenshot for the RM log)
> Digging into the code:
> when the RM tries to renewToken(),
> UserGroupInformation.getLoginUser()="rm",
> but UserGroupInformation.getCurrentUser()="testUser".
> This causes Client.shouldAuthenticateOverKrb() to return false, so 
> reloginFromKeytab() or reloginFromTicketCache() is never triggered.






[jira] [Created] (YARN-7425) Failed to renew delegation token when RM user's TGT is expired

2017-11-01 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-7425:
-

 Summary: Failed to renew delegation token when RM user's TGT is 
expired
 Key: YARN-7425
 URL: https://issues.apache.org/jira/browse/YARN-7425
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.8.2
Reporter: Shen Yinjie
Priority: Critical


We have a secure Hadoop cluster with NameNode federation.
Job submission fails after the Kerberos TGT max lifetime (24h by default) 
expires; the client log shows "failed to renew token: 
HDFS_DELEGATION_TOKEN...".
The RM log shows that the RM's TGT has expired but relogin() is not 
triggered; the RM just retries and fails...
(see the screenshots for some logs)
Digging into the code:
when the RM tries to renewToken(),
UserGroupInformation.getLoginUser()="rm",
but UserGroupInformation.getCurrentUser()="testUser".
This causes Client.shouldAuthenticateOverKrb() to return false, so 
reloginFromKeytab() or reloginFromTicketCache() is never triggered.






[jira] [Created] (YARN-6462) Add yarn command to list all queues

2017-04-10 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-6462:
-

 Summary: Add yarn command to list all queues
 Key: YARN-6462
 URL: https://issues.apache.org/jira/browse/YARN-6462
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Shen Yinjie


We need a yarn command to list all queues, since this kind of command already 
exists for applications and nodemanagers...






[jira] [Created] (YARN-6006) Log aggregation causes nodemanager OOM

2016-12-14 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-6006:
-

 Summary: Log aggregation causes nodemanager OOM
 Key: YARN-6006
 URL: https://issues.apache.org/jira/browse/YARN-6006
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Shen Yinjie


With log aggregation enabled, the nodemanager died with an OOM exception. See 
the screenshot for the exception.


