[jira] [Created] (YARN-9994) rumen2sls.sh cannot find class RumenToSLSConverter
Shen Yinjie created YARN-9994:

Summary: rumen2sls.sh cannot find class RumenToSLSConverter
Key: YARN-9994
URL: https://issues.apache.org/jira/browse/YARN-9994
Project: Hadoop YARN
Issue Type: Bug
Components: scheduler-load-simulator
Affects Versions: 3.2.1, 3.1.0
Reporter: Shen Yinjie

Running rumen2sls.sh returns
{code:java}
Error: Could not find or load main class org.apache.hadoop.yarn.sls.RumenToSLSConverter
{code}
rumen2sls.sh should add hadoop-sls to the classpath.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8688) Duplicate queue names in fair scheduler allocation file
[ https://issues.apache.org/jira/browse/YARN-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shen Yinjie resolved YARN-8688.

Resolution: Duplicate
Assignee: Shen Yinjie

> Duplicate queue names in fair scheduler allocation file
>
> Key: YARN-8688
> URL: https://issues.apache.org/jira/browse/YARN-8688
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.8.2, 3.1.0
> Reporter: Shen Yinjie
> Assignee: Shen Yinjie
> Priority: Major
>
> When duplicate queue names are configured in the fair scheduler allocation file, the RM
> does not recognize the error, even after the RM is restarted.
[jira] [Created] (YARN-9586) [QA] Need more doc for yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used
Shen Yinjie created YARN-9586:

Summary: [QA] Need more doc for yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used
Key: YARN-9586
URL: https://issues.apache.org/jira/browse/YARN-9586
Project: Hadoop YARN
Issue Type: Wish
Components: federation
Reporter: Shen Yinjie

We picked LoadBasedRouterPolicy for YARN federation, but do not know what value to set for yarn.federation.policy-manager-params. Is there a demo config or a more detailed description for this?
[jira] [Created] (YARN-9577) YARN router should expose SubClusters information through RouterWebServices
Shen Yinjie created YARN-9577:

Summary: YARN router should expose SubClusters information through RouterWebServices
Key: YARN-9577
URL: https://issues.apache.org/jira/browse/YARN-9577
Project: Hadoop YARN
Issue Type: Improvement
Components: router
Reporter: Shen Yinjie

When YARN federation is enabled, it would be very helpful to be able to access information about all subclusters through an API. We can implement this in RouterWebServices.
[jira] [Created] (YARN-9425) Make initialDelay configurable for FederationStateStoreService#scheduledExecutorService
Shen Yinjie created YARN-9425:

Summary: Make initialDelay configurable for FederationStateStoreService#scheduledExecutorService
Key: YARN-9425
URL: https://issues.apache.org/jira/browse/YARN-9425
Project: Hadoop YARN
Issue Type: Bug
Components: federation
Reporter: Shen Yinjie

When YARN federation is enabled, subcluster info in the Router Web UI cannot be loaded immediately, and clients cannot find any active subclusters for the first 5 minutes by default, an interval configured by "yarn.federation.state-store.heartbeat-interval-secs". IMO, we should separate 'initialDelay' and 'delay' for FederationStateStoreService#scheduledExecutorService.
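The effect of separating the two values can be illustrated with a minimal sketch using the standard java.util.concurrent API. It assumes the heartbeat is scheduled with scheduleWithFixedDelay; the class and method names below are hypothetical and are not taken from FederationStateStoreService.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
final class HeartbeatSchedulingDemo {

  /**
   * Schedules a dummy heartbeat and reports whether its first run
   * happened within waitMs milliseconds.
   */
  static boolean firstRunWithin(long initialDelayMs, long delayMs, long waitMs) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    CountDownLatch firstRun = new CountDownLatch(1);
    try {
      // initialDelay controls the first run only; delay controls the gap between later runs.
      scheduler.scheduleWithFixedDelay(
          firstRun::countDown, initialDelayMs, delayMs, TimeUnit.MILLISECONDS);
      return firstRun.await(waitMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      scheduler.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // Same value for both: nothing is registered until a full interval has passed.
    System.out.println(firstRunWithin(60_000, 60_000, 200));
    // Separate, short initial delay: the first heartbeat fires right away.
    System.out.println(firstRunWithin(0, 60_000, 2_000));
  }
}
```

With a separate initial delay the regular interval stays unchanged, so only the first registration is affected.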
[jira] [Created] (YARN-9424) Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()
Shen Yinjie created YARN-9424:

Summary: Change getDeclaredMethods to getMethods in FederationClientInterceptor#invokeConcurrent()
Key: YARN-9424
URL: https://issues.apache.org/jira/browse/YARN-9424
Project: Hadoop YARN
Issue Type: Bug
Reporter: Shen Yinjie

In YARN-8699, FederationClientInterceptor#invokeConcurrent uses getDeclaredMethods(), which cannot recognize methods inherited from ApplicationBaseProtocol (ApplicationClientProtocol extends ApplicationBaseProtocol), for example getApplications. When I run "yarn application -list" against the YARN router, it throws an exception. So change getDeclaredMethods to getMethods.
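The difference between the two reflection calls can be shown with a small self-contained sketch. BaseProtocol and ClientProtocol are hypothetical stand-ins for ApplicationBaseProtocol and ApplicationClientProtocol; they are not the Hadoop interfaces.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical demo class, not part of Hadoop.
final class ReflectionLookupDemo {

  // Stand-in for ApplicationBaseProtocol.
  interface BaseProtocol {
    String getApplications();
  }

  // Stand-in for ApplicationClientProtocol, which extends the base protocol.
  interface ClientProtocol extends BaseProtocol {
    String submitApplication();
  }

  private static boolean contains(Method[] methods, String name) {
    return Arrays.stream(methods).anyMatch(m -> m.getName().equals(name));
  }

  /** Looks only at methods declared directly on ClientProtocol. */
  static boolean declaredHas(String name) {
    return contains(ClientProtocol.class.getDeclaredMethods(), name);
  }

  /** Looks at all public methods, including those inherited from superinterfaces. */
  static boolean publicHas(String name) {
    return contains(ClientProtocol.class.getMethods(), name);
  }

  public static void main(String[] args) {
    System.out.println(declaredHas("getApplications")); // false
    System.out.println(publicHas("getApplications"));   // true
  }
}
```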
[jira] [Created] (YARN-8979) Spark on yarn job failed with yarn federation enabled
Shen Yinjie created YARN-8979:

Summary: Spark on yarn job failed with yarn federation enabled
Key: YARN-8979
URL: https://issues.apache.org/jira/browse/YARN-8979
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Shen Yinjie

When I ran a Spark job on YARN with YARN federation enabled, the job failed and threw an exception as:
[jira] [Created] (YARN-8688) Duplicate queue names in fair scheduler allocation file
Shen Yinjie created YARN-8688:

Summary: Duplicate queue names in fair scheduler allocation file
Key: YARN-8688
URL: https://issues.apache.org/jira/browse/YARN-8688
Project: Hadoop YARN
Issue Type: Improvement
Affects Versions: 3.1.0, 2.8.2
Reporter: Shen Yinjie

When duplicate queue names are configured in the fair scheduler allocation file, the RM does not recognize the error, even after the RM is restarted.
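A check of the kind requested here could look roughly like the following sketch. It is not the fair scheduler's parsing code: it assumes a simplified allocation file in which queues are nested queue elements with a name attribute, and it treats two sibling queues with the same name as duplicates. The class name and the "root" path prefix are illustrative choices.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class, not part of Hadoop.
final class DuplicateQueueCheck {

  /** Returns the full path of every queue whose name repeats among its siblings. */
  static List<String> findDuplicates(String allocationXml) {
    try {
      Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(allocationXml.getBytes(StandardCharsets.UTF_8)))
          .getDocumentElement();
      List<String> duplicates = new ArrayList<>();
      collect(root, "root", duplicates);
      return duplicates;
    } catch (Exception e) {
      throw new IllegalArgumentException("cannot parse allocation file", e);
    }
  }

  // Walks the child queue elements of one parent and records repeated paths.
  private static void collect(Element parent, String parentPath, List<String> out) {
    Set<String> seen = new HashSet<>();
    NodeList children = parent.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node node = children.item(i);
      if (node.getNodeType() != Node.ELEMENT_NODE || !"queue".equals(node.getNodeName())) {
        continue; // skip whitespace text nodes and non-queue elements
      }
      Element queue = (Element) node;
      String path = parentPath + "." + queue.getAttribute("name");
      if (!seen.add(path)) {
        out.add(path);
      }
      collect(queue, path, out);
    }
  }

  public static void main(String[] args) {
    String xml = "<allocations><queue name=\"a\"/><queue name=\"b\">"
        + "<queue name=\"c\"/><queue name=\"c\"/></queue><queue name=\"a\"/></allocations>";
    System.out.println(findDuplicates(xml)); // [root.b.c, root.a]
  }
}
```

Running such a check when the file is loaded would let the error surface at startup or reload instead of being silently accepted.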
[jira] [Created] (YARN-8602) make capacity-scheduler.xml file configurable in yarn-site?
Shen Yinjie created YARN-8602:

Summary: make capacity-scheduler.xml file configurable in yarn-site?
Key: YARN-8602
URL: https://issues.apache.org/jira/browse/YARN-8602
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Shen Yinjie

Like Fair Scheduler?
[jira] [Created] (YARN-8539) TimelineWebService#getUser from HttpServletRequest may be null
Shen Yinjie created YARN-8539:

Summary: TimelineWebService#getUser from HttpServletRequest may be null
Key: YARN-8539
URL: https://issues.apache.org/jira/browse/YARN-8539
Project: Hadoop YARN
Issue Type: Bug
Components: timelineservice
Reporter: Shen Yinjie

When we integrate tez-ui with the timeline server and set yarn.acl.enabled=true, tez-ui invokes the timeline REST interface (ws/v1/timeline/TEZ_DAG_ID) to get all DAGs, but shows "no records available".
After some digging, I found that when tez-ui invokes ".../ws/v1/timeline/TEZ_DAG_ID", TimelineWebService#getUser(HttpServletRequest req) returns callerUGI = null.
In TimelineACLsManager#checkAccess():
{code:java}
..
if (callerUGI != null
    && (adminAclsManager.isAdmin(callerUGI) ||
        callerUGI.getShortUserName().equals(owner) ||
        domainACL.isUserAllowed(callerUGI))) {
  return true;
}
return false;
}
{code}
As a result, tez-ui gets nothing because the request cannot pass checkAccess().
For comparison, here is the similar code in RMWebServices:
{code}
protected Boolean hasAccess(RMApp app, HttpServletRequest hsr) {
  // Check for the authorization.
  UserGroupInformation callerUGI = getCallerUserGroupInformation(hsr, true);
  ..
  if (callerUGI != null
      && !(this.rm.getApplicationACLsManager().checkAccess(callerUGI,
          ApplicationAccessType.VIEW_APP, app.getUser(), app.getApplicationId())
          || this.rm.getQueueACLsManager().checkAccess(callerUGI,
          QueueACL.ADMINISTER_QUEUE, app, hsr.getRemoteAddr(), forwardedAddresses))) {
    return false;
  }
  return true;
}
{code}
When callerUGI is null, hasAccess() returns true, so I made a similar fix for TimelineWebServices.
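The two null-handling behaviours described in this report can be reduced to a small sketch. Strings stand in for UserGroupInformation and a plain owner comparison stands in for the real ACL checks; both are simplifications assumed for illustration, and the class and method names are hypothetical.

```java
// Hypothetical demo class, not part of Hadoop.
final class NullCallerAclDemo {

  /** TimelineACLsManager-style check: a null caller is always denied. */
  static boolean timelineStyle(String caller, String owner) {
    if (caller != null && caller.equals(owner)) {
      return true;
    }
    return false;
  }

  /** RMWebServices-style check: only a known caller that fails the ACL is denied. */
  static boolean rmStyle(String caller, String owner) {
    if (caller != null && !caller.equals(owner)) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(timelineStyle(null, "alice")); // false
    System.out.println(rmStyle(null, "alice"));       // true
  }
}
```

The two checks agree for every non-null caller and differ only when the caller is null, which is the case that occurs in this report.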
[jira] [Created] (YARN-8180) YARN Federation has not implemented blacklist sub-cluster for AM routing
Shen Yinjie created YARN-8180:

Summary: YARN Federation has not implemented blacklist sub-cluster for AM routing
Key: YARN-8180
URL: https://issues.apache.org/jira/browse/YARN-8180
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Shen Yinjie

The property "yarn.federation.blacklist-subclusters" is defined in the YARN federation doc, but it has not been implemented in code.
In FederationClientInterceptor#submitApplication():
{code:java}
List blacklist = new ArrayList();
for (int i = 0; i < numSubmitRetries; ++i) {
  SubClusterId subClusterId = policyFacade.getHomeSubcluster(
      request.getApplicationSubmissionContext(), blacklist);
{code}
[jira] [Created] (YARN-7975) Add an optional arg to yarn cluster -list-node-labels to list all nodes collection partitioned by labels
Shen Yinjie created YARN-7975:

Summary: Add an optional arg to yarn cluster -list-node-labels to list all nodes collection partitioned by labels
Key: YARN-7975
URL: https://issues.apache.org/jira/browse/YARN-7975
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Shen Yinjie

We already have "yarn cluster -lnl" to print all node label info, but that is not enough: we should be able to list the collection of nodes partitioned by label, especially in a large cluster. So I propose adding an optional argument "-nodes" to "yarn cluster -lnl" to achieve this.
e.g.
[yarn@docker1 ~]$ yarn cluster -lnl -nodes
Node Labels Num: 3
Labels Nodes
[jira] [Resolved] (YARN-7425) Failed to renew delegation token when RM user's TGT is expired
[ https://issues.apache.org/jira/browse/YARN-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shen Yinjie resolved YARN-7425.

Resolution: Won't Fix

> Failed to renew delegation token when RM user's TGT is expired
>
> Key: YARN-7425
> URL: https://issues.apache.org/jira/browse/YARN-7425
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.8.2
> Reporter: Shen Yinjie
> Assignee: Shen Yinjie
> Priority: Critical
> Attachments: rm_log.png
>
> We have a secure Hadoop cluster with NameNode federation.
> Job submission fails after the Kerberos TGT maxLifeTime (default 24h) expires; the client
> log shows "failed to renew token: HDFS_DELEGATION_TOKEN...".
> Checking the RM log, we found that the RM TGT is expired but relogin() is not triggered;
> the RM just retries and fails...
> (for the RM log, see the screenshot)
> Digging into the code:
> when the RM tries to renewToken(),
> UserGroupInformation.getLoginUser()="rm",
> but UserGroupInformation.getCurrentUser()="testUser".
> This causes Client.shouldAuthenticateOverKrb() to return false, so neither
> reloginFromKeytab() nor reloginFromTicketCache() is triggered.
[jira] [Created] (YARN-7425) Failed to renew delegation token when RM user's TGT is expired
Shen Yinjie created YARN-7425:

Summary: Failed to renew delegation token when RM user's TGT is expired
Key: YARN-7425
URL: https://issues.apache.org/jira/browse/YARN-7425
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.8.2
Reporter: Shen Yinjie
Priority: Critical

We have a secure Hadoop cluster with NameNode federation.
Job submission fails after the Kerberos TGT maxLifeTime (default 24h) expires; the client log shows "failed to renew token: HDFS_DELEGATION_TOKEN...".
Checking the RM log, we found that the RM TGT is expired but relogin() is not triggered; the RM just retries and fails...
(for some logs, see the screenshots)
Digging into the code:
when the RM tries to renewToken(),
UserGroupInformation.getLoginUser()="rm",
but UserGroupInformation.getCurrentUser()="testUser".
This causes Client.shouldAuthenticateOverKrb() to return false, so neither reloginFromKeytab() nor reloginFromTicketCache() is triggered.
[jira] [Created] (YARN-6462) Add yarn command to list all queues
Shen Yinjie created YARN-6462:

Summary: Add yarn command to list all queues
Key: YARN-6462
URL: https://issues.apache.org/jira/browse/YARN-6462
Project: Hadoop YARN
Issue Type: Improvement
Components: client
Reporter: Shen Yinjie

We need a yarn command to list all queues, as commands of this kind already exist for applications and NodeManagers...
[jira] [Created] (YARN-6006) Log aggregation causes nodemanager OOM
Shen Yinjie created YARN-6006:

Summary: Log aggregation causes nodemanager OOM
Key: YARN-6006
URL: https://issues.apache.org/jira/browse/YARN-6006
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.6.0
Reporter: Shen Yinjie

With log aggregation enabled, the NodeManager died with an OOM exception. See the screenshot for the exception.