Karam Singh created YARN-2426:
---------------------------------

             Summary: NodeManager is not able to use WebHDFS token properly to talk to WebHDFS while localizing
                 Key: YARN-2426
                 URL: https://issues.apache.org/jira/browse/YARN-2426
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager, resourcemanager, webapp
    Affects Versions: 2.6.0
         Environment: Hadoop Kerberos (secure) cluster with LinuxContainerExecutor enabled.
SPNEGO is on for the new YARN RM web services for application submission.
kinit is run with the -C option (to specify the cache path), and before executing we set: export KRB5CCNAME=<path provided with -C option>
There is no Kerberos ticket in the default KRB5 cache path, which is /tmp.
            Reporter: Karam Singh


I encountered this issue while using YARN's new RM web services (WS) for application submission, on a single-node cluster, while submitting a Distributed Shell application through the RM WS.
For this we need to pass a custom script and the AppMaster jar, along with a WebHDFS token, to the NodeManager for localization.

The Distributed Shell application was failing because the node failed to localize the AppMaster jar.
Following is the NM log while localizing the AppMaster jar:
{code}
2014-08-18 01:53:52,434 INFO  authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(114)) - Authorization successful for testing (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
2014-08-18 01:53:52,757 INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1011)) - DEBUG: FAILED { webhdfs://<NAMENODEHOST>:<NAMENODEHTTPPORT>/user/<JARpPATH>, 1408352019488, FILE, null }, Authentication required
2014-08-18 01:53:52,758 INFO  localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource webhdfs://<NAMENODEHOST>:<NAMENODEHTTPPORT>/user/<JARPATH>(-><NM_LOCAL_DIR>/usercache/<APP_USER>/appcache/application_1408351986532_0001/filecache/10/DshellAppMaster.jar) transitioned from DOWNLOADING to FAILED
2014-08-18 01:53:52,758 INFO  container.Container (ContainerImpl.java:handle(999)) - Container container_1408351986532_0001_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
{code}

This is similar to what we get when we try to access WebHDFS in a secure (Kerberos) cluster without doing kinit.

Whereas if we do
curl -i -k -s 'http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/<JAR_PATH>?op=listStatus&delegation=<same webhdfs token used in app submission structure>'
it works properly.

I also tried using http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/hadoopqa/<JAR_PATH> in the app submission object instead of the webhdfs:// URI
format. Then the NodeManager fails to localize, because there is no FileSystem for the http scheme:
{code}
14-08-18 02:03:31,343 INFO  authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(114)) - Authorization successful for testing (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
2014-08-18 02:03:31,583 INFO  localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1011)) - DEBUG: FAILED { http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/<JAR_PATH> 1408352576841, FILE, null }, No FileSystem for scheme: http
2014-08-18 02:03:31,583 INFO  localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/<JAR_PATH>(-><NM_LOCAL_DIR>/usercache/<APP_USER>/appcache/application_1408352544163_0002/filecache/11/DshellAppMaster.jar) transitioned from DOWNLOADING to FAILED
{code}

Now do kinit without providing the -C option for the KRB5 cache path, so that the ticket goes to the default KRB5 cache in /tmp.
Again submit the same application object to the YARN WS, with webhdfs:// URI format paths and the WebHDFS token.
This time the NM is able to download the jar and the custom shell script, and the application runs fine.

It looks like the following is happening:
WebHDFS is trying to look for a Kerberos ticket on the NM while localizing, instead of relying on the supplied WebHDFS token.
1. In the first case there was no Kerberos ticket in the default cache, so the application failed while localizing the AppMaster jar.
2. In the second case kinit had already been done and a Kerberos ticket was present in /tmp (the default KRB5 cache), so the AppMaster jar was localized successfully.
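
To help reproduce the two checks described above, here is a minimal Python sketch. It is not part of YARN or Hadoop, and the helper names (effective_ccache, webhdfs_url) are hypothetical. It assumes the conventional MIT Kerberos default cache file /tmp/krb5cc_<uid>; the report only says the default cache is under /tmp, and krb5.conf may override it. The first helper shows which credential cache a Kerberos client would consult on the NM host. The second builds the same delegation-token URL that the curl command above uses.

```python
import os
from urllib.parse import quote, urlencode


def effective_ccache(env=None, uid=None):
    """Return the credential cache path a Kerberos client would consult.

    KRB5CCNAME wins if it is set; otherwise fall back to the conventional
    default /tmp/krb5cc_<uid> (an assumption: krb5.conf can change this).
    """
    env = os.environ if env is None else env
    name = env.get("KRB5CCNAME")
    if name:
        # KRB5CCNAME may carry a "FILE:" type prefix; strip it to get a path.
        return name[len("FILE:"):] if name.startswith("FILE:") else name
    if uid is None:
        uid = os.getuid()
    return f"/tmp/krb5cc_{uid}"


def webhdfs_url(host, port, path, op, delegation_token):
    """Build a WebHDFS REST URL that authenticates via the delegation parameter."""
    query = urlencode({"op": op, "delegation": delegation_token})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"
```

Running effective_ccache() as the NodeManager user and checking whether that file exists distinguishes the two cases in the report. The URL from webhdfs_url(...) can be passed to curl to confirm that the token itself is valid.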