[ https://issues.apache.org/jira/browse/YARN-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Varun Vasudev reassigned YARN-2426:
-----------------------------------

    Assignee: Varun Vasudev

> NodeManger is not able use WebHDFS token properly to tallk to WebHDFS while localizing
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-2426
>                 URL: https://issues.apache.org/jira/browse/YARN-2426
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager, resourcemanager, webapp
>    Affects Versions: 2.6.0
>         Environment: Hadoop Kerberos (secure) cluster with LinuxContainerExecutor enabled.
> SPNEGO is on for YARN's new RM web services for application submission.
> When running kinit we use -C (to specify the cache path).
> Then, before executing, we set export KRB5CCNAME=<path provided with -C option>
> There is no Kerberos ticket in the default KRB5 cache path, which is /tmp.
>            Reporter: Karam Singh
>            Assignee: Varun Vasudev
>
> Encountered this issue while using YARN's new RM WS (web service) for application submission, on a single-node cluster, while submitting a Distributed Shell application.
> For this we need to pass a custom script and the AppMaster jar, along with a webhdfs token, to the NodeManager for localization.
> The Distributed Shell application was failing because the node failed to localize the AppMaster jar.
> Following is the NM log while localizing the AppMaster jar:
> {code}
> 2014-08-18 01:53:52,434 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(114)) - Authorization successful for testing (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-08-18 01:53:52,757 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1011)) - DEBUG: FAILED { webhdfs://<NAMENODEHOST>:<NAMENODEHTTPPORT>/user/<JARPATH>, 1408352019488, FILE, null }, Authentication required
> 2014-08-18 01:53:52,758 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource webhdfs://<NAMENODEHOST>:<NAMENODEHTTPPORT>/user/<JARPATH>(-><NM_LOCAL_DIR>/usercache/<APP_USER>/appcache/application_1408351986532_0001/filecache/10/DshellAppMaster.jar) transitioned from DOWNLOADING to FAILED
> 2014-08-18 01:53:52,758 INFO container.Container (ContainerImpl.java:handle(999)) - Container container_1408351986532_0001_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
> {code}
> This is similar to what we get when we try to access webhdfs in a secure (Kerberos) cluster without doing kinit.
> Whereas the following curl command works properly:
> curl -i -k -s 'http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/<JAR_PATH>?op=listStatus&delegation=<same webhdfs token used in app submission structure>'
> I also tried using http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/hadoopqa/<JAR_PATH> in the app submission object instead of the webhdfs:// URI format.
> Then the NodeManager fails to localize because there is no FileSystem for the http scheme:
> {code}
> 14-08-18 02:03:31,343 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(114)) - Authorization successful for testing (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> 2014-08-18 02:03:31,583 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:update(1011)) - DEBUG: FAILED { http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/<JAR_PATH> 1408352576841, FILE, null }, No FileSystem for scheme: http
> 2014-08-18 02:03:31,583 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/<JAR_PATH>(-><NM_LOCAL_DIR>/usercache/<APP_USER>/appcache/application_1408352544163_0002/filecache/11/DshellAppMaster.jar) transitioned from DOWNLOADING to FAILED
> {code}
> Now do kinit without providing the -C option for the KRB5 cache path, so the ticket goes to the default KRB5 cache in /tmp.
> Again submit the same application object to the YARN WS, with webhdfs:// URI format paths and the webhdfs token.
> This time the NM is able to download the jar and the custom shell script, and the application runs fine.
> It looks like the following is happening:
> webhdfs is trying to look for a krb ticket in the NM while localizing.
> 1. In the first case there was no krb ticket in the default cache, so the application failed while localizing the AppMaster jar.
> 2. In the second case kinit had already been run and a krb ticket was present in /tmp (the default KRB5 cache), so the AppMaster got localized successfully.

--
This message was sent by Atlassian JIRA
(v6.2#6252)