Thanks Marcelo,

Turns out I had missed setup steps in the actual file itself. Thanks to Richard for the help here; he pointed me to some Java implementations.
I’m using the org.apache.hadoop.security API. I now have:

/* graphx_sp.scala */
import scala.util.Try
import scala.io.Source
import scala.util.parsing.json._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.hadoop.security.UserGroupInformation

object graphx_sp {
  def main(args: Array[String]) {
    // Settings
    val conf = new SparkConf().setAppName("graphx_sp")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    sc.setLogLevel("WARN")

    val principal = conf.get("spark.yarn.principal")
    val keytab = conf.get("spark.yarn.keytab")
    val loginUser = UserGroupInformation.loginUserFromKeytab(principal, keytab)
    UserGroupInformation.getLoginUser(loginUser)

    // Actual code….

Running sbt returns:

src/main/scala/graphx_sp.scala:35: too many arguments for method getLoginUser: ()org.apache.hadoop.security.UserGroupInformation
[error]     UserGroupInformation.getLoginUser(loginUser)
[error]                                      ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed

The docs show that there should be two inputs, the principal and keytab. See here:
<https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/security/UserGroupInformation.html#getLoginUser()>

Can anyone point me to a tutorial or a walkthrough of how to use Spark with Kerberos? This is proving to be quite confusing. Most search results on the topic cover what needs to be passed at the point of `spark-submit`, not the changes needed in the actual src/main/*.scala file.

Gerry

> On 5 Dec 2016, at 19:45, Marcelo Vanzin <van...@cloudera.com> wrote:
>
> That's not the error, that's just telling you the application failed.
> You have to look at the YARN logs for application_1479877553404_0041
> to see why it failed.
>
> On Mon, Dec 5, 2016 at 10:44 AM, Gerard Casey <gerardhughca...@gmail.com> wrote:
>> Thanks Marcelo,
>>
>> My understanding from a few pointers is that this may be due to insufficient
>> read permissions to the keytab or a corrupt keytab. I have checked the
>> read permissions and they are OK. I can see that it is initially configuring
>> correctly:
>>
>> INFO security.UserGroupInformation: Login successful for user
>> user@login_node using keytab file /path/to/keytab
>>
>> I’ve added the full trace below.
>>
>> Gerry
>>
>> Full trace:
>>
>> Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
>> Spark1 will be picked by default
>> 16/12/05 18:23:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 16/12/05 18:23:27 INFO security.UserGroupInformation: Login successful for user me@login_node using keytab file /path/to/keytab
>> 16/12/05 18:23:27 INFO yarn.Client: Attempting to login to the Kerberos using principal: me@login_node and keytab: /path/to/keytab
>> 16/12/05 18:23:28 INFO impl.TimelineClientImpl: Timeline service address: http://login_node1.xcat.cluster:8188/ws/v1/timeline/
>> 16/12/05 18:23:28 INFO client.RMProxy: Connecting to ResourceManager at login_node1.xcat.cluster/
>> 16/12/05 18:23:28 INFO client.AHSProxy: Connecting to Application History server at login_node1.xcat.cluster/
>> 16/12/05 18:23:28 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
>> 16/12/05 18:23:28 INFO yarn.Client: Requesting a new application from cluster with 32 NodeManagers
>> 16/12/05 18:23:28 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (15360 MB per container)
>> 16/12/05 18:23:28 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
>> 16/12/05 18:23:28 INFO yarn.Client: Setting up container launch context for our AM
>> 16/12/05 18:23:28 INFO yarn.Client: Setting up the launch environment for our AM container
>> 16/12/05 18:23:28 INFO yarn.Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://login_node1.xcat.cluster:8020/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar
>> 16/12/05 18:23:28 INFO yarn.Client: Credentials file set to:
>> 16/12/05 18:23:28 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://login_node1.xcat.cluster:8020/user/me/.sparkStaging/application_
>> 16/12/05 18:23:28 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 1856 for me on
>> 16/12/05 18:23:28 INFO yarn.Client: Renewal Interval set to 86400009
>> 16/12/05 18:23:28 INFO yarn.Client: Preparing resources for our AM container
>> 16/12/05 18:23:28 INFO yarn.YarnSparkHadoopUtil: getting token for namenode: hdfs://login_node1.xcat.cluster:8020/user/me/.sparkStaging/application_
>> 16/12/05 18:23:28 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 1857 for me on
>> 16/12/05 18:23:29 INFO yarn.YarnSparkHadoopUtil: HBase class not found java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
>> 16/12/05 18:23:29 INFO yarn.Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache.
>> 16/12/05 18:23:29 INFO yarn.Client: Uploading resource file:/path/to/keytab -> hdfs://login_node1.xcat.cluster:8020/user/me/.sparkStaging/application_1479877553404_0041/keytab
>> 16/12/05 18:23:29 INFO yarn.Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://login_node1.xcat.cluster:8020/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar
>> 16/12/05 18:23:29 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs://login_node1.xcat.cluster:8020/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar
>> 16/12/05 18:23:29 INFO yarn.Client: Uploading resource file:/home/me/Aoife/spark-abm/target/scala-2.10/graphx_sp_2.10-1.0.jar -> hdfs://login_node1.xcat.cluster:8020/user/me/.sparkStaging/application_1479877553404_0041/graphx_sp_2.10-1.0.jar
>> 16/12/05 18:23:29 INFO yarn.Client: Uploading resource file:/tmp/spark-2e566133-d50a-4904-920e-ab5cec07c644/__spark_conf__6538744395325375994.zip -> hdfs://login_node1.xcat.cluster:8020/user/me/.sparkStaging/application_1479877553404_0041/__spark_conf__6538744395325375994.zip
>> 16/12/05 18:23:29 INFO spark.SecurityManager: Changing view acls to: me
>> 16/12/05 18:23:29 INFO spark.SecurityManager: Changing modify acls to: me
>> 16/12/05 18:23:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(me); users with modify permissions: Set(me)
>> 16/12/05 18:23:29 INFO yarn.Client: Submitting application 41 to ResourceManager
>> 16/12/05 18:23:30 INFO impl.YarnClientImpl: Submitted application application_1479877553404_0041
>> 16/12/05 18:23:31 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:31 INFO yarn.Client:
>>      client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>>      diagnostics: AM container is launched, waiting for AM container to Register with RM
>>      ApplicationMaster host: N/A
>>      ApplicationMaster RPC port: -1
>>      queue: default
>>      start time: 1480962209903
>>      final status: UNDEFINED
>>      tracking URL: http://login_node1.xcat.cluster:8088/proxy/application_1479877553404_0041/
>>      user: me
>> 16/12/05 18:23:32 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:33 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:34 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:35 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:36 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:37 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:38 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:39 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:40 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:40 INFO yarn.Client:
>>      client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>>      diagnostics: N/A
>>      ApplicationMaster host:
>>      ApplicationMaster RPC port: 0
>>      queue: default
>>      start time: 1480962209903
>>      final status: UNDEFINED
>>      tracking URL: http://login_node1.xcat.cluster:8088/proxy/application_1479877553404_0041/
>>      user: me
>> 16/12/05 18:23:41 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:42 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:43 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:44 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:45 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:46 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:47 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:48 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:49 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:50 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:51 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:52 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:53 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:54 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:23:55 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:55 INFO yarn.Client:
>>      client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>>      diagnostics: AM container is launched, waiting for AM container to Register with RM
>>      ApplicationMaster host: N/A
>>      ApplicationMaster RPC port: -1
>>      queue: default
>>      start time: 1480962209903
>>      final status: UNDEFINED
>>      tracking URL: http://login_node1.xcat.cluster:8088/proxy/application_1479877553404_0041/
>>      user: me
>> 16/12/05 18:23:56 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:57 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:58 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:23:59 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:24:00 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:24:01 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:24:02 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:24:03 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:24:04 INFO yarn.Client: Application report for application_1479877553404_0041 (state: ACCEPTED)
>> 16/12/05 18:24:05 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:05 INFO yarn.Client:
>>      client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>>      diagnostics: N/A
>>      ApplicationMaster host:
>>      ApplicationMaster RPC port: 0
>>      queue: default
>>      start time: 1480962209903
>>      final status: UNDEFINED
>>      tracking URL: http://login_node1.xcat.cluster:8088/proxy/application_1479877553404_0041/
>>      user: me
>> 16/12/05 18:24:06 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:07 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:08 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:09 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:10 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:11 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:12 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:13 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:14 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:15 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:16 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:17 INFO yarn.Client: Application report for application_1479877553404_0041 (state: RUNNING)
>> 16/12/05 18:24:18 INFO yarn.Client: Application report for application_1479877553404_0041 (state: FINISHED)
>> 16/12/05 18:24:18 INFO yarn.Client:
>>      client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>>      diagnostics: User class threw exception: org.apache.hadoop.security.AccessControlException: Authentication required
>>      ApplicationMaster host:
>>      ApplicationMaster RPC port: 0
>>      queue: default
>>      start time: 1480962209903
>>      final status: FAILED
>>      tracking URL: http://login_node1.xcat.cluster:8088/proxy/application_1479877553404_0041/
>>      user: me
>> Exception in thread "main" org.apache.spark.SparkException: Application application_1479877553404_0041 finished with failed status
>>      at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
>>      at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1169)
>>      at org.apache.spark.deploy.yarn.Client.main(Client.scala)
>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>      at java.lang.reflect.Method.invoke(Method.java:497)
>>      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>>      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>>      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>>      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>>      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> 16/12/05 18:24:18 INFO util.ShutdownHookManager: Shutdown hook called
>> 16/12/05 18:24:18 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2e566133-d50a-4904-920e-ab5cec07c644
>>
>>
>> On Mon, Dec 5, 2016 at 10:30 AM, Gerard Casey <gerardhughca...@gmail.com> wrote:
>>
>>>
>>>> On 5 Dec 2016, at 19:26, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>>
>>>> There's generally an exception in these cases, and you haven't posted
>>>> it, so it's hard to tell you what's wrong. The most probable cause,
>>>> without the extra information the exception provides, is that you're
>>>> using the wrong Hadoop configuration when submitting the job to YARN.
>>>>
>>>> On Mon, Dec 5, 2016 at 4:35 AM, Gerard Casey <gerardhughca...@gmail.com> wrote:
>>>>> Hello all,
>>>>>
>>>>> I am using Spark with Kerberos authentication.
>>>>>
>>>>> I can run my code using `spark-shell` fine and I can also use `spark-submit`
>>>>> in local mode (e.g. --master local[16]). Both function as expected.
>>>>>
>>>>> local mode -
>>>>>
>>>>> spark-submit --class "graphx_sp" --master local[16] --driver-memory 20G target/scala-2.10/graphx_sp_2.10-1.0.jar
>>>>>
>>>>> I am now progressing to run in cluster mode using YARN.
>>>>>
>>>>> cluster mode with YARN -
>>>>>
>>>>> spark-submit --class "graphx_sp" --master yarn --deploy-mode cluster --executor-memory 13G --total-executor-cores 32 target/scala-2.10/graphx_sp_2.10-1.0.jar
>>>>>
>>>>> However, this returns:
>>>>>
>>>>> diagnostics: User class threw exception: org.apache.hadoop.security.AccessControlException: Authentication required
>>>>>
>>>>> Before I run using spark-shell or in local mode with spark-submit I do the
>>>>> following Kerberos setup:
>>>>>
>>>>> kinit -k -t ~/keytab -r 7d `whoami`
>>>>>
>>>>> Clearly, this setup is not extending to the YARN setup. How do I fix the
>>>>> Kerberos issue with YARN in cluster mode? Is this something which must be
>>>>> in my /src/main/scala/graphx_sp.scala file?
>>>>>
>>>>> Many thanks
>>>>>
>>>>> Geroid
>>>>
>>>> --
>>>> Marcelo
>>>
>>
>> --
>> Marcelo
>
> --
> Marcelo
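For anyone hitting the same compile error later in this thread: scalac agrees with the linked Javadoc here. `getLoginUser()` takes no arguments and returns the already-established login user; it is `loginUserFromKeytab(principal, keytab)` that takes the two inputs, and that method returns void, so there is nothing to assign from it. A minimal sketch of how the login section could look (not run against a cluster; the object name is made up, and it assumes `spark.yarn.principal` and `spark.yarn.keytab` are set on the submitted conf):

```scala
import org.apache.spark.SparkConf
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical name for illustration only.
object KerberosLoginSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("graphx_sp")

    // Assumes these keys are set, e.g. via spark-submit --principal / --keytab.
    val principal = conf.get("spark.yarn.principal")
    val keytab    = conf.get("spark.yarn.keytab")

    // loginUserFromKeytab is a static void method: it performs the keytab
    // login and stores the resulting user process-wide.
    UserGroupInformation.loginUserFromKeytab(principal, keytab)

    // getLoginUser() takes no arguments; it returns the UGI established above.
    val ugi: UserGroupInformation = UserGroupInformation.getLoginUser()
    println(s"Logged in as: ${ugi.getUserName}")
  }
}
```

Note also that in YARN cluster mode, passing `--principal me@REALM --keytab /path/to/keytab` to `spark-submit` makes Spark itself perform the keytab login and ship the credentials to the AM (as the "YARN Secure Distributed Cache" log line above shows), so application-side UGI calls are usually unnecessary; a local `kinit` only covers the submitting process, not the containers on the cluster.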