I couldn't find a hadoop-2.5 profile in the pom. Maybe that's the problem.
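For reference, a quick way to check which Hadoop profiles the Spark 1.5.1 pom actually defines, plus a hedged rebuild sketch; the hadoop-2.4 profile name is an assumption to verify against the pom, the explicit CDH hadoop.version is the part carried over from this thread:

# From the Spark source root: list the profiles defined in the pom
mvn help:all-profiles | grep -i hadoop

# If there is no hadoop-2.5 profile, the usual workaround is to pick the
# closest existing profile (hadoop-2.4 is an assumption here) and keep the
# explicit CDH hadoop.version:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive -Phive-thriftserver -DskipTests clean package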
On 30 Oct 2015 1:51 am, "Zoltan Fedor" <zoltan.0.fe...@gmail.com> wrote:

The funny thing is that with Spark 1.2.0 on the same machine (Spark 1.2.0 is the default shipped with CDH 5.3.3) the same hive-site.xml is picked up and I have no problem whatsoever.

On Thu, Oct 29, 2015 at 10:48 AM, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote:

Yes, I have the hive-site.xml in $SPARK_HOME/conf, also in yarn-conf, in /etc/hive/conf, etc.

On Thu, Oct 29, 2015 at 10:46 AM, Kai Wei <kai.wei...@gmail.com> wrote:

Did you try copying it to the spark/conf dir?

On 30 Oct 2015 1:42 am, "Zoltan Fedor" <zoltan.0.fe...@gmail.com> wrote:

There is a /user/biapp in hdfs. The problem is that the hive-site.xml is being ignored, so it is looking for it locally.

On Thu, Oct 29, 2015 at 10:40 AM, Kai Wei <kai.wei...@gmail.com> wrote:

Create /user/biapp in hdfs manually first.
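For completeness, a minimal sketch of that suggestion, assuming an HDFS superuser account (hdfs here) is available via sudo; the path and owner are the ones from this thread, and the permission trace further down denies WRITE on /user itself:

# Create the user's home directory in HDFS and hand it to biapp, so Hive's
# session/scratch directories can be created under it
sudo -u hdfs hdfs dfs -mkdir -p /user/biapp
sudo -u hdfs hdfs dfs -chown biapp /user/biapp
hdfs dfs -ls /user    # verify owner and permissions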
On 30 Oct 2015 1:36 am, "Zoltan Fedor" <zoltan.0.fe...@gmail.com> wrote:

Sure, I did it with spark-shell, which seems to be showing the same error - not using the hive-site.xml.

$ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark --deploy-mode client --driver-class-path $HIVE_CLASSPATH
Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/29 10:33:20 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/10/29 10:33:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/29 10:33:50 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
SparkContext available as sc, HiveContext available as sqlContext.
>>>

biapps@biapps-qa01:~> HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/spark-shell --deploy-mode client
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/29 10:34:15 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/10/29 10:34:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc.
15/10/29 10:34:46 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
15/10/29 10:34:46 WARN ShellBasedUnixGroupsMapping: got exception trying to get groups for user biapp: id: biapp: No such user
15/10/29 10:34:46 WARN UserGroupInformation: No groups available for user biapp
java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=biapp, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
    at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
    at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
    at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:167)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
    at $iwC$$iwC.<init>(<console>:9)
    at $iwC.<init>(<console>:18)
    at <init>(<console>:20)
    at .<init>(<console>:24)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
    at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
    at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
    at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
    at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=biapp, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2689)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2658)
    at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:831)
    at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:827)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:820)
    at org.apache.hadoop.hive.ql.exec.Utilities.createDirsWithPermission(Utilities.java:3679)
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:597)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 56 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=biapp, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

    at org.apache.hadoop.ipc.Client.call(Client.java:1411)
    at org.apache.hadoop.ipc.Client.call(Client.java:1364)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:531)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2687)
    ... 66 more

<console>:10: error: not found: value sqlContext
       import sqlContext.implicits._
              ^
<console>:10: error: not found: value sqlContext
       import sqlContext.sql
              ^

scala> sqlContext.sql("show databases").collect
<console>:14: error: not found: value sqlContext
              sqlContext.sql("show databases").collect
              ^

scala>

On Thu, Oct 29, 2015 at 10:26 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:

I don't know a lot about how pyspark works. Can you possibly try running spark-shell and do the same?

sqlContext.sql("show databases").collect

Deenar

On 29 October 2015 at 14:18, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote:

Yes, I am. It was compiled with the following:

export SPARK_HADOOP_VERSION=2.5.0-cdh5.3.3
export SPARK_YARN=true
export SPARK_HIVE=true
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive -Phive-thriftserver -DskipTests clean package

On Thu, Oct 29, 2015 at 10:16 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:

Are you using Spark built with Hive?

# Apache Hadoop 2.4.X with Hive 13 support
mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package
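A quick, untested way to double-check whether that assembly really was built with Hive support; the jar path is the one from the SLF4J output earlier in this thread, adjust as needed:

# A Hive-enabled assembly should contain the HiveContext classes;
# a count of 0 would suggest the -Phive build did not make it into this jar
jar tf /usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar | grep -c 'org/apache/spark/sql/hive/HiveContext'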
On 29 October 2015 at 13:08, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote:

Hi Deenar,
As suggested, I have moved the hive-site.xml from HADOOP_CONF_DIR ($SPARK_HOME/hadoop-conf) to YARN_CONF_DIR ($SPARK_HOME/conf/yarn-conf) and use the below to start pyspark, but the error is the exact same as before.

$ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark --deploy-mode client
Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/29 09:06:36 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/10/29 09:06:38 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/29 09:07:03 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sqlContext2 = HiveContext(sc)
>>> sqlContext2 = HiveContext(sc)
>>> sqlContext2.sql("show databases").first()
15/10/29 09:07:34 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
15/10/29 09:07:35 WARN ShellBasedUnixGroupsMapping: got exception trying to get groups for user biapp: id: biapp: No such user
15/10/29 09:07:35 WARN UserGroupInformation: No groups available for user biapp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 552, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 660, in _ssql_ctx
    "build/sbt assembly", e)
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o20))
>>>

On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote:

Hi Zoltan

Add hive-site.xml to your YARN_CONF_DIR, i.e. $SPARK_HOME/conf/yarn-conf.

Deenar

Think Reactive Ltd
deenar.toras...@thinkreactive.co.uk
07714140812
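One way to see whether that conf directory (and therefore hive-site.xml) actually ends up on the driver classpath; as far as I remember the launch scripts honour SPARK_PRINT_LAUNCH_COMMAND, but treat this as a sketch to verify:

# Prints the full "Spark Command: java -cp ..." line before starting the REPL;
# the directory holding hive-site.xml should appear on that classpath
SPARK_PRINT_LAUNCH_COMMAND=1 HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf \
  YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn \
  $SPARK_HOME/bin/pyspark --deploy-mode client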
On 28 October 2015 at 14:28, Zoltan Fedor <zoltan.0.fe...@gmail.com> wrote:

Hi,
We have a shared CDH 5.3.3 cluster and we are trying to use Spark 1.5.1 on it in yarn client mode with Hive.

I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems I am not able to make SparkSQL pick up the hive-site.xml when running pyspark.

hive-site.xml is located in $SPARK_HOME/hadoop-conf/hive-site.xml and also in $SPARK_HOME/conf/hive-site.xml

When I start pyspark with the below command and then run some simple SparkSQL it fails; it seems it didn't pick up the settings in hive-site.xml.

$ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark --deploy-mode client
Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/28 10:22:33 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/10/28 10:22:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/28 10:22:59 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.1
      /_/

Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sqlContext2 = HiveContext(sc)
>>> sqlContext2.sql("show databases").first()
15/10/28 10:23:12 WARN HiveConf: HiveConf of name hive.metastore.local does not exist
15/10/28 10:23:13 WARN ShellBasedUnixGroupsMapping: got exception trying to get groups for user biapp: id: biapp: No such user
15/10/28 10:23:13 WARN UserGroupInformation: No groups available for user biapp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 552, in sql
    return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
  File "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", line 660, in _ssql_ctx
    "build/sbt assembly", e)
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o20))
>>>

See in the above the warning "WARN HiveConf: HiveConf of name hive.metastore.local does not exist", while there actually is a hive.metastore.local attribute in the hive-site.xml.

Any idea how to submit hive-site.xml in yarn client mode?

Thanks
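An untested sketch of one way to hand hive-site.xml to a yarn-client job explicitly instead of relying on it being picked up from a conf directory; /etc/hive/conf is the copy mentioned earlier in the thread, and --files / --driver-class-path are standard spark-submit options. (The "hive.metastore.local does not exist" warning is, as far as I know, only HiveConf complaining about a property dropped in newer Hive releases, so it may be a red herring.)

# Ship hive-site.xml to the YARN containers and put its directory on the
# driver classpath, so both driver and executors see the same metastore config
HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark \
  --deploy-mode client \
  --files /etc/hive/conf/hive-site.xml \
  --driver-class-path /etc/hive/conf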