Possible. Let me try to recompile with:

export SPARK_HADOOP_VERSION=2.5.0-cdh5.3.3
export SPARK_YARN=true
export SPARK_HIVE=true
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive -Phive-thriftserver -DskipTests clean package
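Separately, the AccessControlException in the spark-shell output below looks like the biapp user simply has no HDFS home directory, so SessionState cannot create its scratch dirs under /user. A minimal sketch of what I plan to run first (assuming an account with HDFS superuser rights, e.g. the hdfs user on a CDH node; the biapp group is an assumption, adjust ownership to whatever your cluster uses):

# create the missing HDFS home directory for the biapp user
$ sudo -u hdfs hdfs dfs -mkdir -p /user/biapp
$ sudo -u hdfs hdfs dfs -chown biapp:biapp /user/biapp

Once the rebuild finishes, I will also try shipping hive-site.xml to the driver and executors explicitly instead of relying on the conf dirs being picked up, roughly like this (assuming /etc/hive/conf/hive-site.xml is the CDH-managed copy on this host):

$ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf YARN_CONF_DIR=$SPARK_HOME/yarn-conf \
  HADOOP_USER_NAME=biapp MASTER=yarn $SPARK_HOME/bin/pyspark \
  --deploy-mode client --files /etc/hive/conf/hive-site.xml

This is just a sketch of what I intend to test, not a confirmed fix; I will report back with the results.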
On Thu, Oct 29, 2015 at 11:05 AM, Kai Wei <kai.wei...@gmail.com> wrote: > Failed to see a hadoop-2.5 profile in pom. Maybe that's the problem. > On 30 Oct 2015 1:51 am, "Zoltan Fedor" <zoltan.0.fe...@gmail.com> wrote: > >> The funny thing is, that with Spark 1.2.0 on the same machine (Spark >> 1.2.0 is the default shipped with CDH 5.3.3) the same hive-site.xml is >> being picked up and I have no problem whatsoever. >> >> On Thu, Oct 29, 2015 at 10:48 AM, Zoltan Fedor <zoltan.0.fe...@gmail.com> >> wrote: >> >>> Yes, I have the hive-site.xml in $SPARK_HOME/conf, also in yarn-conf, in >>> /etc/hive/conf, etc >>> >>> On Thu, Oct 29, 2015 at 10:46 AM, Kai Wei <kai.wei...@gmail.com> wrote: >>> >>>> Did you try copy it to spark/conf dir? >>>> On 30 Oct 2015 1:42 am, "Zoltan Fedor" <zoltan.0.fe...@gmail.com> >>>> wrote: >>>> >>>>> There is /user/biapp in hdfs. The problem is that the hive-site.xml is >>>>> being ignored, so it is looking for it locally. >>>>> >>>>> On Thu, Oct 29, 2015 at 10:40 AM, Kai Wei <kai.wei...@gmail.com> >>>>> wrote: >>>>> >>>>>> Create /user/biapp in hdfs manually first. >>>>>> On 30 Oct 2015 1:36 am, "Zoltan Fedor" <zoltan.0.fe...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Sure, I did it with spark-shell, which seems to be showing the same >>>>>>> error - not using the hive-site.xml >>>>>>> >>>>>>> >>>>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>>>>>> YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn >>>>>>> $SPARK_HOME/bin/pyspark --deploy-mode client --driver-class-path >>>>>>> $HIVE_CLASSPATH >>>>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40) >>>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2 >>>>>>> Type "help", "copyright", "credits" or "license" for more >>>>>>> information. >>>>>>> SLF4J: Class path contains multiple SLF4J bindings. >>>>>>> SLF4J: Found binding in >>>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>> SLF4J: Found binding in >>>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >>>>>>> explanation. >>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >>>>>>> 15/10/29 10:33:20 WARN MetricsSystem: Using default name >>>>>>> DAGScheduler for source because spark.app.id is not set. >>>>>>> 15/10/29 10:33:22 WARN NativeCodeLoader: Unable to load >>>>>>> native-hadoop library for your platform... using builtin-java classes >>>>>>> where >>>>>>> applicable >>>>>>> 15/10/29 10:33:50 WARN HiveConf: HiveConf of name >>>>>>> hive.metastore.local does not exist >>>>>>> Welcome to >>>>>>> ____ __ >>>>>>> / __/__ ___ _____/ /__ >>>>>>> _\ \/ _ \/ _ `/ __/ '_/ >>>>>>> /__ / .__/\_,_/_/ /_/\_\ version 1.5.1 >>>>>>> /_/ >>>>>>> >>>>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40) >>>>>>> SparkContext available as sc, HiveContext available as sqlContext. >>>>>>> >>> >>>>>>> biapps@biapps-qa01:~> HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>>>>>> YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp MASTER=yarn >>>>>>> $SPARK_HOME/bin/spark-shell --deploy-mode client >>>>>>> SLF4J: Class path contains multiple SLF4J bindings. 
>>>>>>> SLF4J: Found binding in >>>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>> SLF4J: Found binding in >>>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >>>>>>> explanation. >>>>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >>>>>>> Welcome to >>>>>>> ____ __ >>>>>>> / __/__ ___ _____/ /__ >>>>>>> _\ \/ _ \/ _ `/ __/ '_/ >>>>>>> /___/ .__/\_,_/_/ /_/\_\ version 1.5.1 >>>>>>> /_/ >>>>>>> >>>>>>> Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_91) >>>>>>> Type in expressions to have them evaluated. >>>>>>> Type :help for more information. >>>>>>> 15/10/29 10:34:15 WARN MetricsSystem: Using default name >>>>>>> DAGScheduler for source because spark.app.id is not set. >>>>>>> 15/10/29 10:34:16 WARN NativeCodeLoader: Unable to load >>>>>>> native-hadoop library for your platform... using builtin-java classes >>>>>>> where >>>>>>> applicable >>>>>>> Spark context available as sc. >>>>>>> 15/10/29 10:34:46 WARN HiveConf: HiveConf of name >>>>>>> hive.metastore.local does not exist >>>>>>> 15/10/29 10:34:46 WARN ShellBasedUnixGroupsMapping: got exception >>>>>>> trying to get groups for user biapp: id: biapp: No such user >>>>>>> >>>>>>> 15/10/29 10:34:46 WARN UserGroupInformation: No groups available for >>>>>>> user biapp >>>>>>> java.lang.RuntimeException: >>>>>>> org.apache.hadoop.security.AccessControlException: Permission denied: >>>>>>> user=biapp, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594) >>>>>>> at >>>>>>> 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>>>>>> at >>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) >>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) >>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) >>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) >>>>>>> at java.security.AccessController.doPrivileged(Native Method) >>>>>>> at javax.security.auth.Subject.doAs(Subject.java:422) >>>>>>> at >>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) >>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) >>>>>>> >>>>>>> at >>>>>>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) >>>>>>> at >>>>>>> org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171) >>>>>>> at >>>>>>> org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162) >>>>>>> at >>>>>>> org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160) >>>>>>> at >>>>>>> org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:167) >>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native >>>>>>> Method) >>>>>>> at >>>>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) >>>>>>> at >>>>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) >>>>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028) >>>>>>> at $iwC$$iwC.<init>(<console>:9) >>>>>>> at $iwC.<init>(<console>:18) >>>>>>> at <init>(<console>:20) >>>>>>> at .<init>(<console>:24) >>>>>>> at .<clinit>(<console>) >>>>>>> at .<init>(<console>:7) >>>>>>> at .<clinit>(<console>) >>>>>>> at $print(<console>) >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>>>>> at >>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>>>>> at >>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>>>> at java.lang.reflect.Method.invoke(Method.java:606) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) >>>>>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) >>>>>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) >>>>>>> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124) >>>>>>> at >>>>>>> 
org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159) >>>>>>> at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >>>>>>> at >>>>>>> org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) >>>>>>> at >>>>>>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) >>>>>>> at org.apache.spark.repl.SparkILoop.org >>>>>>> $apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) >>>>>>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) >>>>>>> at org.apache.spark.repl.Main$.main(Main.scala:31) >>>>>>> at org.apache.spark.repl.Main.main(Main.scala) >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>>>>> at >>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>>>>> at >>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>>>> at java.lang.reflect.Method.invoke(Method.java:606) >>>>>>> at >>>>>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) >>>>>>> at >>>>>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) >>>>>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) >>>>>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) >>>>>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >>>>>>> Caused by: org.apache.hadoop.security.AccessControlException: >>>>>>> Permission denied: user=biapp, access=WRITE, >>>>>>> inode="/user":hdfs:supergroup:drwxr-xr-x >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088) >>>>>>> at >>>>>>> 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>>>>>> at >>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) >>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) >>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) >>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) >>>>>>> at java.security.AccessController.doPrivileged(Native Method) >>>>>>> at javax.security.auth.Subject.doAs(Subject.java:422) >>>>>>> at >>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) >>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) >>>>>>> >>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native >>>>>>> Method) >>>>>>> at >>>>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) >>>>>>> at >>>>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) >>>>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526) >>>>>>> at >>>>>>> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) >>>>>>> at >>>>>>> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2689) >>>>>>> at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2658) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:831) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827) >>>>>>> at >>>>>>> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:827) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:820) >>>>>>> at >>>>>>> org.apache.hadoop.hive.ql.exec.Utilities.createDirsWithPermission(Utilities.java:3679) >>>>>>> at >>>>>>> org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:597) >>>>>>> at >>>>>>> org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554) >>>>>>> at >>>>>>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508) >>>>>>> ... 
56 more >>>>>>> Caused by: >>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): >>>>>>> Permission denied: user=biapp, access=WRITE, >>>>>>> inode="/user":hdfs:supergroup:drwxr-xr-x >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6287) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6269) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6221) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4088) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4058) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4031) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:788) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) >>>>>>> at >>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587) >>>>>>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) >>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) >>>>>>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) >>>>>>> at java.security.AccessController.doPrivileged(Native Method) >>>>>>> at javax.security.auth.Subject.doAs(Subject.java:422) >>>>>>> at >>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642) >>>>>>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) >>>>>>> >>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1411) >>>>>>> at org.apache.hadoop.ipc.Client.call(Client.java:1364) >>>>>>> at >>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) >>>>>>> at com.sun.proxy.$Proxy14.mkdirs(Unknown Source) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:531) >>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>>>>> at >>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>>>>> at >>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>>>>> at java.lang.reflect.Method.invoke(Method.java:606) >>>>>>> 
at >>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) >>>>>>> at >>>>>>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) >>>>>>> at com.sun.proxy.$Proxy15.mkdirs(Unknown Source) >>>>>>> at >>>>>>> org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2687) >>>>>>> ... 66 more >>>>>>> >>>>>>> <console>:10: error: not found: value sqlContext >>>>>>> import sqlContext.implicits._ >>>>>>> ^ >>>>>>> <console>:10: error: not found: value sqlContext >>>>>>> import sqlContext.sql >>>>>>> ^ >>>>>>> >>>>>>> scala> sqlContext.sql("show databases").collect >>>>>>> <console>:14: error: not found: value sqlContext >>>>>>> sqlContext.sql("show databases").collect >>>>>>> ^ >>>>>>> >>>>>>> scala> >>>>>>> >>>>>>> On Thu, Oct 29, 2015 at 10:26 AM, Deenar Toraskar < >>>>>>> deenar.toras...@gmail.com> wrote: >>>>>>> >>>>>>>> I dont know a lot about how pyspark works. Can you possibly try >>>>>>>> running spark-shell and do the same? >>>>>>>> >>>>>>>> sqlContext.sql("show databases").collect >>>>>>>> >>>>>>>> Deenar >>>>>>>> >>>>>>>> On 29 October 2015 at 14:18, Zoltan Fedor <zoltan.0.fe...@gmail.com >>>>>>>> > wrote: >>>>>>>> >>>>>>>>> Yes, I am. It was compiled with the following: >>>>>>>>> >>>>>>>>> export SPARK_HADOOP_VERSION=2.5.0-cdh5.3.3 >>>>>>>>> export SPARK_YARN=true >>>>>>>>> export SPARK_HIVE=true >>>>>>>>> export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M >>>>>>>>> -XX:ReservedCodeCacheSize=512m" >>>>>>>>> mvn -Pyarn -Phadoop-2.5 -Dhadoop.version=2.5.0-cdh5.3.3 -Phive >>>>>>>>> -Phive-thriftserver -DskipTests clean package >>>>>>>>> >>>>>>>>> On Thu, Oct 29, 2015 at 10:16 AM, Deenar Toraskar < >>>>>>>>> deenar.toras...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Are you using Spark built with hive ? >>>>>>>>>> >>>>>>>>>> # Apache Hadoop 2.4.X with Hive 13 support >>>>>>>>>> mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive >>>>>>>>>> -Phive-thriftserver -DskipTests clean package >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 29 October 2015 at 13:08, Zoltan Fedor < >>>>>>>>>> zoltan.0.fe...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Deenar, >>>>>>>>>>> As suggested, I have moved the hive-site.xml from >>>>>>>>>>> HADOOP_CONF_DIR ($SPARK_HOME/hadoop-conf) to YARN_CONF_DIR >>>>>>>>>>> ($SPARK_HOME/conf/yarn-conf) and use the below to start pyspark, >>>>>>>>>>> but the >>>>>>>>>>> error is the exact same as before. >>>>>>>>>>> >>>>>>>>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>>>>>>>>>> YARN_CONF_DIR=$SPARK_HOME/conf/yarn-conf HADOOP_USER_NAME=biapp >>>>>>>>>>> MASTER=yarn >>>>>>>>>>> $SPARK_HOME/bin/pyspark --deploy-mode client >>>>>>>>>>> >>>>>>>>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40) >>>>>>>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2 >>>>>>>>>>> Type "help", "copyright", "credits" or "license" for more >>>>>>>>>>> information. >>>>>>>>>>> SLF4J: Class path contains multiple SLF4J bindings. >>>>>>>>>>> SLF4J: Found binding in >>>>>>>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>>>>>> SLF4J: Found binding in >>>>>>>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings >>>>>>>>>>> for an explanation. 
>>>>>>>>>>> SLF4J: Actual binding is of type >>>>>>>>>>> [org.slf4j.impl.Log4jLoggerFactory] >>>>>>>>>>> 15/10/29 09:06:36 WARN MetricsSystem: Using default name >>>>>>>>>>> DAGScheduler for source because spark.app.id is not set. >>>>>>>>>>> 15/10/29 09:06:38 WARN NativeCodeLoader: Unable to load >>>>>>>>>>> native-hadoop library for your platform... using builtin-java >>>>>>>>>>> classes where >>>>>>>>>>> applicable >>>>>>>>>>> 15/10/29 09:07:03 WARN HiveConf: HiveConf of name >>>>>>>>>>> hive.metastore.local does not exist >>>>>>>>>>> Welcome to >>>>>>>>>>> ____ __ >>>>>>>>>>> / __/__ ___ _____/ /__ >>>>>>>>>>> _\ \/ _ \/ _ `/ __/ '_/ >>>>>>>>>>> /__ / .__/\_,_/_/ /_/\_\ version 1.5.1 >>>>>>>>>>> /_/ >>>>>>>>>>> >>>>>>>>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40) >>>>>>>>>>> SparkContext available as sc, HiveContext available as >>>>>>>>>>> sqlContext. >>>>>>>>>>> >>> sqlContext2 = HiveContext(sc) >>>>>>>>>>> >>> sqlContext2 = HiveContext(sc) >>>>>>>>>>> >>> sqlContext2.sql("show databases").first() >>>>>>>>>>> 15/10/29 09:07:34 WARN HiveConf: HiveConf of name >>>>>>>>>>> hive.metastore.local does not exist >>>>>>>>>>> 15/10/29 09:07:35 WARN ShellBasedUnixGroupsMapping: got >>>>>>>>>>> exception trying to get groups for user biapp: id: biapp: No such >>>>>>>>>>> user >>>>>>>>>>> >>>>>>>>>>> 15/10/29 09:07:35 WARN UserGroupInformation: No groups available >>>>>>>>>>> for user biapp >>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>> File "<stdin>", line 1, in <module> >>>>>>>>>>> File >>>>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>>>>>>>>>> line 552, in sql >>>>>>>>>>> return DataFrame(self._ssql_ctx.sql(sqlQuery), self) >>>>>>>>>>> File >>>>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>>>>>>>>>> line 660, in _ssql_ctx >>>>>>>>>>> "build/sbt assembly", e) >>>>>>>>>>> Exception: ("You must build Spark with Hive. Export >>>>>>>>>>> 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An >>>>>>>>>>> error >>>>>>>>>>> occurred while calling >>>>>>>>>>> None.org.apache.spark.sql.hive.HiveContext.\n', >>>>>>>>>>> JavaObject id=o20)) >>>>>>>>>>> >>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Oct 29, 2015 at 7:20 AM, Deenar Toraskar < >>>>>>>>>>> deenar.toras...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> *Hi Zoltan* >>>>>>>>>>>> >>>>>>>>>>>> Add hive-site.xml to your YARN_CONF_DIR. i.e. >>>>>>>>>>>> $SPARK_HOME/conf/yarn-conf >>>>>>>>>>>> >>>>>>>>>>>> Deenar >>>>>>>>>>>> >>>>>>>>>>>> *Think Reactive Ltd* >>>>>>>>>>>> deenar.toras...@thinkreactive.co.uk >>>>>>>>>>>> 07714140812 >>>>>>>>>>>> >>>>>>>>>>>> On 28 October 2015 at 14:28, Zoltan Fedor < >>>>>>>>>>>> zoltan.0.fe...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> We have a shared CDH 5.3.3 cluster and trying to use Spark >>>>>>>>>>>>> 1.5.1 on it in yarn client mode with Hive. >>>>>>>>>>>>> >>>>>>>>>>>>> I have compiled Spark 1.5.1 with SPARK_HIVE=true, but it seems >>>>>>>>>>>>> I am not able to make SparkSQL to pick up the hive-site.xml when >>>>>>>>>>>>> runnig >>>>>>>>>>>>> pyspark. 
>>>>>>>>>>>>> >>>>>>>>>>>>> hive-site.xml is located in >>>>>>>>>>>>> $SPARK_HOME/hadoop-conf/hive-site.xml and also in >>>>>>>>>>>>> $SPARK_HOME/conf/hive-site.xml >>>>>>>>>>>>> >>>>>>>>>>>>> When I start pyspark with the below command and then run some >>>>>>>>>>>>> simple SparkSQL it fails, it seems it didn't pic up the settings >>>>>>>>>>>>> in >>>>>>>>>>>>> hive-site.xml >>>>>>>>>>>>> >>>>>>>>>>>>> $ HADOOP_CONF_DIR=$SPARK_HOME/hadoop-conf >>>>>>>>>>>>> YARN_CONF_DIR=$SPARK_HOME/yarn-conf HADOOP_USER_NAME=biapp >>>>>>>>>>>>> MASTER=yarn >>>>>>>>>>>>> $SPARK_HOME/bin/pyspark --deploy-mode client >>>>>>>>>>>>> >>>>>>>>>>>>> Python 2.6.6 (r266:84292, Jul 23 2015, 05:13:40) >>>>>>>>>>>>> [GCC 4.4.7 20120313 (Red Hat 4.4.7-16)] on linux2 >>>>>>>>>>>>> Type "help", "copyright", "credits" or "license" for more >>>>>>>>>>>>> information. >>>>>>>>>>>>> SLF4J: Class path contains multiple SLF4J bindings. >>>>>>>>>>>>> SLF4J: Found binding in >>>>>>>>>>>>> [jar:file:/usr/lib/spark-1.5.1-bin-without-hadoop/lib/spark-assembly-1.5.1-hadoop2.5.0-cdh5.3.3.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>>>>>>>> SLF4J: Found binding in >>>>>>>>>>>>> [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >>>>>>>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for >>>>>>>>>>>>> an explanation. >>>>>>>>>>>>> SLF4J: Actual binding is of type >>>>>>>>>>>>> [org.slf4j.impl.Log4jLoggerFactory] >>>>>>>>>>>>> 15/10/28 10:22:33 WARN MetricsSystem: Using default name >>>>>>>>>>>>> DAGScheduler for source because spark.app.id is not set. >>>>>>>>>>>>> 15/10/28 10:22:35 WARN NativeCodeLoader: Unable to load >>>>>>>>>>>>> native-hadoop library for your platform... using builtin-java >>>>>>>>>>>>> classes where >>>>>>>>>>>>> applicable >>>>>>>>>>>>> 15/10/28 10:22:59 WARN HiveConf: HiveConf of name >>>>>>>>>>>>> hive.metastore.local does not exist >>>>>>>>>>>>> Welcome to >>>>>>>>>>>>> ____ __ >>>>>>>>>>>>> / __/__ ___ _____/ /__ >>>>>>>>>>>>> _\ \/ _ \/ _ `/ __/ '_/ >>>>>>>>>>>>> /__ / .__/\_,_/_/ /_/\_\ version 1.5.1 >>>>>>>>>>>>> /_/ >>>>>>>>>>>>> >>>>>>>>>>>>> Using Python version 2.6.6 (r266:84292, Jul 23 2015 05:13:40) >>>>>>>>>>>>> SparkContext available as sc, HiveContext available as >>>>>>>>>>>>> sqlContext. >>>>>>>>>>>>> >>> sqlContext2 = HiveContext(sc) >>>>>>>>>>>>> >>> sqlContext2.sql("show databases").first() >>>>>>>>>>>>> 15/10/28 10:23:12 WARN HiveConf: HiveConf of name >>>>>>>>>>>>> hive.metastore.local does not exist >>>>>>>>>>>>> 15/10/28 10:23:13 WARN ShellBasedUnixGroupsMapping: got >>>>>>>>>>>>> exception trying to get groups for user biapp: id: biapp: No such >>>>>>>>>>>>> user >>>>>>>>>>>>> >>>>>>>>>>>>> 15/10/28 10:23:13 WARN UserGroupInformation: No groups >>>>>>>>>>>>> available for user biapp >>>>>>>>>>>>> Traceback (most recent call last): >>>>>>>>>>>>> File "<stdin>", line 1, in <module> >>>>>>>>>>>>> File >>>>>>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>>>>>>>>>>>> line 552, in sql >>>>>>>>>>>>> return DataFrame(self._ssql_ctx.sql(sqlQuery), self) >>>>>>>>>>>>> File >>>>>>>>>>>>> "/usr/lib/spark-1.5.1-bin-without-hadoop/python/pyspark/sql/context.py", >>>>>>>>>>>>> line 660, in _ssql_ctx >>>>>>>>>>>>> "build/sbt assembly", e) >>>>>>>>>>>>> Exception: ("You must build Spark with Hive. 
Export >>>>>>>>>>>>> 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An >>>>>>>>>>>>> error >>>>>>>>>>>>> occurred while calling >>>>>>>>>>>>> None.org.apache.spark.sql.hive.HiveContext.\n', >>>>>>>>>>>>> JavaObject id=o20)) >>>>>>>>>>>>> >>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> See in the above the warning about "WARN HiveConf: HiveConf of >>>>>>>>>>>>> name hive.metastore.local does not exist" while actually there is >>>>>>>>>>>>> a >>>>>>>>>>>>> hive.metastore.local attribute in the hive-site.xml >>>>>>>>>>>>> >>>>>>>>>>>>> Any idea how to submit hive-site.xml in yarn client mode? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> >>