Re: Can't access remote Hive table from spark

Zhan Zhang Thu, 12 Feb 2015 00:21:03 -0800

When you log in, you have root access, so you can run "su hdfs" (or switch to 
any other account) without a password. From there you can create the HDFS 
directory, change its permissions, etc.
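
As a rough sketch of the steps above, the commands below would create the home
directory and hand it over to the user. The user name xiaobogu comes from this
thread; the group name (assumed to match the user name), the function names,
and the TARGET_USER, TARGET_GROUP and APPLY variables are made up for
illustration, so adjust them for your cluster. By default the script only
prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set APPLY=1 and run as root
# to actually execute them.
TARGET_USER="${TARGET_USER:-xiaobogu}"        # user from this thread
TARGET_GROUP="${TARGET_GROUP:-$TARGET_USER}"  # assumption: group named after the user

run_as_hdfs() {
    # root can switch to the hdfs service account without knowing its password
    if [ "${APPLY:-0}" = "1" ]; then
        su - hdfs -c "$1"
    else
        echo "su - hdfs -c \"$1\""
    fi
}

make_home_dir() {
    # Create the directory, give it to the user, then list it to confirm.
    run_as_hdfs "hdfs dfs -mkdir -p /user/$TARGET_USER"
    run_as_hdfs "hdfs dfs -chown $TARGET_USER:$TARGET_GROUP /user/$TARGET_USER"
    run_as_hdfs "hdfs dfs -ls -d /user/$TARGET_USER"
}

make_home_dir
```

Running it with APPLY=1 as root executes the three commands as the hdfs user.
Changing the password of the hdfs account should not be necessary for this.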



Thanks

Zhan Zhang

On Feb 11, 2015, at 11:28 PM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

Hi Zhan,

Yes, I found that there is an hdfs account created by Ambari, but what is the 
password for this account, and how can I log in under it?
Can I just change the password for the hdfs account?

Regards,



------------------ Original ------------------
From: "Zhan Zhang" <zzh...@hortonworks.com>
Send time: Thursday, Feb 12, 2015 2:00 AM
To: <guxiaobo1...@qq.com>
Cc: <user@spark.apache.org>; "Cheng Lian" <lian.cs....@gmail.com>
Subject: Re: Can't access remote Hive table from spark

You need to use the right HDFS account, e.g., hdfs, to create the directory 
and assign permissions.

Thanks.

Zhan Zhang

On Feb 11, 2015, at 4:34 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

Hi Zhan,
My single-node Hadoop cluster was installed by Ambari 1.7.0. I tried to create 
the /user/xiaobogu directory in HDFS, but it failed both as user xiaobogu and 
as root:

[xiaobogu@lix1 current]$ hadoop dfs -mkdir /user/xiaobogu
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

mkdir: Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x

[root@lix1 bin]# hadoop dfs -mkdir /user/xiaobogu
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

mkdir: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x

I notice there is an hdfs account created by Ambari, but what is its password? 
Should I use the hdfs account to create the directory?



------------------ Original ------------------
From: "Zhan Zhang" <zzh...@hortonworks.com>
Send time: Sunday, Feb 8, 2015 4:11 AM
To: <guxiaobo1...@qq.com>
Cc: <user@spark.apache.org>; "Cheng Lian" <lian.cs....@gmail.com>
Subject: Re: Can't access remote Hive table from spark

Yes. You need to create the xiaobogu directory under /user and give xiaobogu 
the right permissions on it.

Thanks.

Zhan Zhang

On Feb 7, 2015, at 8:15 AM, guxiaobo1982 <guxiaobo1...@qq.com> wrote:

Hi Zhan Zhang,

With the pre-built version 1.2.0 of Spark running against the YARN cluster 
installed by Ambari 1.7.0, I get the following errors:

[xiaobogu@lix1 spark]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10


Spark assembly has been built with Hive, including Datanucleus jars on classpath

15/02/08 00:11:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/08 00:11:54 INFO client.RMProxy: Connecting to ResourceManager at lix1.bh.com/192.168.100.3:8050
15/02/08 00:11:56 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
15/02/08 00:11:57 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
15/02/08 00:11:57 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/02/08 00:11:57 INFO yarn.Client: Setting up container launch context for our AM
15/02/08 00:11:57 INFO yarn.Client: Preparing resources for our AM container
15/02/08 00:11:58 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.

Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6515)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6497)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6449)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4251)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2555)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524)
    at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
    at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:595)
    at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:151)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:35)
    at org.apache.spark.deploy.yarn.ClientBase$class.createContainerLaunchContext(ClientBase.scala:308)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:80)
    at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:501)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xiaobogu, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6515)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6497)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6449)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4251)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

    at org.apache.hadoop.ipc.Client.call(Client.java:1410)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy17.mkdirs(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
    at com.sun.proxy.$Proxy17.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553)
    ... 24 more

[xiaobogu@lix1 spark]$



------------------ Original ------------------
From: "Zhan Zhang" <zzh...@hortonworks.com>
Send time: Friday, Feb 6, 2015 2:55 PM
To: <guxiaobo1...@qq.com>
Cc: <user@spark.apache.org>; "Cheng Lian" <lian.cs....@gmail.com>
Subject: Re: Can't access remote Hive table from spark

I'm not sure about Spark standalone mode, but on Spark-on-YARN it should work. 
You can check the following link:

 http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/

Thanks.

Zhan Zhang

On Feb 5, 2015, at 5:02 PM, Cheng Lian <lian.cs....@gmail.com> wrote:


Please note that Spark 1.2.0 only supports Hive 0.13.1 or 0.12.0; no other 
versions are supported.

Best,
Cheng

On 1/25/15 12:18 AM, guxiaobo1982 wrote:


Hi,
I built and started a single-node standalone Spark 1.2.0 cluster along with a 
single-node Hive 0.14.0 instance installed by Ambari 1.7.0. On the Spark and 
Hive node I can create and query tables inside Hive, and on remote machines I 
can submit the SparkPi example to the Spark master. But I failed to run the 
following example code:


package com.blackhorse;

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class SparkTest {

    public static void main(String[] args) {
        String appName = "This is a test application";
        String master = "spark://lix1.bh.com:7077";

        SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaHiveContext sqlCtx = new JavaHiveContext(sc);

        //sqlCtx.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
        //sqlCtx.sql("LOAD DATA LOCAL INPATH '/opt/spark/examples/src/main/resources/kv1.txt' INTO TABLE src");

        // Queries are expressed in HiveQL.
        List<Row> rows = sqlCtx.sql("FROM src SELECT key, value").collect();
        System.out.print("I got " + rows.size() + " rows \r\n");

        sc.close();
    }
}


Exception in thread "main" org.apache.hadoop.hive.ql.metadata.InvalidTableException: Table not found src
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:980)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:950)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:70)
    at org.apache.spark.sql.hive.HiveContext$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$super$lookupRelation(HiveContext.scala:253)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$anonfun$lookupRelation$3.apply(Catalog.scala:141)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:141)
    at org.apache.spark.sql.hive.HiveContext$anon$2.lookupRelation(HiveContext.scala:253)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:143)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$5.applyOrElse(Analyzer.scala:138)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
    at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$4.apply(TreeNode.scala:162)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:138)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:137)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:61)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1$anonfun$apply$2.apply(RuleExecutor.scala:59)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:59)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$apply$1.apply(RuleExecutor.scala:51)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
    at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
    at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
    at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
    at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
    at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
    at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
    at org.apache.spark.sql.api.java.JavaSchemaRDD.collect(JavaSchemaRDD.scala:114)
    at com.blackhorse.SparkTest.main(SparkTest.java:27)

[delete Spark temp dirs] DEBUG org.apache.spark.util.Utils - Shutdown hook called
[delete Spark local dirs] DEBUG org.apache.spark.storage.DiskBlockManager - Shutdown hook calle



But if I change the query to "show tables", the program runs and returns 0 
rows, even though I have many tables inside Hive. So I suspect that my program 
or the Spark instance did not connect to my Hive instance; maybe it started a 
local Hive. I have put the hive-site.xml file from the Hive installation into 
Spark's conf directory. Can you help figure out what's wrong here? Thanks.
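
One thing that may be worth checking here: the program sets the master in code
and is launched from a remote machine, so the driver is that local JVM, and
HiveContext looks for hive-site.xml on the driver's classpath rather than in
the conf directory of the Spark installation on the cluster. If the file is not
found, Hive falls back to its defaults, which typically means a fresh local
embedded metastore with no tables. That would be consistent with "show tables"
returning 0 rows. The sketch below checks whether a given hive-site.xml names a
remote metastore. It assumes the metastore runs as a separate Thrift service
configured through hive.metastore.uris; the function name is made up for
illustration, and the default path is only a guess based on the /opt/spark path
in the code above.

```shell
#!/bin/sh
# Sketch: report whether a hive-site.xml names a remote metastore.
# Assumption: the metastore is a Thrift service set via hive.metastore.uris.
check_metastore_uris() {
    conf_file="$1"
    if [ ! -f "$conf_file" ]; then
        echo "missing: $conf_file"
        return 1
    fi
    if grep -q '<name>hive.metastore.uris</name>' "$conf_file"; then
        # Print the property name and the line after it (normally the <value>).
        grep -A 1 '<name>hive.metastore.uris</name>' "$conf_file"
    else
        echo "hive.metastore.uris not set in $conf_file"
        return 1
    fi
}

# Hypothetical default path; pass the real location as the first argument.
check_metastore_uris "${1:-/opt/spark/conf/hive-site.xml}" || true
```

The same check can be pointed at whichever copy of hive-site.xml ends up on
the classpath of the machine that runs SparkTest.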
