I have Kerberos enabled in my cluster.

When I create an external table using beeline, I can see from the HDFS namenode log that it seems to do Kerberos auth for every single file.

That may be the reason why creating an external Hive table fails when there are loads of directories and files under the location.
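To check whether it really is one namenode round-trip per file, the namenode's audit log can be grepped for the table location. A rough sketch, assuming an HDP-style layout where the audit trail lands in /var/log/hadoop/hdfs/hdfs-audit.log (path and format vary by distribution):

# Count the namenode operations the CREATE TABLE issued against the location.
# Each audit line carries a cmd= field (getfileinfo, listStatus, open, ...).
grep 'src=/tmp/files_10k' /var/log/hadoop/hdfs/hdfs-audit.log \
  | grep -o 'cmd=[a-zA-Z]*' | sort | uniq -c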

Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780

On 12/05/16 10:41, Margus Roo wrote:

Now I got closer and discovered that my problem is related to permissions.

For example:

drwxr-xr-x   - margusja  hdfs          0 2016-05-12 03:33 /tmp/files_10k

...

-rw-r--r--   3 margusja  hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1959.txt
-rw-r--r--   3 margusja  hdfs          4 2016-05-12 02:01 /tmp/files_10k/f196.txt
-rw-r--r--   3 margusja  hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1960.txt
-rw-r--r--   3 margusja  hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1961.txt
-rw-r--r--   3 margusja  hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1962.txt
-rw-r--r--   3 margusja  hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1963.txt
-rw-r--r--   3 margusja  hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1964.txt
-rw-r--r--   3 margusja  hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1965.txt
-rw-r--r--   3 margusja  hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1966.txt

...

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://bigdata29.webmedia.int:10000/> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/tmp/files_10k';
No rows affected (3.184 seconds)
0: jdbc:hive2://bigdata29.webmedia.int:10000/>


Now I change the owner to flume, for example:

drwxr-xr-x   - flume     hdfs          0 2016-05-12 03:33 /tmp/files_10k

...

-rw-r--r--   3 flume     hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1968.txt
-rw-r--r--   3 flume     hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1969.txt
-rw-r--r--   3 flume     hdfs          4 2016-05-12 02:01 /tmp/files_10k/f197.txt
-rw-r--r--   3 flume     hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1970.txt
-rw-r--r--   3 flume     hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1971.txt
-rw-r--r--   3 flume     hdfs          5 2016-05-12 02:01 /tmp/files_10k/f1972.txt

...

Others can read; for example, user margusja can read:

[margusja@bigdata29 ~]$ hdfs dfs -ls /tmp/files_10k
Found 1112 items
-rw-r--r--   3 flume     hdfs          2 2016-05-12 01:59 /tmp/files_10k/f1.txt
-rw-r--r--   3 flume     hdfs          3 2016-05-12 01:59 /tmp/files_10k/f10.txt
-rw-r--r--   3 flume     hdfs          4 2016-05-12 01:59 /tmp/files_10k/f100.txt
-rw-r--r--   3 flume     hdfs          5 2016-05-12 01:59 /tmp/files_10k/f1000.txt
-rw-r--r--   3 flume     hdfs          6 2016-05-12 01:59 /tmp/files_10k/f10000.txt

Now I try to create the table:

0: jdbc:hive2://bigdata29.webmedia.int:10000/> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/tmp/files_10k';
Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [margusja] does not have [READ] privilege on [hdfs://mycluster/tmp/files_10k] (state=42000,code=40000)
0: jdbc:hive2://bigdata29.webmedia.int:10000/>
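Judging by the stack trace in my earlier mail quoted below, the authorizer seems to fall back to checking ownership of every file under the location when the directory itself is not owned by the session user. A quick sanity check along those lines, just a sketch:

# List everything under the location that is NOT owned by the connecting
# user; field 3 of hdfs dfs -ls output is the owner.
hdfs dfs -ls -R /tmp/files_10k | awk '$3 != "margusja"'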

In Hiveserver2.log:

2016-05-12 03:38:58,111 INFO [HiveServer2-Handler-Pool: Thread-69]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: create external table files_10k (i int) row format delimited fields terminated by '\t' location '/tmp/files_10k'
2016-05-12 03:38:58,112 INFO [HiveServer2-Handler-Pool: Thread-69]: parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2016-05-12 03:38:58,112 INFO [HiveServer2-Handler-Pool: Thread-69]: log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=parse start=1463038738111 end=1463038738112 duration=1 from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 03:38:58,112 INFO [HiveServer2-Handler-Pool: Thread-69]: log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 03:38:58,112 INFO [HiveServer2-Handler-Pool: Thread-69]: parse.CalcitePlanner (SemanticAnalyzer.java:analyzeInternal(10114)) - Starting Semantic Analysis
2016-05-12 03:38:58,113 INFO [HiveServer2-Handler-Pool: Thread-69]: parse.CalcitePlanner (SemanticAnalyzer.java:analyzeCreateTable(10776)) - Creating table default.files_10k position=22
2016-05-12 03:38:58,113 INFO [HiveServer2-Handler-Pool: Thread-69]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 1: get_database: default
2016-05-12 03:38:58,113 INFO [HiveServer2-Handler-Pool: Thread-69]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=hive/bigdata29.webmedia....@testhadoop.com ip=unknown-ip-addr cmd=get_database: default
2016-05-12 03:38:58,118 INFO [HiveServer2-Handler-Pool: Thread-69]: ql.Driver (Driver.java:compile(466)) - Semantic Analysis Completed
2016-05-12 03:38:58,118 INFO [HiveServer2-Handler-Pool: Thread-69]: log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=semanticAnalyze start=1463038738112 end=1463038738118 duration=6 from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 03:38:58,118 INFO [HiveServer2-Handler-Pool: Thread-69]: ql.Driver (Driver.java:getSchema(246)) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2016-05-12 03:38:58,118 INFO [HiveServer2-Handler-Pool: Thread-69]: log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=doAuthorization from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 03:39:00,148 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@53bb71e5]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1916ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=2002ms
2016-05-12 03:39:01,733 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@53bb71e5]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1081ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=1455ms
2016-05-12 03:39:20,984 ERROR [HiveServer2-Handler-Pool: Thread-69]: authorizer.RangerHiveAuthorizer (RangerHiveAuthorizer.java:isURIAccessAllowed(755)) - Error getting permissions for hdfs://mycluster/tmp/files_10k
java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider


I am confused. What extra rights does Hive expect?


Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
On 11/05/16 14:17, Margus Roo wrote:

One more example:

[hdfs@hadoopnn1 ~]$ hdfs dfs -count -h /user/margusja/files_10k/
           1        9.8 K             47.7 K /user/margusja/files_10k
[hdfs@hadoopnn1 ~]$ hdfs dfs -count -h /datasource/dealgate/
          53        7.9 K              8.5 G /datasource/dealgate

2: jdbc:hive2://hadoopnn1.estpak.ee:10000/def> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/user/margusja/files_10k';
No rows affected (0.197 seconds)
2: jdbc:hive2://hadoopnn1.estpak.ee:10000/def> drop table files_10k;
No rows affected (0.078 seconds)
2: jdbc:hive2://hadoopnn1.estpak.ee:10000/def> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/datasource/dealgate';
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
2: jdbc:hive2://hadoopnn1.estpak.ee:10000/def>


So from my point of view, beeline for some reason inspects the data under the location, while the old hive client does not.
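If it helps anyone reproduce this, a hedged way to see where the two clients diverge is to print their effective authorization settings; these are standard Hive properties, and <jdbc-url> is a placeholder for the kerberized connection string used above:

# 'set <property>;' echoes the current value in both clients.
beeline -u '<jdbc-url>' -e 'set hive.security.authorization.manager;'
hive -e 'set hive.security.authorization.manager;'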

Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
On 11/05/16 13:35, Margus Roo wrote:

More information:

2016-05-11 13:31:17,086 INFO [HiveServer2-Handler-Pool: Thread-5867]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: create external table files_10k (i int) row format delimited fields terminated by '\t' location '/user/margusja/files_10k'
2016-05-11 13:31:17,089 INFO [HiveServer2-Handler-Pool: Thread-5867]: parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2016-05-11 13:31:17,089 INFO [HiveServer2-Handler-Pool: Thread-5867]: log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=parse start=1462962677086 end=1462962677089 duration=3 from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,089 INFO [HiveServer2-Handler-Pool: Thread-5867]: log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,090 INFO [HiveServer2-Handler-Pool: Thread-5867]: parse.CalcitePlanner (SemanticAnalyzer.java:analyzeInternal(10114)) - Starting Semantic Analysis
2016-05-11 13:31:17,093 INFO [HiveServer2-Handler-Pool: Thread-5867]: parse.CalcitePlanner (SemanticAnalyzer.java:analyzeCreateTable(10776)) - Creating table default.files_10k position=22
2016-05-11 13:31:17,094 INFO [HiveServer2-Handler-Pool: Thread-5867]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 2: get_database: default
2016-05-11 13:31:17,094 INFO [HiveServer2-Handler-Pool: Thread-5867]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=hive/hadoopnn1.estpak...@testhadoop.com ip=unknown-ip-addr cmd=get_database: default
2016-05-11 13:31:17,098 WARN [HiveServer2-Handler-Pool: Thread-5867]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user hive
2016-05-11 13:31:17,098 WARN [HiveServer2-Handler-Pool: Thread-5867]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user hive
2016-05-11 13:31:17,099 WARN [HiveServer2-Handler-Pool: Thread-5867]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user hive
2016-05-11 13:31:17,099 WARN [HiveServer2-Handler-Pool: Thread-5867]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user hive
2016-05-11 13:31:17,099 INFO [HiveServer2-Handler-Pool: Thread-5867]: metadata.HiveUtils (HiveUtils.java:getMetaStoreAuthorizeProviderManagers(353)) - Adding metastore authorization provider: org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
2016-05-11 13:31:17,102 WARN [HiveServer2-Handler-Pool: Thread-5867]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user hive
2016-05-11 13:31:17,102 WARN [HiveServer2-Handler-Pool: Thread-5867]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user hive
2016-05-11 13:31:17,106 INFO [HiveServer2-Handler-Pool: Thread-5867]: ql.Driver (Driver.java:compile(466)) - Semantic Analysis Completed
2016-05-11 13:31:17,106 INFO [HiveServer2-Handler-Pool: Thread-5867]: log.PerfLogger (PerfLogger.java:PerfLogEnd(162)) - </PERFLOG method=semanticAnalyze start=1462962677089 end=1462962677106 duration=17 from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,106 INFO [HiveServer2-Handler-Pool: Thread-5867]: ql.Driver (Driver.java:getSchema(246)) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2016-05-11 13:31:17,106 INFO [HiveServer2-Handler-Pool: Thread-5867]: log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=doAuthorization from=org.apache.hadoop.hive.ql.Driver>
2016-05-11 13:31:17,107 WARN [HiveServer2-Handler-Pool: Thread-5867]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user margusja
2016-05-11 13:31:18,289 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@59f45950]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1092ms
2016-05-11 13:31:29,547 INFO [HiveServer2-Handler-Pool: Thread-5867]: retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(144)) - Exception while invoking getListing of class ClientNamenodeProtocolTranslatorPB over hadoopnn1.estpak.ee/88.196.164.42:8020. Trying to fail over immediately.
java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:580)
        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy16.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2094)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2077)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:832)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:863)
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:859)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:859)
        at org.apache.hadoop.hive.common.FileUtils.isOwnerOfFileHierarchy(FileUtils.java:481)
        at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.isURIAccessAllowed(RangerHiveAuthorizer.java:749)
        at org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.checkPrivileges(RangerHiveAuthorizer.java:252)
        at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:817)
        at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:608)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:499)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:314)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1164)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1158)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:110)
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:181)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:410)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:397)
        at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:274)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:271)
        at com.sun.proxy.$Proxy15.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:573)
        ... 39 more
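Reading the trace bottom-up: checkPrivileges -> isURIAccessAllowed -> FileUtils.isOwnerOfFileHierarchy -> DistributedFileSystem.listStatus, i.e. the authorizer recursively lists the whole hierarchy under the table location inside HiveServer2 before the table is even created. A rough way to gauge how much metadata that walk pulls into the HiveServer2 heap (just an approximation of the same recursive listing):

# One line of output per FileStatus the ownership walk would have to fetch.
hdfs dfs -ls -R /user/margusja/files_10k | wc -l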

I have HDFS namenode high availability configured with automatic failover enabled. I can see that the active namenode does not change while the table is being created, which fits the trace above: the OutOfMemoryError is thrown inside HiveServer2 itself, so the "Trying to fail over" message looks like the client retry logic reacting to it, not a real namenode failover.
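Since the OutOfMemoryError happens inside HiveServer2, one possible workaround (not a fix for the recursive walk itself) would be to give HiveServer2 more heap. A sketch, assuming a stock hive-env.sh; on an Ambari-managed cluster the equivalent knob lives in the hive-env configuration instead:

# hive-env.sh -- heap size (in MB) for Hive services, including HiveServer2.
export HADOOP_HEAPSIZE=4096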

I also have Hive high availability configured.

Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
On 11/05/16 12:26, Margus Roo wrote:

Sadly, in our environment:


I generated files like you did.
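For context, a sketch of one way such a file set can be generated (my guess; the actual generation script from the earlier mail isn't quoted here):

# Create ~10k tiny text files under the test location in HDFS.
hdfs dfs -mkdir -p /user/margusja/files_10k
for i in $(seq 1 10000); do
  echo "$i" | hdfs dfs -put - "/user/margusja/files_10k/f$i.txt"
done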

Connected to: Apache Hive (version 1.2.1.2.3.4.0-3485)
Driver: Hive JDBC (version 1.2.1.2.3.4.0-3485)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoopnn1.estpak.ee:2181,hado> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/user/margusja/files_10k';
Error: Shutdown in progress, cannot remove a shutdownHook (state=,code=0)
0: jdbc:hive2://hadoopnn1.estpak.ee:2181,hado>

Using just the hive CLI:

[margusja@hadoopnn1 ~]$ hive
WARNING: Use "yarn jar" to launch YARN applications.
log4j:WARN No such property [maxBackupIndex] in org.apache.log4j.DailyRollingFileAppender.

Logging initialized using configuration in file:/etc/hive/2.3.4.0-3485/0/hive-log4j.properties
hive> create external table files_10k (i int) row format delimited fields terminated by '\t' location '/user/margusja/files_10k';
OK
Time taken: 1.255 seconds
hive>


Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780
On 11/05/16 10:16, Markovitz, Dudu wrote:
create external table files_10k (i int) row format delimited fields terminated by '\t' location '/tmp/files_10k';




