For #1, the solution is to add the proxyuser groups and hosts as you have done in #2. I am not aware of any other way to avoid Hadoop's proxy-user configuration.
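For reference, here is a minimal sketch of that impersonation config (it assumes, as your logs suggest, that the process submitting the MapReduce job runs as test_user, and it uses wildcards you would normally narrow down). One caveat: Hadoop evaluates hadoop.proxyuser.* on the services that perform the authorization check (the NameNode, ResourceManager, and services like HttpFS), so the entries need to be in the configuration those daemons read - typically the cluster-side core-site.xml - and not only in the client/Lens copy of the configuration.

  <!-- core-site.xml read by the cluster daemons (NameNode / ResourceManager) -->
  <property>
    <!-- hosts from which test_user may impersonate; ideally the lensserver/HiveServer2 host(s) rather than * -->
    <name>hadoop.proxyuser.test_user.hosts</name>
    <value>*</value>
  </property>
  <property>
    <!-- groups whose members test_user may impersonate; ideally a real group rather than * -->
    <name>hadoop.proxyuser.test_user.groups</name>
    <value>*</value>
  </property>

After changing them, the running daemons usually need a refresh (or restart), e.g.:

  hdfs dfsadmin -refreshSuperUserGroupsConfiguration
  yarn rmadmin -refreshSuperUserGroupsConfiguration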
For #4 - Lens submits to HiveServer2 over thrift, not through jdbc. (A rough sketch of what a direct Thrift call to HiveServer2 looks like is at the bottom of this mail.)

On Tue, Aug 2, 2016 at 12:28 AM, Tao Yan <t...@linkedin.com> wrote:

> Hi Amareshwari,
>
> 1. I tried restarting lensserver with --classpath /etc/hadoop/conf and removed the problematic property in mapred-site.xml; then it throws a different error:
>
>   INFO org.apache.hadoop.hive.ql.exec.Utilities - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=248
>   INFO org.apache.hadoop.hive.ql.exec.Task - Number of reduce tasks not specified. Estimated from input data size: 1
>   INFO org.apache.hadoop.hive.ql.exec.Task - In order to change the average load for a reducer (in bytes):
>   INFO org.apache.hadoop.hive.ql.exec.Task -   set hive.exec.reducers.bytes.per.reducer=<number>
>   INFO org.apache.hadoop.hive.ql.exec.Task - In order to limit the maximum number of reducers:
>   INFO org.apache.hadoop.hive.ql.exec.Task -   set hive.exec.reducers.max=<number>
>   INFO org.apache.hadoop.hive.ql.exec.Task - In order to set a constant number of reducers:
>   INFO org.apache.hadoop.hive.ql.exec.Task -   set mapreduce.job.reduces=<number>
>   ERROR org.apache.hadoop.hive.ql.exec.mr.ExecDriver - Exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: test_user@GRID.****.COM is not allowed to impersonate test_user
>   ERROR org.apache.hive.service.cli.operation.Operation - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>
> test_user is the user I used to do all the tests.
>
> 2. The error is caused by an authorization failure of the Hadoop Httpfs service, so I added the following configuration to the core-site.xml in the Hadoop configuration directory, which didn't work:
>
>   <configuration>
>     <property>
>       <name>hadoop.proxyuser.test_user.groups</name>
>       <value>*</value>
>     </property>
>     <property>
>       <name>hadoop.proxyuser.test_user.hosts</name>
>       <value>*</value>
>     </property>
>   </configuration>
>
> I talked with the Hadoop team and they told me that test_user is not a user group recognized by the Httpfs service; in order to run my test, they would have to create a new user group and deploy it in the Hadoop cluster. So they suggested a work-around, which is to avoid using a proxy user when submitting the MapReduce job.
>
> 3. I tried to comment out the proxy user configuration and change HADOOP_HOME and HADOOP_CONF_DIR to point to my own Hadoop copy, and it didn't work.
>
> Is there a way to avoid using the proxy user and submit a mapreduce job using my own user name?
>
> 4. When submitting a Lens query, I know the first step is to query the Hive metastore, the second step is to convert the lens query to a Hive query, and the third step is to hand over the converted hive query to the driver hive/hive1 - what are the following steps? Does lens use a Hive JDBC driver to connect to HiveServer2, and then submit the query to HiveServer2? I didn't find any related log in the Hive log files.
>
> Thanks,
>
> On Thu, Jul 28, 2016 at 8:37 PM, amareshwarisr . <amareshw...@gmail.com> wrote:
>
>> Tao,
>>
>> Can you try restarting lensserver with --classpath /etc/hadoop/conf (whatever is your hadoop conf directory)?
>>
>> Thanks
>>
>> On Fri, Jul 29, 2016 at 3:04 AM, Tao Yan <t...@linkedin.com> wrote:
>>
>>> Hi Rajat & Amareshwari,
>>>
>>> I followed all the steps and it still gives me the same error, and HADOOP_HOME also points to the local file system, not HDFS.
>>> However, I found our Hadoop Cluster has the following configuration in mapred-site.xml:
>>>
>>>   <property>
>>>     <name>mapreduce.application.framework.path</name>
>>>     <value>hdfs:/*****/hadoop-2.6.1.*****.tar.gz</value>
>>>   </property>
>>>
>>> The default value is empty, in which case clients will try to find the mapreduce archive on the local filesystem and then submit the mapreduce job. I think this is the root cause of the exception, since it is the exact same file name as in the exception message. Is there any approach to avoid this error on the lens or hive side?
>>>
>>> BTW, when a user submits a lens query, will it invoke the hive/hive1 driver, which in turn calls hiveserver2? I saw it is calling 'org.apache.hadoop.hive.ql.Driver' in the server log; is it part of the hive JDBC driver or the Thrift driver?
>>>
>>> Thanks,
>>>
>>> On Wed, Jul 27, 2016 at 11:12 AM, Tao Yan <t...@linkedin.com> wrote:
>>>
>>>> Hi Rajat & Amareshwari,
>>>>
>>>> Thanks very much for your suggestions. I will follow the steps.
>>>>
>>>> BTW, will you please send me the lens-site.xml and hivedriver-site.xml files?
>>>>
>>>> It seems that when I added the MySQL database configuration in lens-site.xml, Lens can directly talk with MySQL, is that right?
>>>>
>>>> Thanks.
>>>>
>>>> On Wed, Jul 27, 2016 at 5:13 AM, amareshwarisr . <amareshw...@gmail.com> wrote:
>>>>
>>>>> Tao,
>>>>>
>>>>> here are some points which can help you.
>>>>>
>>>>> - The default examples setup works only with a single node setup. Not sure if you are running in a multinode cluster.
>>>>>
>>>>> - For the missing htrace class, can you try adding the htrace jar from the hadoop libraries to LENSCPPATH?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Jul 27, 2016 at 9:49 AM, Tao Yan <t...@linkedin.com> wrote:
>>>>>
>>>>>> Hi Lens Developers,
>>>>>>
>>>>>> I have been doing a Lens POC for my team for a while, and encountered several issues which prevent me from querying the example data. A brief summary of the issues:
>>>>>>
>>>>>> 1. Lens query failed to execute.
>>>>>> 2. Partition data is not correctly loaded in hive.
>>>>>>
>>>>>> I have read through all the documents on the Lens website, and followed the 20 minutes demo step by step. Following are the details for the first issue:
>>>>>>
>>>>>> a. It failed to execute the Lens query, and returned the following error:
>>>>>>
>>>>>>   lens-shell>query execute cube select product_id, store_sales from sales where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00')
>>>>>>   26 Jul 2016 23:08:15 [Spring Shell] INFO cliLogger - Query handle: 69f44b77-97e9-4500-b385-a1881e5c365e
>>>>>>   26 Jul 2016 23:08:22 [Spring Shell] INFO cliLogger - User query: 'cube select product_id, store_sales from sales where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00')' was submitted to hive/hive1
>>>>>>   26 Jul 2016 23:08:22 [Spring Shell] INFO cliLogger - Driver query: 'INSERT OVERWRITE DIRECTORY "file:/tmp/lensreports/hdfsout/69f44b77-97e9-4500-b385-a1881e5c365e" SELECT ( sales . product_id ), sum(( sales . store_sales )) FROM newdb.local_sales_aggr_fact1 sales WHERE ((((( sales . ot ) = '2015-04-11' ) or (( sales . ot ) = '2015-04-12' )))) GROUP BY ( sales . product_id ) ' and Driver handle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=8d1f3b8a-a660-472f-bfb7-32d6b73d0532]
>>>>>>   Command failed java.lang.NullPointerException
>>>>>>
>>>>>> b. The server log shows the following error:
>>>>>>
>>>>>>   26 Jul 2016 23:34:44 [f3f5c221-e099-4f8e-9c32-42733365d3b6] [pool-19-thread-7] INFO org.apache.hadoop.hive.ql.exec.mr.ExecDriver - Executing: /export/apps/hadoop/latest/bin/hadoop jar /export/home/dev_svc/lens/apache-hive-0.13.1-inm-bin/lib/hive-common-0.13.1-inm.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -plan file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10003/plan.xml -jobconffile file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10002/jobconf.xml
>>>>>>   26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6] [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Execution failed with exit status: 1
>>>>>>   26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6] [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Obtaining error information
>>>>>>   26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6] [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Task failed!
>>>>>>   Task ID:
>>>>>>     Stage-1
>>>>>>
>>>>>>   Logs:
>>>>>>
>>>>>> c. The session log shows the following error (*** is used to replace the internal cluster's hdfs path):
>>>>>>
>>>>>>   2016-07-20 17:45:04,103 ERROR [main]: mr.ExecDriver (SessionState.java:printError(572)) - Job Submission failed with exception 'java.lang.IllegalArgumentException(Wrong FS: hdfs:/*****/hadoop-2.6.1.*****.tar.gz, expected: file:///)'
>>>>>>   java.lang.IllegalArgumentException: Wrong FS: hdfs:/*****/hadoop-2.6.1.****.tar.gz, expected: file:///
>>>>>>       at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>>>>>>       at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
>>>>>>       at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
>>>>>>       at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:455)
>>>>>>       at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
>>>>>>       at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>>>>>>       at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>>>>>>       at java.security.AccessController.doPrivileged(Native Method)
>>>>>>       at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>>>>       at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>>>>>>       at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>>>>>>       at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>>>>>>       at java.security.AccessController.doPrivileged(Native Method)
>>>>>>       at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>>>>       at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>>>>>>       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>>>>>>       at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
>>>>>>       at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:741)
>>>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>       at java.lang.reflect.Method.invoke(Method.java:483)
>>>>>>       at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>>>>>       at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>>>>>
>>>>>> d. Hive CLI shows the following result:
>>>>>>
>>>>>>   hive (newdb)> describe formatted local_sales_aggr_fact1;
>>>>>>   OK
>>>>>>   # col_name              data_type    comment
>>>>>>
>>>>>>   order_time              timestamp
>>>>>>   delivery_time           timestamp
>>>>>>   customer_id             int
>>>>>>   product_id              int
>>>>>>   promotion_id            int
>>>>>>   customer_city_id        int
>>>>>>   production_city_id      int
>>>>>>   delivery_city_id        int
>>>>>>   unit_sales              bigint
>>>>>>   store_sales             double
>>>>>>   store_cost              double
>>>>>>   max_line_item_price     float
>>>>>>   max_line_item_discount  float
>>>>>>
>>>>>>   # Partition Information
>>>>>>   # col_name              data_type    comment
>>>>>>
>>>>>>   pt                      string       Process time partition
>>>>>>   ot                      string       Order time partition
>>>>>>   dt                      string       Delivery time partition
>>>>>>
>>>>>>   # Detailed Table Information
>>>>>>   Database:               newdb
>>>>>>   Owner:                  null
>>>>>>   CreateTime:             Mon Jul 25 16:21:03 PDT 2016
>>>>>>   LastAccessTime:         UNKNOWN
>>>>>>   Protect Mode:           None
>>>>>>   Retention:              0
>>>>>>   Location:               file:/tmp/examples/aggrfact1
>>>>>>   Table Type:             EXTERNAL_TABLE
>>>>>>   Table Parameters:
>>>>>>     EXTERNAL                TRUE
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.dt.first
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.dt.holes.size    0
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.dt.latest
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.dt.storage.class    org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.ot.first
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.ot.holes.size    0
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.ot.latest
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.ot.storage.class    org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.pt.first
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.pt.holes.size    0
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.pt.latest
>>>>>>     cube.storagetable.partition.timeline.cache.DAILY.pt.storage.class    org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.dt.first
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.dt.holes.size    0
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.dt.latest
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.dt.storage.class    org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.ot.first
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.ot.holes.size    0
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.ot.latest
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.ot.storage.class    org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.pt.first
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.pt.holes.size    0
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.pt.latest
>>>>>>     cube.storagetable.partition.timeline.cache.HOURLY.pt.storage.class    org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>>     cube.storagetable.partition.timeline.cache.present    true
>>>>>>     cube.storagetable.time.partcols    pt,ot,dt
>>>>>>     transient_lastDdlTime    1469574936
>>>>>>
>>>>>>   # Storage Information
>>>>>>   SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>>>>>>   InputFormat:            org.apache.hadoop.mapred.TextInputFormat
>>>>>>   OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>>>>>>   Compressed:             No
>>>>>>   Num Buckets:            0
>>>>>>   Bucket Columns:         []
>>>>>>   Sort Columns:           []
>>>>>>   Storage Desc Params:
>>>>>>     field.delim             ,
>>>>>>     serialization.format    ,
>>>>>>   Time taken: 6.435 seconds, Fetched: 73 row(s)
>>>>>>
>>>>>> Because the query finally runs against the hive table local_sales_aggr_fact1, which uses the storage "local":
>>>>>>
>>>>>>   <x_storage classname="org.apache.lens.cube.metadata.HDFSStorage" name="local" xmlns="uri:lens:cube:0.1"
>>>>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>>>>>     <properties>
>>>>>>       <property name="storage.url" value="file:///"/>
>>>>>>     </properties>
>>>>>>   </x_storage>
>>>>>>
>>>>>> e. So, I tried to change the value from "file:///" to "hdfs:/hadoop-host:port/" (similar to examples/resources/local-cluster-storage.xml), which didn't work.
>>>>>>
>>>>>> f. I also tried to change the fact table "examples/resources/sales-aggr-fact1.xml"'s location from "/tmp/examples/aggrfact1" to an HDFS path, and then add partitions; it failed with the error:
>>>>>>
>>>>>>   lens-shell>fact add partitions --fact_name sales_aggr_fact1 --storage_name local --path examples/resources/sales-aggr-fact1-local-parts.xml
>>>>>>   Command failed javax.ws.rs.InternalServerErrorException: HTTP 500 Request failed.
>>>>>>
>>>>>> Corresponding server logs:
>>>>>>
>>>>>>   Caused by: java.lang.NoClassDefFoundError: org/htrace/Trace
>>>>>>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:214) ~[hadoop-common-2.6.1.52.jar:na]
>>>>>>       at com.sun.proxy.$Proxy89.getFileInfo(Unknown Source) ~[na:na]
>>>>>>       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752) ~[hadoop-hdfs-2.6.1.52.jar:na]
>>>>>>
>>>>>> g. I also tried to change the hive table's location from file:/tmp/examples/aggrfact1 to an HDFS location, and the Lens query returned the same error.
>>>>>>
>>>>>> For the second issue: I tried to query the hive table directly and found the data is not loaded, because in the examples the partitions' location is example/resource/..., which is not an absolute path from root '/'. So I changed it to an absolute path, and the hive query 'Select * from newdb.local_sales_aggr_fact1' is able to return the data, but the converted lens query failed in the mapreduce stage because it cannot find the local filesystem path I assigned.
>>>>>>
>>>>>> BTW, I use the company's hadoop cluster, with hadoop version 2.6.1; I set up Hive on top of that using the version hive-0.13.1-inm downloaded from the Lens website, and set up a MySQL database as the metastore.
>>>>>>
>>>>>> Could you please help me resolve the above two issues? I really appreciate that.
>>>>>>
>>>>>> Tao Yan
>>>>>> Software Engineer
>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>
>>>>>> 206.250.5345
>>>>>> t...@linkedin.com
>>>>>> https://www.linkedin.com/in/taousc