For #1, the solution is to add the proxyuser groups and hosts as you have
done in #2. I don't know of any other way to avoid the proxy user
configuration required by Hadoop.
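
For reference, a minimal sketch of what that looks like in core-site.xml. SERVICE_USER
here is a placeholder for whichever account actually does the impersonation (the
principal named in the "is not allowed to impersonate" message), and these entries are
enforced by the cluster-side daemons (NameNode/ResourceManager/HttpFS), so they need to
be deployed there rather than only on the client:

<property>
  <!-- groups the impersonating account is allowed to act on behalf of -->
  <name>hadoop.proxyuser.SERVICE_USER.groups</name>
  <value>*</value>
</property>
<property>
  <!-- hosts from which the impersonating account may submit requests -->
  <name>hadoop.proxyuser.SERVICE_USER.hosts</name>
  <value>*</value>
</property>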

For #4 - Lens submits queries to HiveServer2 over Thrift, not through JDBC.
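
As an illustrative sketch only (I am quoting these names from memory, so treat the
exact property and class names as assumptions to verify against hivedriver-default.xml
for your Lens version), the Thrift connection is configured in hivedriver-site.xml
roughly like this:

<property>
  <!-- EmbeddedThriftConnection runs HiveServer2 in-process;
       RemoteThriftConnection talks Thrift to an external HiveServer2 -->
  <name>lens.driver.hive.connection.class</name>
  <value>org.apache.lens.driver.hive.RemoteThriftConnection</value>
</property>
<property>
  <!-- host of the remote HiveServer2; the port comes from hive.server2.thrift.port -->
  <name>hive.server2.thrift.bind.host</name>
  <value>hiveserver2-host.example.com</value>
</property>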



On Tue, Aug 2, 2016 at 12:28 AM, Tao Yan <t...@linkedin.com> wrote:

> Hi Amareshwari,
>
> 1. I tried restarting lensserver with --classpath /etc/hadoop/conf and
> removed the problematic property from mapred-site.xml; it then throws a
> different error:
>
>
>   INFO  org.apache.hadoop.hive.ql.exec.Utilities -
> BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=248
>   INFO  org.apache.hadoop.hive.ql.exec.Task - Number of reduce tasks not
> specified. Estimated from input data size: 1
>   INFO  org.apache.hadoop.hive.ql.exec.Task - In order to change the
> average load for a reducer (in bytes):
>   INFO  org.apache.hadoop.hive.ql.exec.Task -   set
> hive.exec.reducers.bytes.per.reducer=<number>
>   INFO  org.apache.hadoop.hive.ql.exec.Task - In order to limit the
> maximum number of reducers:
>   INFO  org.apache.hadoop.hive.ql.exec.Task -   set
> hive.exec.reducers.max=<number>
>   INFO  org.apache.hadoop.hive.ql.exec.Task - In order to set a constant
> number of reducers:
>   INFO  org.apache.hadoop.hive.ql.exec.Task -   set
> mapreduce.job.reduces=<number>
>   ERROR org.apache.hadoop.hive.ql.exec.mr.ExecDriver - Exception:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
> User: test_user@GRID.****.COM is not allowed to impersonate test_user
>   ERROR org.apache.hive.service.cli.operation.Operation - FAILED:
> Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>
> test_user is the user I used to do all the tests.
>
> 2. The error is caused by an authorization failure in the Hadoop HttpFS
> service, so I added the following configuration to core-site.xml in the
> Hadoop configuration directory, which didn't work:
>
> <configuration>
>   <property>
>     <name>hadoop.proxyuser.test_user.groups</name>
>     <value>*</value>
>   </property>
>   <property>
>     <name>hadoop.proxyuser.test_user.hosts</name>
>     <value>*</value>
>   </property>
> </configuration>
>
> I talked with the Hadoop team, and they told me that test_user is not a
> user group recognized by the HttpFS service; in order to run my test, they
> would have to create a new user group and deploy it on the Hadoop cluster.
> So they suggested a workaround, which is to avoid using a proxy user when
> submitting the MapReduce job.
>
> 3. I tried commenting out the proxy user configuration and changing
> HADOOP_HOME and HADOOP_CONF_DIR to point to my own Hadoop copy, and it
> didn't work.
>
> Is there a way to avoid using the proxy user and submit a MapReduce job
> under my own user name?
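>
> One option I have seen mentioned, sketched here only as an assumption to be
> verified for our environment, is to disable impersonation in hive-site.xml so
> that the job is submitted as the user running the HiveServer2/Lens process
> rather than as a proxied session user:
>
> <property>
>   <!-- When false, queries run as the HiveServer2 process user, so no
>        hadoop.proxyuser.* entries are needed on the cluster -->
>   <name>hive.server2.enable.doAs</name>
>   <value>false</value>
> </property>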
>
> 4. When submitting a Lens query, I know the first step is to query the Hive
> metastore, the second is to convert the Lens query into a Hive query, and
> the third is to hand the converted Hive query over to the hive/hive1 driver.
> What are the following steps? Does Lens use a Hive JDBC driver to connect to
> HiveServer2 and then submit the query to HiveServer2? I didn't find any
> related log in the Hive log files.
>
> Thanks,
>
> On Thu, Jul 28, 2016 at 8:37 PM, amareshwarisr . <amareshw...@gmail.com>
> wrote:
>
>> Tao,
>>
>> Can you try restarting lensserver with --classpath /etc/hadoop/conf
>> (whatever your hadoop conf directory is)?
>>
>> Thanks
>>
>> On Fri, Jul 29, 2016 at 3:04 AM, Tao Yan <t...@linkedin.com> wrote:
>>
>>> Hi Rajat & Amareshwari,
>>>
>>> I followed all the steps and it still gives me the same error, and
>>> HADOOP_HOME also points to the local file system, not HDFS. However, I found
>>> that our Hadoop cluster has the following configuration in mapred-site.xml:
>>>  <property>
>>>     <name>mapreduce.application.framework.path</name>
>>>     <value>hdfs:/*****/hadoop-2.6.1.*****.tar.gz</value>
>>>   </property>
>>>
>>> The default value of this property is empty; with the cluster's value set, the
>>> client tries to resolve the MapReduce archive against the local filesystem and
>>> then submit the MapReduce job. I think this is the root cause of the exception,
>>> since the exception message contains exactly the same file name. Is there any
>>> approach to avoid this error on the Lens or Hive side?
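>>>
>>> One idea, assuming the cluster has not marked the property as final, would be
>>> to override it with an empty value in the mapred-site.xml under the
>>> HADOOP_CONF_DIR that Lens/Hive use, so the local client does not try to
>>> qualify the hdfs:/ tarball path against its file:/// default filesystem:
>>>
>>> <property>
>>>   <!-- Empty value: the job client skips adding the MR framework tarball
>>>        to the distributed cache -->
>>>   <name>mapreduce.application.framework.path</name>
>>>   <value></value>
>>> </property>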
>>>
>>> BTW, when a user submits a Lens query, does it invoke the hive/hive1
>>> driver, which in turn calls HiveServer2? I saw it calling
>>> 'org.apache.hadoop.hive.ql.Driver' in the server log; is that part of the
>>> Hive JDBC driver or the Thrift driver?
>>>
>>> Thanks,
>>>
>>> On Wed, Jul 27, 2016 at 11:12 AM, Tao Yan <t...@linkedin.com> wrote:
>>>
>>>> Hi Rajat & Amareshwari,
>>>>
>>>> Thanks very much for your suggestions. I will follow the steps.
>>>>
>>>> BTW, will you please send me the lens-site.xml and hivedriver-site.xml
>>>> files?
>>>>
>>>> It seems that when I added the MySQL database configuration to
>>>> lens-site.xml, Lens could talk to MySQL directly. Is that right?
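>>>>
>>>> For reference, the settings I mean are the standard Hive metastore JDBC
>>>> properties below; whether Lens picks them up from lens-site.xml or they need
>>>> to live in hive-site.xml on the Lens server classpath is exactly what I am
>>>> unsure about (the host and database names here are just placeholders):
>>>>
>>>> <property>
>>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>>   <value>jdbc:mysql://mysql-host.example.com:3306/hive_metastore</value>
>>>> </property>
>>>> <property>
>>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>>   <value>com.mysql.jdbc.Driver</value>
>>>> </property>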
>>>>
>>>> Thanks.
>>>>
>>>> On Wed, Jul 27, 2016 at 5:13 AM, amareshwarisr . <amareshw...@gmail.com
>>>> > wrote:
>>>>
>>>>> Tao,
>>>>>
>>>>> Here are some points which may help you.
>>>>> - The default examples setup works only with a single-node setup. Not sure
>>>>> if you are running on a multinode cluster.
>>>>>
>>>>> - For the missing htrace class, can you try adding the htrace jar from the
>>>>> hadoop libraries to LENSCPPATH?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Wed, Jul 27, 2016 at 9:49 AM, Tao Yan <t...@linkedin.com> wrote:
>>>>>
>>>>>> Hi Lens Developers,
>>>>>>
>>>>>> I have been doing a Lens POC for my team for a while and have encountered
>>>>>> several issues which prevent me from querying the example data. A
>>>>>> brief summary of the issues:
>>>>>>
>>>>>> 1. The Lens query fails to execute.
>>>>>> 2. Partition data is not correctly loaded in Hive.
>>>>>>
>>>>>> I have read through all the documents on the Lens website and followed
>>>>>> the 20-minute demo step by step. Following are the details for the first
>>>>>> issue:
>>>>>>
>>>>>> a. The Lens query failed to execute and returned the following error:
>>>>>>
>>>>>> lens-shell>query execute cube select product_id, store_sales from sales
>>>>>> where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00')
>>>>>> 26 Jul 2016 23:08:15 [Spring Shell] INFO  cliLogger - Query handle:
>>>>>> 69f44b77-97e9-4500-b385-a1881e5c365e
>>>>>> 26 Jul 2016 23:08:22 [Spring Shell] INFO  cliLogger - User query: 'cube
>>>>>> select product_id, store_sales from sales where time_range_in(order_time,
>>>>>> '2015-04-11-00', '2015-04-13-00')' was submitted to hive/hive1
>>>>>> 26 Jul 2016 23:08:22 [Spring Shell] INFO  cliLogger - Driver query: 'INSERT
>>>>>> OVERWRITE DIRECTORY
>>>>>> "file:/tmp/lensreports/hdfsout/69f44b77-97e9-4500-b385-a1881e5c365e"
>>>>>> SELECT ( sales . product_id ), sum(( sales . store_sales )) FROM
>>>>>> newdb.local_sales_aggr_fact1 sales WHERE ((((( sales . ot ) = '2015-04-11' )
>>>>>> or (( sales . ot ) = '2015-04-12' )))) GROUP BY ( sales . product_id ) '
>>>>>> and Driver handle: OperationHandle [opType=EXECUTE_STATEMENT,
>>>>>> getHandleIdentifier()=8d1f3b8a-a660-472f-bfb7-32d6b73d0532]
>>>>>> Command failed java.lang.NullPointerException
>>>>>>
>>>>>>
>>>>>> b. The server log shows the following error:
>>>>>>
>>>>>> 26 Jul 2016 23:34:44 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
>>>>>> [pool-19-thread-7] INFO  org.apache.hadoop.hive.ql.exec.mr.ExecDriver -
>>>>>> Executing: /export/apps/hadoop/latest/bin/hadoop jar
>>>>>> /export/home/dev_svc/lens/apache-hive-0.13.1-inm-bin/lib/hive-common-0.13.1-inm.jar
>>>>>> org.apache.hadoop.hive.ql.exec.mr.ExecDriver  -plan
>>>>>> file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10003/plan.xml
>>>>>>   -jobconffile
>>>>>> file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10002/jobconf.xml
>>>>>> 26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
>>>>>> [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task -
>>>>>> Execution failed with exit status: 1
>>>>>> 26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
>>>>>> [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task -
>>>>>> Obtaining error information
>>>>>> 26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
>>>>>> [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task -
>>>>>> Task failed!
>>>>>> Task ID:
>>>>>>   Stage-1
>>>>>>
>>>>>> Logs:
>>>>>>
>>>>>>
>>>>>> c. The session log shows the following error (*** replaces the internal
>>>>>> cluster's HDFS path):
>>>>>>
>>>>>> 2016-07-20 17:45:04,103 ERROR [main]: mr.ExecDriver
>>>>>> (SessionState.java:printError(572)) - Job Submission failed with 
>>>>>> exception
>>>>>> 'java.lang.IllegalArgumentException(Wrong FS:
>>>>>> hdfs:/*****/hadoop-2.6.1.*****.tar.gz, expected: file:///)'
>>>>>>
>>>>>> java.lang.IllegalArgumentException: Wrong FS: 
>>>>>> hdfs:/*****/hadoop-2.6.1.****.tar.gz,
>>>>>> expected: file:///
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:455)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
>>>>>>
>>>>>>        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>>>>>>
>>>>>>        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>>>>>>
>>>>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>>>>
>>>>>>        at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>>>>
>>>>>>        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>>>>>>
>>>>>>        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>>>>>>
>>>>>>        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>>>>>>
>>>>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>>>>
>>>>>>        at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
>>>>>>
>>>>>>        at
>>>>>> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:741)
>>>>>>
>>>>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>
>>>>>>        at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>
>>>>>>        at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>
>>>>>>        at java.lang.reflect.Method.invoke(Method.java:483)
>>>>>>
>>>>>>        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>>>>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>>>>>>
>>>>>>
>>>>>> d. The Hive CLI shows the following result:
>>>>>>
>>>>>> hive (newdb)> describe formatted local_sales_aggr_fact1;
>>>>>> OK
>>>>>> # col_name             data_type           comment
>>>>>>
>>>>>> order_time           timestamp
>>>>>> delivery_time       timestamp
>>>>>> customer_id         int
>>>>>> product_id           int
>>>>>> promotion_id         int
>>>>>> customer_city_id     int
>>>>>> production_city_id   int
>>>>>> delivery_city_id     int
>>>>>> unit_sales           bigint
>>>>>> store_sales         double
>>>>>> store_cost           double
>>>>>> max_line_item_price float
>>>>>> max_line_item_discount float
>>>>>>
>>>>>> # Partition Information
>>>>>> # col_name             data_type           comment
>>>>>>
>>>>>> pt                   string               Process time partition
>>>>>> ot                   string               Order time partition
>>>>>> dt                   string               Delivery time partition
>>>>>>
>>>>>> # Detailed Table Information
>>>>>> Database:           newdb
>>>>>> Owner:               null
>>>>>> CreateTime:         Mon Jul 25 16:21:03 PDT 2016
>>>>>> LastAccessTime:     UNKNOWN
>>>>>> Protect Mode:       None
>>>>>> Retention:           0
>>>>>> Location:           file:/tmp/examples/aggrfact1
>>>>>> Table Type:         EXTERNAL_TABLE
>>>>>> Table Parameters:
>>>>>> EXTERNAL             TRUE
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.dt.first
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.dt.holes.size 0
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.dt.latest
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.dt.storage.class
>>>>>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.ot.first
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.ot.holes.size 0
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.ot.latest
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.ot.storage.class
>>>>>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.pt.first
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.pt.holes.size 0
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.pt.latest
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.DAILY.pt.storage.class
>>>>>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.dt.first
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.dt.holes.size 0
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.dt.latest
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.dt.storage.class
>>>>>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.ot.first
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.ot.holes.size 0
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.ot.latest
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.ot.storage.class
>>>>>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.pt.first
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.pt.holes.size 0
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.pt.latest
>>>>>>
>>>>>> cube.storagetable.partition.timeline.cache.HOURLY.pt.storage.class
>>>>>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>>>>> cube.storagetable.partition.timeline.cache.present true
>>>>>>
>>>>>> cube.storagetable.time.partcols pt,ot,dt
>>>>>> transient_lastDdlTime 1469574936
>>>>>> # Storage Information
>>>>>> SerDe Library:
>>>>>> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>>>>>> InputFormat:         org.apache.hadoop.mapred.TextInputFormat
>>>>>> OutputFormat:
>>>>>> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>>>>>> Compressed:         No
>>>>>> Num Buckets:         0
>>>>>> Bucket Columns:     []
>>>>>> Sort Columns:       []
>>>>>> Storage Desc Params:
>>>>>> field.delim         ,
>>>>>> serialization.format ,
>>>>>> Time taken: 6.435 seconds, Fetched: 73 row(s)
>>>>>>
>>>>>>
>>>>>> Because the query ultimately runs against the Hive table
>>>>>> local_sales_aggr_fact1, which uses the storage "local":
>>>>>>
>>>>>> <x_storage classname="org.apache.lens.cube.metadata.HDFSStorage"
>>>>>>   name="local" xmlns="uri:lens:cube:0.1"
>>>>>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>>>   xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>>>>>   <properties>
>>>>>>     <property name="storage.url" value="file:///"/>
>>>>>>   </properties>
>>>>>> </x_storage>
>>>>>>
>>>>>>
>>>>>> e. So I tried to change the value from "file:///" to
>>>>>> "hdfs:/hadoop-host:port/" (similar to
>>>>>> examples/resources/local-cluster-storage.xml), which didn't work.
>>>>>>
>>>>>> f. I also tried to change the fact table's location in
>>>>>> "examples/resources/sales-aggr-fact1.xml" from "/tmp/examples/aggrfact1"
>>>>>> to an HDFS path and then add partitions; it failed with this error:
>>>>>> lens-shell>fact add partitions --fact_name sales_aggr_fact1
>>>>>> --storage_name local --path
>>>>>> examples/resources/sales-aggr-fact1-local-parts.xml
>>>>>> Command failed javax.ws.rs.InternalServerErrorException: HTTP 500
>>>>>> Request failed.--
>>>>>>
>>>>>>
>>>>>> Corresponding Server logs:
>>>>>>
>>>>>> Caused by: java.lang.NoClassDefFoundError: org/htrace/Trace
>>>>>>         at
>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:214)
>>>>>> ~[hadoop-common-2.6.1.52.jar:na]
>>>>>>         at com.sun.proxy.$Proxy89.getFileInfo(Unknown Source) ~[na:na]
>>>>>>         at
>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>>>>>> ~[hadoop-hdfs-2.6.1.52.jar:na]
>>>>>>
>>>>>>
>>>>>> g. I also tried to change the Hive table's location from
>>>>>> file:/tmp/examples/aggrfact1 to an HDFS location, and the Lens query
>>>>>> returned the same error.
>>>>>>
>>>>>>
>>>>>> For the second issue: I queried the Hive table directly and found that the
>>>>>> data is not loaded, because in the examples the partitions' location is
>>>>>> example/resource/..., which is not an absolute path from root '/'. I changed
>>>>>> it to an absolute path, and the Hive query 'Select * from
>>>>>> newdb.local_sales_aggr_fact1' is then able to return the data, but the
>>>>>> converted Lens query fails in the MapReduce stage because it cannot find the
>>>>>> local filesystem path I assigned.
>>>>>>
>>>>>> BTW, I use the company's Hadoop cluster (Hadoop version 2.6.1), I set up
>>>>>> Hive on top of that using the hive-0.13.1-inm build downloaded from the Lens
>>>>>> website, and I set up a MySQL database as the metastore.
>>>>>>
>>>>>> Could you please help me resolve the above two issues? I would really
>>>>>> appreciate it.
>>>>>>
>>>>>>
>>>>>> *Tao Yan*
>>>>>> Software Engineer
>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>
>>>>>>
>>>>>>
>>>>>> 206.250.5345
>>>>>> t...@linkedin.com
>>>>>> https://www.linkedin.com/in/taousc
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Tao Yan*
>>>> Software Engineer
>>>> Data Analytics Infrastructure Tools and Services
>>>>
>>>>
>>>>
>>>> 206.250.5345
>>>> t...@linkedin.com
>>>> https://www.linkedin.com/in/taousc
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> *Tao Yan*
>>> Software Engineer
>>> Data Analytics Infrastructure Tools and Services
>>>
>>>
>>>
>>> 206.250.5345
>>> t...@linkedin.com
>>> https://www.linkedin.com/in/taousc
>>>
>>
>>
>
>
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> t...@linkedin.com
> https://www.linkedin.com/in/taousc
>
