how to get counts as a byproduct of a query

2015-12-01 Thread Frank Luo
Very often I need to run a query against one or more tables and then collect some counts. I am wondering if there is a way to kill two birds with one stone by scanning the table once. (I don’t mind saving the counts to a separate file or something like that.) For example, I have tables A and B. I need to do an inner join
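
The single-scan pattern the poster is after is usually done in Hive with a multi-table INSERT, which reads the source once and feeds several outputs. A minimal sketch, where all table and column names (A, B, join_out, join_counts, id, val) are hypothetical:

```sql
-- One scan of the join feeds both the result table and the count table.
FROM (
  SELECT a.id, a.val
  FROM A a
  JOIN B b ON a.id = b.id
) j
INSERT OVERWRITE TABLE join_out
  SELECT j.id, j.val
INSERT OVERWRITE TABLE join_counts
  SELECT count(*);
```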

Re: Caching intermediate data in tez object registry

2015-12-01 Thread Bing Jiang
Hi, Raajay. https://issues.apache.org/jira/browse/HIVE-7313 provides a potential solution to store intermediate data on memory/SSD. But it relies on the HDFS multiple-StorageType feature ( https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html )
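
For reference, the HDFS storage-policy feature mentioned above is driven from the command line; a hedged sketch, where the path and the ALL_SSD policy choice are illustrative:

```shell
# Pin a directory to SSD-backed storage, then verify the policy took effect.
hdfs storagepolicies -setStoragePolicy -path /tmp/hive-intermediate -policy ALL_SSD
hdfs storagepolicies -getStoragePolicy -path /tmp/hive-intermediate
```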

Caching intermediate data in tez object registry

2015-12-01 Thread Raajay
Hello, my setup is Hive on Tez. I find that for most of my queries, the map stage takes the longest. Is it possible to use the Tez Shared Object Registry to cache the intermediate data to improve the performance of recurring queries? If yes, how would I do it? Assuming that the nodes I run on have

RE: Using spark in tandem with Hive

2015-12-01 Thread Mich Talebzadeh
Thanks. My test bed has the following components: 1. Spark 1.5.2, 2. Hive 1.2.1, 3. Hadoop 2.6. I will try your suggestions; however, we have to consider that the underlying table is based on a Hive table, to keep the systematics the same, so to speak

Re: Using spark in tandem with Hive

2015-12-01 Thread Jörn Franke
You should use Tez (preferably >0.8, and a release of Hive supporting it, because it has the Tez service, which allows lower-latency queries) instead of MR to get the first query faster. The second query is probably faster in Hive because you use statistics, which to my knowledge are not leverage
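
Switching execution engines is a session-level setting in Hive; a minimal sketch (availability of each engine depends on how Hive was built and configured):

```sql
-- Run subsequent queries on Tez instead of MapReduce.
set hive.execution.engine=tez;
-- Or, where Hive on Spark is configured:
-- set hive.execution.engine=spark;
```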

RE: Using spark in tandem with Hive

2015-12-01 Thread Mich Talebzadeh
The table was created in spark-sql as an ORC table: use asehadoop; drop table if exists tt; create table tt ( owner varchar(30), object_name varchar(30), subobject_name varchar(30), object_id bigint, data_object_id bigint, object

Re: Using spark in tandem with Hive

2015-12-01 Thread Jörn Franke
How did you create the tables? Do you have automated statistics activated in Hive? BTW, MR is outdated as a Hive execution engine. Use Tez (maybe wait for 0.8 for sub-second queries) or use Spark as an execution engine in Hive. > On 01 Dec 2015, at 17:40, Mich Talebzadeh wrote: > > What if we
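
The statistics question above maps to a few standard Hive settings and commands; a hedged sketch using the thread's table name tt:

```sql
-- Gather basic table stats automatically when data is written.
set hive.stats.autogather=true;
-- Or compute statistics explicitly after the fact:
ANALYZE TABLE tt COMPUTE STATISTICS;
ANALYZE TABLE tt COMPUTE STATISTICS FOR COLUMNS;
```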

Using spark in tandem with Hive

2015-12-01 Thread Mich Talebzadeh
What if we decide to use Spark with Hive? I would like to hear similar views. My test bed comprised: 1. Spark 1.5.2, 2. Hive 1.2.1, 3. Hadoop 2.6. I made Spark use the Hive metastore, so using spark-sql I can do pretty much whatever one can do with HiveQL. I cre
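
Pointing Spark at the Hive metastore, as described above, is commonly done by exposing Hive's configuration to Spark; a sketch, assuming typical (not universal) install paths:

```shell
# Make the Hive metastore configuration visible to Spark,
# then run HiveQL through spark-sql. Paths are illustrative.
cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/
$SPARK_HOME/bin/spark-sql -e "SHOW DATABASES;"
```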

RE: UPDATE RE: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes (beeline - hive server 2)

2015-12-01 Thread Timothy Garza
The following JIRA refers: https://issues.apache.org/jira/browse/HIVE-12553 From: Timothy Garza [mailto:timothy.ga...@collinsongroup.com] Sent: 01 December 2015 12:44 To: user@hive.apache.org Subject: RE: UPDATE RE: com.mysql.jdbc.exceptions.jd

Re: Problem with getting start of Hive on Spark

2015-12-01 Thread Xuefu Zhang
Link, it seems that you're using Hive 1.2.1, which doesn't support Spark 1.5.2, or at least hasn't been tested with it. Please try the Hive master branch if you want to use Spark 1.5.2. If the problem remains, please provide all the commands you run in your Hive session that lead to the failure. Thanks, Xuefu On M

Re: Problem with getting start of Hive on Spark

2015-12-01 Thread Xuefu Zhang
Mich, as I understand, you have a problem with Hive on Spark due to dual network interfaces. I agree that this is something that should be fixed in Hive. However, saying Hive on Spark doesn't work seems unfair. At Cloudera, we have many customers that have successfully deployed Hive on Spark on their c

RE: UPDATE RE: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes (beeline - hive server 2)

2015-12-01 Thread Timothy Garza
Now leaning towards this being a bug in Hive v1.2.1 for the MySQL metastore classes… Let me show you why. I’m running the following simple Hive QL: INSERT OVERWRITE TABLE SELECT ,… FROM ; [HiveServer2-Background-Pool: Thread-20]: ERROR jdbc.JDBCStatsPublisher: Error during JDBC initializatio
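
A workaround often suggested for the 767-byte index limit is to keep the metastore database in a single-byte character set, so indexed VARCHAR columns stay under the limit; a hedged sketch against MySQL (the schema name metastore is hypothetical):

```sql
-- Single-byte latin1 keeps VARCHAR(255) index keys within 767 bytes.
ALTER DATABASE metastore CHARACTER SET latin1 COLLATE latin1_bin;
```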

How to set the idle Hive JDBC connection timeout from Java code using Hive JDBC

2015-12-01 Thread reena upadhyay
I am using Hive JDBC 1.0 in my Java application to create a connection with the Hive server and execute queries. I want to set the idle Hive connection timeout from Java code. Say the user first creates the Hive connection, and if the connection remains idle for the next 10 minutes, then this connection
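
For what it's worth, idle-session cleanup in this setup is normally a HiveServer2-side setting rather than a JDBC client call; a hedged sketch of the relevant properties (supported in later Hive releases; values are milliseconds):

```xml
<!-- hive-site.xml on the HiveServer2 host: close sessions and
     operations that stay idle for more than 10 minutes. -->
<property>
  <name>hive.server2.idle.session.timeout</name>
  <value>600000</value>
</property>
<property>
  <name>hive.server2.idle.operation.timeout</name>
  <value>600000</value>
</property>
```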

Re: Re: tuning guide

2015-12-01 Thread Artem Ervits
Here you go: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_performance_tuning/content/ch_hive_architectural_overview.html On Dec 1, 2015 4:25 AM, "San Luo" wrote: > Got it, thanks > > > > *From:* Srinivas Thunga [mailto:srinivas.thu...@gmail.com] > *Sent:* 1 December 2015 17:09 > *To:* user

Re: UPDATE RE: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes (beeline - hive server 2)

2015-12-01 Thread San Luo
Thanks for the response, Timothy. I tried to set this global setting; however, it fails with “ERROR 1193 (HY000): Unknown system variable 'innodb_large_prefix'”. Here are the MySQL version details: [root@master hive]# rpm -qa | grep mysql mysql-5.1.73-5.el6_6.x86_64 mysql-libs
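
The error above is consistent with the MySQL version: innodb_large_prefix does not exist in MySQL 5.1 and, to my knowledge, appeared around 5.5. On a newer server the setting would look roughly like this (a sketch, not a tested recipe):

```sql
-- Requires MySQL 5.5+; large index prefixes also need the Barracuda
-- file format and DYNAMIC/COMPRESSED row formats on the affected tables.
SET GLOBAL innodb_file_format = Barracuda;
SET GLOBAL innodb_large_prefix = ON;
```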

RE: Problem with getting start of Hive on Spark

2015-12-01 Thread Mich Talebzadeh
Hi Link, I am afraid that using Spark as the execution engine for Hive does not seem to work. I am still trying to make it work. An alternative is to use Spark with the Hive data set, to be precise spark-sql. You set Spark to use the Hive metastore and then use Hive as the heavy DML engine

Re: tuning guide

2015-12-01 Thread San Luo
Got it, thanks. From: Srinivas Thunga [mailto:srinivas.thu...@gmail.com] Sent: 1 December 2015 17:09 To: user@hive.apache.org Subject: Re: tuning guide Hi, Use some query optimization techniques for faster query execution: use partitions, bucketing, or create the table in ORC format. Regards,

Re: tuning guide

2015-12-01 Thread Srinivas Thunga
Hi, Use some query optimization techniques for faster query execution: use partitions, bucketing, or create the table in ORC format. Regards, Srinivas T On Tue, Dec 1, 2015 at 1:46 PM, San Luo wrote: > Hi guys, > > My query runs slowly in Hive, is there a tuning g
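
The three suggestions above can be combined in a single DDL; a minimal illustrative sketch (table, columns, and bucket count are hypothetical):

```sql
CREATE TABLE sales (
  id     BIGINT,
  amount DOUBLE
)
PARTITIONED BY (dt STRING)          -- prune whole partitions at query time
CLUSTERED BY (id) INTO 16 BUCKETS   -- bucketing helps joins and sampling
STORED AS ORC;                      -- columnar storage with built-in stats
```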

tuning guide

2015-12-01 Thread San Luo
Hi guys, my query runs slowly in Hive. Is there a tuning guide or similar document that could share some ideas on this? Thanks.