beeline client

2014-07-10 Thread Bogala, Chandra Reddy
Hi, I currently submit multiple Hive jobs with the Hive CLI ("hive -f") from different scripts. I can see all of these jobs in the application tracker, and they are processed in parallel. Now I plan to switch to HiveServer2 and submit jobs using the beeline client from multiple scripts

Re: Issue while running Hive 0.13

2014-07-10 Thread Sarath Chandra
I'm using Hadoop 1.0.4. Suspecting some compatibility issues, I moved from Hive 0.13 to Hive 0.12, but the exceptions related to SLF4J still persist. I am unable to move forward with Hive to finalize a critical product design. Can somebody please help me? On Wed, Jul 9, 2014 at 11:25 AM, Sarath Chandra

Re: Hive UDF performance issue

2014-07-10 Thread Edward Capriolo
The "small" table can be any size. You want the small table to be /path/to/table/b here because that will result in more parallelism. There is a ticket on hive theta join that you might want to look at. On Thu, Jul 10, 2014 at 10:23 PM, Malligarjunan S wrote: > Hello Edwards, > > Thank you very

Re: Hive UDF performance issue

2014-07-10 Thread Malligarjunan S
Hello Edward, Thank you very much for the update. What size do you mean by a small table? In our case the small table will have a minimum of 1 million records. Can we use this UDTF? How much time improvement will there be? Appreciate your help! Thanks and Regards SankarS On Thu, Jul 10, 2014 at 11:26

More recent Hive-Hbase Integration info/docs

2014-07-10 Thread Stephen Boesch
The URL for the HBase-Hive integration, https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration, lists old versions: HBase 0.92.0 and Hadoop 0.20.x. Are there any significant changes to these docs that anyone might (a) have pointers to or (b) be able/willing to mention here as important

Re: GetTables is very slow when the table number is large

2014-07-10 Thread Ashu Pachauri
There is a limit on the number of tables returned (maxRows), but it does not help because the call internally fetches all the table objects and then truncates the result set based on the maxRows argument. So, the cost depends only on the total number of tables your database has. I have also been ex
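
The fetch-then-truncate behaviour described in this message can be illustrated with a small, self-contained sketch. This is not Hive's code: the `Catalog` class, its `load` method, and both `get_tables_*` methods are hypothetical names used only to show why a row limit applied after loading does not reduce the work done.

```python
from itertools import islice


class Catalog:
    """Hypothetical stand-in for a metastore; counts loaded table objects."""

    def __init__(self, names):
        self.names = list(names)
        self.loaded = 0  # how many table objects have been materialised

    def load(self, name):
        # Stands in for the expensive per-table fetch.
        self.loaded += 1
        return {"name": name}

    def get_tables_truncate_after(self, max_rows):
        # Behaviour described in the message: load everything, then cut.
        tables = [self.load(n) for n in self.names]
        return tables[:max_rows]

    def get_tables_limit_first(self, max_rows):
        # Hypothetical alternative: stop loading once max_rows is reached.
        return [self.load(n) for n in islice(self.names, max_rows)]


if __name__ == "__main__":
    slow = Catalog(f"t{i}" for i in range(1000))
    slow.get_tables_truncate_after(10)
    fast = Catalog(f"t{i}" for i in range(1000))
    fast.get_tables_limit_first(10)
    print(slow.loaded, fast.loaded)
```

Both calls return ten tables, but the first variant still touches every table in the catalog, which matches the observation that the running time depends on the total table count rather than on maxRows.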

Re: Hive job scheduling

2014-07-10 Thread moon soo Lee
For simpler use, Zeppelin (http://zeppelin-project.org) runs Hive queries from a web-based editor, and it has a crontab-style scheduler. Best, moon On Fri, Jul 11, 2014 at 8:52 AM, Martin, Nick wrote: > Oozie has a workflow action for Hive to execute scripts. You can also > configure an Oozie co

Re: Hive job scheduling

2014-07-10 Thread Martin, Nick
Oozie has a workflow action for Hive to execute scripts. You can also configure an Oozie coordinator to run the Hive workflow at desired intervals. Lots of Oozie config options for workflows and configs so check out the documentation. Sent from my iPhone On Jul 10, 2014, at 6:09 PM, "Cheng Ju C

Re: hiveserver2 0.12 and 0.13 incompatible?

2014-07-10 Thread Xuefu Zhang
Yeah. It's expected that a 13 client is not able to talk to the older server. However, the other direction is fine. That is, an old 12 client should be able to talk to a 13 server. --Xuefu On Thu, Jul 10, 2014 at 3:09 PM, Edward Capriolo wrote: > 2014-07-10 22:00:03 ERROR HiveConnection:425 - Error op

Multiple joins cause failures in Reduce phase

2014-07-10 Thread diogo
So, I have a query like this: select user.id, ud_name.value as name, ud_age.value as age from user left outer join user_data ud_name on user.id = ud_name.user_id and ud_name.key = 'name' left outer join user_data ud_age on user.id = ud_age.user_id and ud_age.key = 'age' ... ; With multiple joins

Hive job scheduling

2014-07-10 Thread Cheng Ju Chuang
Hi, Hopefully this won't be an extra mail for you guys, because I keep getting delivery errors. I am looking for a scheduling implementation for Hive jobs (e.g. some Hive commands have to be executed every 15 minutes). There are supposed to be ways to achieve this, but I haven't found a stable way fr

hiveserver2 0.12 and 0.13 incompatible?

2014-07-10 Thread Edward Capriolo
2014-07-10 22:00:03 ERROR HiveConnection:425 - Error opening session org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.thrift.TApplicationException.read(TApplicationException.java:108) This is hive 13

Hive job scheduling

2014-07-10 Thread Cheng Ju Chuang
Hi, I am looking for a scheduling implementation for Hive jobs (e.g. some Hive commands have to be executed every 15 minutes). There are supposed to be ways to achieve this, but I haven't found a stable way in the online community. Any suggestions? Btw, my Hive version is 0.13 and Hadoop version is

Re: beeline remote client not connecting to hiveserver2

2014-07-10 Thread D K
What do you see in your hiveserver2 logs? There might be a clue there. On Thu, Jul 10, 2014 at 1:17 PM, Hang Chan wrote: > I tried using the username and password but still getting the same error. > > # hive --service beeline --verbose=true -u jdbc:hive2://hiveservice:11000 > -n root -p foo >

Re: beeline remote client not connecting to hiveserver2

2014-07-10 Thread Hang Chan
I tried using the username and password but still getting the same error. # hive --service beeline --verbose=true -u jdbc:hive2://hiveservice:11000 -n root -p foo issuing: !connect jdbc:hive2://hiveservice:11000 root foo scan complete in 36ms Connecting to jdbc:hive2://hiveservice:11000 Error: Inv

Re: beeline remote client not connecting to hiveserver2

2014-07-10 Thread D K
Oh, somewhere in the email thread I thought http transport mode was being used. If that's not the case then you should be able to log in using: hive --service beeline -u jdbc:hive2://hiveservice:11000 -n $USER -p fakepwd Even though it doesn't do authentication, hiveserver2 still needs to a usernam

Re: Hive UDF performance issue

2014-07-10 Thread Edward Capriolo
There is no magic. Hopefully one table is smaller than the other. You could make a UDTF to do something like what this MR job is doing. Make a mapper that runs over table A. InputFormat.setInputPath("/path/to/table/a") Then inside the mapper private Conf c setup(Conf c){ this.c = c } public void map

FAILED: SemanticException [Error 10084]: Stateful UDF's can only be invoked in the SELECT list

2014-07-10 Thread Clay McDonald
Why is this failing? I am calling it in the SELECT list. hive> ADD JAR /root/apache-hive-0.13.1-bin/lib/hive-contrib-0.13.1.jar; hive> CREATE TEMPORARY FUNCTION rowSequence AS 'org.apache.hadoop.hive.contrib.udf.UDFRowSequence'; hive> CREATE TABLE LOYALTY_CARDS AS > SELECT DISTINCT CARD_NB

Re: Hive UDF performance issue

2014-07-10 Thread Malligarjunan S
Hello Edward, Thank you very much for helping me. I am new to hive. Could you please provide the sample map reduce job? Regards, Sankar S On Thu, Jul 10, 2014 at 8:19 AM, Edward Capriolo wrote: > Hive cross product stinks . I have a map reduce job that will do it > > > On Wednesday, July 9

Re: beeline remote client not connecting to hiveserver2

2014-07-10 Thread Hang Chan
Nope, still not working. I don't believe I have http enabled. # hive --service beeline --verbose=true -u "jdbc:hive2://hiveservice:10001/default?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice" issuing: !connect jdbc:hive2://hiveservice:10001/default?hive.server2.transpo

Auto scaling on ec2

2014-07-10 Thread Raju Chinthala
Scaling a Hadoop cluster with Hive has the following issues: 1. Adding a computing node (scaling up) when load on the cluster is high decreases the execution time of the queries, but there is still a huge time lag as the new node works on data from other nodes. 2. The process of removing a node