Re: IN() Operator

2010-02-19 Thread Amr Awadallah
Andy, Are you trying to do something like: SELECT FROM mytable A WHERE AND mycol IN ( SELECT ) If so, you can't do sub-queries inside the WHERE clause in Hive, you can only do sub-queries within the FROM/JOIN clause. But, almost any query similar to above can be written using
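A minimal sketch of the kind of rewrite described above, moving the sub-query out of the WHERE clause and into the JOIN clause. The message does not show the full query, so everything here is an assumption for illustration: the tables (mytable, allowed) and columns are hypothetical, and Python's built-in sqlite3 is used only as a runnable stand-in for Hive, whose syntax and behavior may differ.

```python
import sqlite3

# In-memory database standing in for Hive (assumption: illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER, mycol TEXT);
    CREATE TABLE allowed (val TEXT);
    INSERT INTO mytable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO allowed VALUES ('a'), ('c'), ('c');
""")

# Form with the sub-query inside WHERE (the form the message says Hive rejects).
in_query = """
    SELECT A.id FROM mytable A
    WHERE A.mycol IN (SELECT val FROM allowed)
    ORDER BY A.id
"""

# Same result with the sub-query moved into the JOIN clause.
# DISTINCT stops duplicate values in `allowed` from duplicating output rows.
join_query = """
    SELECT A.id FROM mytable A
    JOIN (SELECT DISTINCT val FROM allowed) B ON A.mycol = B.val
    ORDER BY A.id
"""

in_rows = conn.execute(in_query).fetchall()
join_rows = conn.execute(join_query).fetchall()
```

Both queries return the same ids here; the DISTINCT in the joined sub-query is what keeps the join from changing row counts when the lookup table has repeated values.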

Re: [VOTE] hive 0.5.0 release candidate 0

2010-02-19 Thread Edward Capriolo
On Fri, Feb 19, 2010 at 9:49 PM, Zheng Shao wrote: > Hi, > > I just made a release candidate at > https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.5.0-rc0 > > The tarballs are at: http://people.apache.org/~zshao/hive-0.4.1-candidate-3/ > > > Please vote. > > -- > Yours, > Zheng > -1 I

[VOTE] hive 0.5.0 release candidate 0

2010-02-19 Thread Zheng Shao
Hi, I just made a release candidate at https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.5.0-rc0 The tarballs are at: http://people.apache.org/~zshao/hive-0.4.1-candidate-3/ Please vote. -- Yours, Zheng

Re: Hive Server Leaking File Descriptors?

2010-02-19 Thread Zheng Shao
https://issues.apache.org/jira/browse/HIVE-1181 is committed. Thanks to Yongqiang. We need to pass "-hiveconf hive.fileformat.check=false" when starting HiveServer to get rid of the extra connections. We still need to fix MAPREDUCE-1504 so that we can re-enable the fileformat check. Zheng On Thu, Feb 18

Can't start HWI service in trunk

2010-02-19 Thread Brent Miller
Hello, I'm trying to get the hive HWI service up and running. I'm on r911664 and when I try to start the HWI service I get: h...@hadoop-master:~$ hive --service hwi ls: cannot access /opt/hive/lib/hive-hwi-*.war: No such file or directory 10/02/19 14:57:42 INFO hwi.HWIServer: HWI is starting up 10

Re: Having trouble with lateral view

2010-02-19 Thread Zheng Shao
Jason, Do you want to open a JIRA and contrib your map_explode function to Hive? That will be greatly appreciated. Zheng On Fri, Feb 19, 2010 at 2:49 PM, Yongqiang He wrote: > Hi Jason, > > This is a known bug, see https://issues.apache.org/jira/browse/HIVE-1056 > > You can first disable ppd w

Re: Having trouble with lateral view

2010-02-19 Thread Yongqiang He
Hi Jason, This is a known bug, see https://issues.apache.org/jira/browse/HIVE-1056 You can first disable ppd with "set hive.optimize.ppd=false;" Thanks Yongqiang On 2/19/10 2:23 PM, "Jason Michael" wrote: > I'm currently running a hive build from trunk, revision number 911889. I've > built a

Having trouble with lateral view

2010-02-19 Thread Jason Michael
I'm currently running a hive build from trunk, revision number 911889. I've built a UDTF called map_explode which just emits the key and value of each entry in a map as a row in the result table. The table I'm running it against looks like: hive> describe mytable; productstringfrom de

Re: computing median and percentiles

2010-02-19 Thread Zheng Shao
Hi Jerome, Is there any update on this? https://issues.apache.org/jira/browse/HIVE-259 Zheng On Fri, Feb 5, 2010 at 9:34 AM, Jerome Boulon wrote: > Hi Bryan, > I'm working on Hive-259. I'll post an update early next week. > /Jerome. > > > On 2/4/10 9:08 PM, "Bryan Talbot" wrote: > >> What's th

Re: map join and OOM

2010-02-19 Thread Gang Luo
Hi Yongqiang, that sounds interesting. So, when you mention *hive-chunk*, do you mean *bucket* in database concept? Can I further say that hive is actually doing what database does in a hash join (divide the hash table into several buckets, load all the buckets in turn and join them with a blo

Re: Question on modifying a table to become external

2010-02-19 Thread Eva Tse
Yep, we just tried it and it works. Thanks! Eva. On 2/19/10 10:25 AM, "Prasad Chakka" wrote: > Eva, > > Here is the wiki describing the syntax on changing table properties. > http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Add_Table_Properties > > > > From: Eva Tse > Reply-To: > Dat

Re: How does Hive determine the number of mapred tasks?

2010-02-19 Thread Edward Capriolo
The maximum number of tasks running at once per node is dictated by: mapred.tasktracker.map.tasks.maximum = 6, mapred.tasktracker.reduce.tasks.maximum = 4. I do not work with ec2 so I do not know how to adjust it. Hive prints a message like this during the query. Number of reduce tasks not
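A rough sketch of the arithmetic behind the per-node limits mentioned above. The slot values 6 and 4 are the ones quoted in the message; the node count (3) and the task counts (34 map, 2 reduce) come from the original question in this thread. Applying those slot values to that cluster is an assumption, since the actual EMR settings are not given, and the function names are hypothetical.

```python
import math

def max_concurrent_tasks(nodes: int, slots_per_node: int) -> int:
    """Upper bound on tasks of one kind running at the same time."""
    return nodes * slots_per_node

def waves(total_tasks: int, nodes: int, slots_per_node: int) -> int:
    """Number of rounds needed if every slot is busy in each round."""
    return math.ceil(total_tasks / max_concurrent_tasks(nodes, slots_per_node))

# Values quoted in the message (assumed here to apply to every node).
MAP_SLOTS = 6      # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = 4   # mapred.tasktracker.reduce.tasks.maximum
NODES = 3          # cluster size from the original question

map_capacity = max_concurrent_tasks(NODES, MAP_SLOTS)
map_waves = waves(34, NODES, MAP_SLOTS)
reduce_waves = waves(2, NODES, REDUCE_SLOTS)
```

Under these assumptions the 34 map tasks would not all run at once but in two rounds, which is one possible reason only part of them appear active at any moment.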

How does Hive determine the number of mapred tasks?

2010-02-19 Thread Saurabh Nanda
Hi, Is there any page/document that describes the methods/techniques used by Hive to arrive at the optimum number of map tasks & optimum number of reduce tasks? I'm running a 3-node Amazon EMR cluster, and Hive has determined that 34 map & 2 reduce tasks are optimum. Out of the 34 map tasks only

Re: Question on modifying a table to become external

2010-02-19 Thread Prasad Chakka
Eva, Here is the wiki describing the syntax on changing table properties. http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL#Add_Table_Properties From: Eva Tse Reply-To: Date: Thu, 18 Feb 2010 15:54:01 -0800 To: , Zheng Shao , Paul Yang Subject: Re: Quest

Re: Thrift Server Error Messages

2010-02-19 Thread Zheng Shao
Can you open a JIRA and help propose some concrete design of the change? That will help make it faster to have this feature. Thanks, Zheng On Fri, Feb 19, 2010 at 6:17 AM, Andy Kent wrote: > When executing commands on the hive command line it gives really useful output > if you have syntax error

Re: SequenceFile compression on Amazon EMR not very good

2010-02-19 Thread Zheng Shao
hive.exec.compress.output controls whether or not to compress hive output. (This overrides mapred.output.compress in Hive). All other compression flags are from hadoop. Please see http://hadoop.apache.org/common/docs/r0.18.0/hadoop-default.html Zheng On Fri, Feb 19, 2010 at 5:53 AM, Saurabh Nand
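A small sketch of how the flags described above might be combined for a Hive session. It only builds the SET statements as strings. hive.exec.compress.output is the Hive flag named in the message; the two mapred.* properties and the Gzip codec class are standard Hadoop settings chosen here as an assumption, and the helper name is hypothetical. Whether this combination gives good compression on EMR is not something the thread confirms.

```python
def compression_settings(
    codec: str = "org.apache.hadoop.io.compress.GzipCodec",
    block: bool = True,
) -> list[str]:
    """Build Hive session SET statements for compressed output."""
    settings = {
        # Hive-level switch; per the message it overrides mapred.output.compress.
        "hive.exec.compress.output": "true",
        # Hadoop-level flags (assumed choice of codec and compression type).
        "mapred.output.compression.codec": codec,
        "mapred.output.compression.type": "BLOCK" if block else "RECORD",
    }
    return [f"SET {key}={value};" for key, value in settings.items()]

statements = compression_settings()
for line in statements:
    print(line)
```

The printed lines could be pasted at the start of a Hive session before the query that writes the output table.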

Re: IN() Operator

2010-02-19 Thread Vladimir Klimontovich
No, but it's relatively easy to implement a custom UDF (http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF) that will implement the same functionality. On Feb 19, 2010, at 7:58 PM, Andy Kent wrote: > I couldn't find anything on the wiki so thought I would try here. > > Does Hive have an IN()

IN() Operator

2010-02-19 Thread Andy Kent
I couldn't find anything on the wiki so thought I would try here. Does Hive have an IN() operator similar to the one in MySQL? If not, is there an alternative way of testing for inclusion? Thanks, Andy.

Thrift Server Error Messages

2010-02-19 Thread Andy Kent
When executing commands on the hive command line it gives really useful output if you have syntax errors in your query. When using the Thrift interface I seem to only be able to get errors like 'Error code: 11'. Is there a way to get at the human-friendly error messages via the Thrift interface?

RE: Hive Server Leaking File Descriptors?

2010-02-19 Thread Andy Kent
This has made my day. Thanks for working through this Bennie and Zheng. Andy. From: Bennie Schut [bsc...@ebuddy.com] Sent: 19 February 2010 07:47 To: hive-user@hadoop.apache.org Subject: Re: Hive Server Leaking File Descriptors? That's some great news. Th

Re: SequenceFile compression on Amazon EMR not very good

2010-02-19 Thread Saurabh Nanda
And also hive.exec.compress.*. So that makes it three sets of configuration variables: mapred.output.compress.*, io.seqfile.compress.*, and hive.exec.compress.*. What's the relationship between these configuration parameters, and which ones should I set to achieve a well-compressed output table? Saurabh.

Re: SequenceFile compression on Amazon EMR not very good

2010-02-19 Thread Saurabh Nanda
I'm confused here Zheng. There are two sets of configuration variables. Those starting with io.* and those starting with mapred.*. For making sure that the final output table is compressed, which ones do I have to set? Saurabh. On Fri, Feb 19, 2010 at 12:37 AM, Zheng Shao wrote: > Did you also: