Re: External Table

2010-06-27 Thread Amr Awadallah
alter table add partition should work, see: http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL -- amr On 6/26/2010 3:15 AM, Shuja Rehman wrote: Hi, I have a question about external tables. Suppose I have this kind of directory structure in hdfs. /user/Mydata/Data1/part-000 /user/Mydata/Dat

Re: MapSide join in Hive

2010-06-26 Thread Amr Awadallah
Viraj, 1. No 2. Yes, smaller table needs to fit in jvm memory (typically more than 1GB for small table is too large). See slide 7 and after in this preso for different join strategies that can help in case the tables are bucketed and sorted. http://www.slideshare.net/zshao/hive-user-meeting

Re: Create Table with Line Terminated other than '\n'

2010-06-11 Thread Amr Awadallah
Zheng, I thought that was fixed per your work here, no? https://issues.apache.org/jira/browse/HIVE-302 Then what did you fix? -- amr On 6/10/2010 10:22 PM, Zheng Shao wrote: Also, changing "LINES TERMINATED BY" probably won't work, because hadoop's TextInputFormat does not allow line terminato

Re: Carriage return handling in Hive

2010-06-08 Thread Amr Awadallah
Akira, did you try LINES TERMINATED BY ? -- amr On 6/3/2010 6:45 AM, Akira Kitada wrote: Hi, Hive uses TextInputFormat by default and which treats '\n' AND '\r' as a line separator. However I don't want '\r' to be treated as a separator. Does Hive provide a way to set custom InputFormat? I

Re: Does Hive can run normally in windows platform ?

2010-05-11 Thread Amr Awadallah
I use the hive client under cygwin fine. -- amr On 5/11/2010 9:06 AM, Edward Capriolo wrote: On Tue, May 11, 2010 at 11:32 AM, Jeff Zhang wrote: Hi, I notice that there's some code in hive for handling windows path, but it seems Hive can not run norm

Re: regex_extract in hive

2010-03-08 Thread Amr Awadallah
remove the leading and trailing /, no need for those -- amr On 3/8/2010 8:31 AM, prakash sejwani wrote: Hi All, I have a query below SELECT regexp_extract(resource,'/\&tag=([^\&]+)/') FROM a_log; it gives blank result the sample resource string is like this "/se
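The advice above can be checked outside Hive. The following is a minimal sketch in Python, whose `re` module treats this simple pattern the same way; the helper `extract_group` and the sample resource string are hypothetical, since the real sample string is cut off in the snippet. It shows that the `/.../` delimiters are matched as literal slashes, so the pattern only matches once they are removed.

```python
import re
from typing import Optional


def extract_group(pattern: str, text: str) -> Optional[str]:
    """Return the first capture group of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical resource string; the original sample is truncated in the archive.
SAMPLE = "/search?q=x&tag=hive&page=2"

# With the leading and trailing slashes, the regex demands a literal "/" right
# before "&tag=" and another one after the tag value, so nothing matches.
with_slashes = extract_group(r"/&tag=([^&]+)/", SAMPLE)

# Without them, the tag value is captured.
without_slashes = extract_group(r"&tag=([^&]+)", SAMPLE)
```

Here `with_slashes` is `None` and `without_slashes` is the tag value, which is consistent with the empty result reported in the question.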

Re: Hive join strategy documentation?

2010-02-25 Thread Amr Awadallah
thanks Yongqiang, please also include the hints (MAPJOIN, STREAMTABLE). -- amr On 2/25/2010 7:53 PM, Yongqiang He wrote: Hi, Thanks. Good suggestion. Had a short discussion this afternoon. I will update the wiki once the sort merge bucket map join and another skew join done. And will send out

Re: Why no two aggregations can have different DISTINCT columns ?

2010-02-25 Thread Amr Awadallah
+1, please post jira/patch. -- amr On 2/25/2010 1:20 AM, Zheng Shao wrote: Yes definitely. Do you want to open a JIRA and post a patch? Please link the new JIRA to the other 2 JIRA that was mentioned in the same email thread. Zheng On Thu, Feb 25, 2010 at 1:16 AM, Mafish Liu wrote: Hive

Re: IN() Operator

2010-02-19 Thread Amr Awadallah
Andy, Are you trying to do something like: SELECT FROM mytable A WHERE AND mycol IN ( SELECT ) If so, you can't do sub-queries inside the WHERE clause in Hive, you can only do sub-queries within the FROM/JOIN clause. But, almost any query similar to above can be written using

Re: comments in hive ql

2010-02-16 Thread Amr Awadallah
-- On 2/16/2010 9:03 PM, prasenjit mukherjee wrote: How do I add a commented line in hive ql file ? -- or # or //

Re: Read-only users in Hive

2010-01-21 Thread Amr Awadallah
sions for the Hive warehouse directory. / Oscar On Tue, Jan 19, 2010 at 11:40 PM, Amr Awadallah wrote: HIVE-78 is what we all are waiting for :) The hack you suggest below should be a valid interim solution, just make sure the read-only clients

Re: Read-only users in Hive

2010-01-19 Thread Amr Awadallah
HIVE-78 is what we all are waiting for :) The hack you suggest below should be a valid interim solution, just make sure the read-only clients have their own hive-site.xml with the proper user/pass for the read only account, e.g. javax.jdo.option.ConnectionUserName read-only-user javax.jdo

Re: Help with regexp_extract sytax

2010-01-05 Thread Amr Awadallah
; throwing it off, try: select to_date(ts), regexp_extract(cookies, '(urltrack__goo.*?\;)', 1) from hits where dt='2009-04' and lower(cookies) rlike '(.*)urltrack__goo(.*)' limit 5; On 1/5/2010 2:19 AM, Saurabh Nanda wrote: Hi, I'm trying the following query but it doesn't seem to be syntacti

Re: How to define a cube and do drill-down operation on hive?

2009-11-13 Thread Amr Awadallah
> It is not supported currently That isn't entirely true, because the question itself is also a bit ambiguous. BI tools (à la Microstrategy, Business Objects, Cognos) are the ones that do the OLAP functionality (drill-down/drill-up) then they pass that as SQL to the underlying database (with the

Re: lead and lag analytic functions

2009-10-21 Thread Amr Awadallah
I am +1 to adding such functions to Hive, opened a JIRA at: https://issues.apache.org/jira/browse/HIVE-896 There is no straightforward way right now AFAIK. You can simulate the LAG function using a map transform, just buffer the last n rows that you care about in memory as a sliding window, i
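The sliding-window idea described above could look roughly like the sketch below. It is not the author's script: the function name `with_lag` and its parameters are hypothetical, and it assumes the rows already arrive in the order the lag should follow. In a script run through Hive's TRANSFORM clause the rows would come from standard input, one per line; here they are passed in as any iterable so the logic can be tried on its own.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple


def with_lag(rows: Iterable[str], n: int = 1) -> Iterator[Tuple[str, Optional[str]]]:
    """Pair each row with the row seen n positions earlier (None for the first n rows)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The deque is the sliding window: it never holds more than the last n rows.
    window: Deque[str] = deque(maxlen=n)
    for row in rows:
        # Once the window is full, its oldest element is the row n positions back.
        lagged = window[0] if len(window) == n else None
        yield row, lagged
        window.append(row)


# Small in-memory example; a real transform script would iterate over sys.stdin instead.
lag_one = list(with_lag(["a", "b", "c"], 1))
lag_two = list(with_lag(["a", "b", "c", "d"], 2))
```

Memory use stays proportional to `n` regardless of input size, which is the point of buffering only the rows you care about.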

Re: UPDATE statement in Hive?

2009-07-28 Thread Amr Awadallah
Saurabh, I think you are better off with HBase for this kind of use, see: http://hadoop.apache.org/hbase/ In a nutshell, HBase is a layer on top of HDFS which supports two things: (1) quick lookups based on keys (e.g. a userid), and (2) transaction semantics at the row-level (update/delete/insert

Re: Regarding Hive

2009-07-07 Thread Amr Awadallah
I have Hive running with cygwin and windows, you need to apply this patch though: https://issues.apache.org/jira/browse/HIVE-344 It will work fine except if you use AUX_PARAMs which I still have to find the time to fix. The other options for running on windows are: * Run a Linux virtual m

Re: many terms in group by

2009-07-04 Thread Amr Awadallah
> select a,b,c,d,e,f,g,h,i,count(*) from table x group by a,b,c,d,e,f,g,h,i; yes, should work, please try it and let us know. -- amr tim robertson wrote: Hi all, I have several MapReduce jobs that are basically doing counts with group by on tab delimited files. Getting tired of writing the s

Re: aggregations over multiple columns?

2009-07-04 Thread Amr Awadallah
Mike, This is a valid query, group by over multiple columns works in hive. -- amr Michael E. Driscoll wrote: Hi HIVErs, I'm trying to perform the following aggregation query in HIVE, which finds the largest purchase for all combinations of customer and store: SELECT customer, store, max(p

Re: distinct with union all

2009-07-02 Thread Amr Awadallah
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. -

Re: Set difference in Hive

2009-06-29 Thread Amr Awadallah
do an outer join on user and filter on name.user is null -- amr Rakesh Setty wrote: Hi, I am new to Hive. I would like to know what is the easiest way to get the difference between two sets. For example, how can I convert the following SQL query to Hive? select user fr
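The outer-join approach above can be tried on a small data set. The sketch below uses Python's built-in `sqlite3` as a stand-in for Hive, purely so the query can be run locally; the table names `left_t` and `right_t`, the column `username`, and the helper `set_difference` are all hypothetical, because the original query is truncated in the snippet.

```python
import sqlite3
from typing import Iterable, List


def set_difference(left_users: Iterable[str], right_users: Iterable[str]) -> List[str]:
    """Return users present in the left table but absent from the right table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE left_t (username TEXT)")
        conn.execute("CREATE TABLE right_t (username TEXT)")
        conn.executemany("INSERT INTO left_t VALUES (?)", [(u,) for u in left_users])
        conn.executemany("INSERT INTO right_t VALUES (?)", [(u,) for u in right_users])
        # Left rows with no partner on the right come back with NULL in the
        # right-hand columns; keeping only those rows yields the difference.
        rows = conn.execute(
            "SELECT l.username "
            "FROM left_t l LEFT OUTER JOIN right_t r ON l.username = r.username "
            "WHERE r.username IS NULL "
            "ORDER BY l.username"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


only_in_left = set_difference(["ann", "bob", "cy"], ["bob"])
```

The same join-then-filter shape avoids a sub-query in the WHERE clause.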

Re: OutOfMemory when doing map-side join

2009-06-17 Thread Amr Awadallah
hmm, that is 100KB per my math. 20K * 100K = 2GB -- amr Ashish Thusoo wrote: That does not sound right. Each row is 100MB - that sounds too much... Ashish *From:* Min Zhou [mailto:coderp...@gmail.com] *Sent:* Monday

Re: Built - In Aggregate Function - Standard Deviation

2009-05-31 Thread Amr Awadallah
at has an aggregate function call again in the same select statement. This is much cleaner than the approach that I took. I'll give it a shot. Thanks again. On Wed, May 27, 2009 at 4:24 AM, Amr Awadallah wrote: I agree that a buil

Re: Built - In Aggregate Function - Standard Deviation

2009-05-27 Thread Amr Awadallah
I agree that a builtin for std dev is a good idea. that said, you can achieve this easily in one pass, just use: select sum( pow(col,2) ) as totsqr, sum( col ) as tot, count(1) as n, pow( (n*totsqr - pow(tot,2) )/(n*(n-1)), 0.5) as stddev from Matt Pestritto wrote: Hi. Are there plans to
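The arithmetic behind the one-pass formula above can be sanity-checked outside Hive. The sketch below is a plain Python rendering of the same expression, sqrt((n*totsqr - tot^2) / (n*(n-1))), which is the sample standard deviation; the function name `one_pass_stddev` and the sample values are hypothetical. It compares the result against the standard library's two-pass `statistics.stdev`.

```python
import math
import statistics
from typing import Iterable


def one_pass_stddev(values: Iterable[float]) -> float:
    """Sample standard deviation from running sums, mirroring the query's formula."""
    n = 0
    tot = 0.0      # running sum of values, like sum(col)
    totsqr = 0.0   # running sum of squares, like sum(pow(col, 2))
    for v in values:
        n += 1
        tot += v
        totsqr += v * v
    if n < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    return math.sqrt((n * totsqr - tot ** 2) / (n * (n - 1)))


# Hypothetical data, used only to compare against the library implementation.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
one_pass = one_pass_stddev(data)
reference = statistics.stdev(data)
```

One caveat worth keeping in mind: subtracting two large, nearly equal sums can lose floating-point precision when the values are large relative to their spread, so the one-pass form trades some numerical robustness for a single scan.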

Re: Is it possible hiveserver both be a server and a client of itself?

2009-05-13 Thread Amr Awadallah
I don't think that would work, where would you store the meta-data for the meta-data? Min Zhou wrote: Hi Ashish, Thank you for your swift reply. I guess it's HiveServer code inherit the metastore thrift api. HiveServer provide metadata service by itself, not call another metastore server,

Re: [ANNOUNCE] Hive release 0.3.0 available

2009-04-30 Thread Amr Awadallah
> Good work, thanks! Indeed, very happy to see Hive growing up. -- amr Min Zhou wrote: Good work, thanks! On Fri, May 1, 2009 at 1:35 AM, Ashish Thusoo wrote: The first official release of Hive is available For Hadoop release details and downloads,

Re: Aggregrate Query Fails.

2009-04-22 Thread Amr Awadallah
in the group by, try this instead: *group by m.description, buyers* limit 40 ; Matt Pestritto wrote: Hi - I'm having a problem with a query below. When I try to run any aggregate function on a column from the sub-query, the job fails. The queries and output messages are below. Suggestions

Re: hive sub query support

2009-03-30 Thread Amr Awadallah
No need for an intermediate table. Hive supports sub-queries in the FROM clause (and not in the WHERE clause), so your query can be re-written as: SELECT col1, col2 FROM table1 JOIN ( SELECT col3, COUNT(col1) AS c1 FROM table1 GROUP BY col3 SORT BY c1 LIMIT 15 ) myt ON table1.col3 = myt.col3;