Re: Re: bz2 Splits.

2009-07-28 Thread Saurabh Nanda
> Thank you for the wiki page on this. Keep up the good work and please > post all your findings about compression. Many people (including me) > will benefit from an explanation about the different types of > compression available and the trade-offs of different codecs and > options. Thanks, Edw

Re: UPDATE statement in Hive?

2009-07-28 Thread Saurabh Nanda
Sorry for the newbie questions here, but how is this going to work? Using 'normal' Hive queries, will I be able to read & write to an HBase datastore? From within the Hive CLI? Saurabh. -- http://nandz.blogspot.com http://foodieforlife.blogspot.com

Re: UPDATE statement in Hive?

2009-07-28 Thread Abhijit Pol
+1 if you need more support for this feature. I think this will be a very powerful and useful addition to Hive. 2009/7/28 He Yongqiang : > Talked with Samuel Guo, and I am sure he will work on it soon. > > On 09-7-29 10:15 AM, "Ashish Thusoo" wrote: > > That would be great Yongqiang. > > Amr, we don't

Re: partitions not being created

2009-07-28 Thread Zheng Shao
Can you send the output of these 2 commands? describe extended ApiUsage; describe extended ApiUsageTemp; Zheng On Tue, Jul 28, 2009 at 6:29 PM, Bill Graham wrote: > Thanks for the tip, but it fails in the same way when I use a string. > > On Tue, Jul 28, 2009 at 6:21 PM, David Lerman wrote: >>

Re: UPDATE statement in Hive?

2009-07-28 Thread He Yongqiang
Talked with Samuel Guo, and I am sure he will work on it soon. On 09-7-29 10:15 AM, "Ashish Thusoo" wrote: > That would be great Yongqiang. > > Amr, we don't have that kind of support but would love to add it. > > Ashish > > > From: He Yongqiang [mailto:heyongqi...@software.ict.ac.cn] > Se

RE: UPDATE statement in Hive?

2009-07-28 Thread Ashish Thusoo
That would be great Yongqiang. Amr, we don't have that kind of support but would love to add it. Ashish From: He Yongqiang [mailto:heyongqi...@software.ict.ac.cn] Sent: Tuesday, July 28, 2009 7:03 PM To: hive-user@hadoop.apache.org Subject: Re: UPDATE statement

Re: UPDATE statement in Hive?

2009-07-28 Thread He Yongqiang
The patch contributor of https://issues.apache.org/jira/browse/PIG-6 is a student here in our institute, but in another laboratory. If Hive is interested in this, I will get in touch with him to see if he would like to do a similar contribution for Hive. On 09-7-29 8:10 AM, "Peter Skomoroch" wrote:

Re: partitions not being created

2009-07-28 Thread Bill Graham
Thanks for the tip, but it fails in the same way when I use a string. On Tue, Jul 28, 2009 at 6:21 PM, David Lerman wrote: > >> hive> create table partTable (a string, b int) partitioned by (dt int); > > > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") > > SELECT `(requestDate)?+.+`

Re: partitions not being created

2009-07-28 Thread David Lerman
>> hive> create table partTable (a string, b int) partitioned by (dt int); > INSERT OVERWRITE TABLE ApiUsage PARTITION (dt = "20090518") > SELECT `(requestDate)?+.+` FROM ApiUsageTemp WHERE requestDate = '2009/05/18' The table has an int partition column (dt), but you're trying to set a string va

Re: partitions not being created

2009-07-28 Thread Bill Graham
I see now. Show partitions shows the partitions loaded into the table, not the metadata about what columns are partitions. That makes sense. I'm trying to load the data using a select from an un-partitioned table into another partitioned table, which I suspect could be my problem. Is this not supp

RE: partitions not being created

2009-07-28 Thread Namit Jain
There are no partitions in the table. Can you post the output you get while loading the data? From: Bill Graham [mailto:billgra...@gmail.com] Sent: Tuesday, July 28, 2009 5:54 PM To: hive-user@hadoop.apache.org Subject: partitions not being created Hi, I'm trying to create a partitioned table

partitions not being created

2009-07-28 Thread Bill Graham
Hi, I'm trying to create a partitioned table and the partition is not appearing for some reason. Am I doing something wrong, or is this a bug? Below are the commands I'm executing with their output. Note that the 'show partitions' command is not returning anything. If I were to try to load data in

Re: UPDATE statement in Hive?

2009-07-28 Thread Peter Skomoroch
+1 for Hive queries on HBase - that would be a powerful combination. On Tue, Jul 28, 2009 at 8:05 PM, Amr Awadallah wrote: > Saurabh, I think you're better off with HBase for this kind of use, see: > > http://hadoop.apache.org/hbase/ > > In a nutshell, HBase is a layer on top of HDFS which supports

Re: UPDATE statement in Hive?

2009-07-28 Thread Amr Awadallah
Saurabh, I think you're better off with HBase for this kind of use, see: http://hadoop.apache.org/hbase/ In a nutshell, HBase is a layer on top of HDFS which supports two things: (1) quick lookups based on keys (e.g. a userid), and (2) transaction semantics at the row level (update/delete/insert

Re: counting different regexes in a single pass

2009-07-28 Thread Andraz Tori
Thanks for the counting solution! Zhao, I've uploaded the S3 log parser to HIVE-693. Among other things, I also noticed a Hive bug today: when using Hive in server mode (via Python) to import 400 different partitions one after another, the datanode started reporting "too many open files" errors in it

Re: Hive can't see a DFS file

2009-07-28 Thread Vijay Kumar Adhikari
Attached. On Tue, Jul 28, 2009 at 2:17 PM, Ashish Thusoo wrote: > Can you put out the contents of > > /tmp//hive.log > > From the machine where you launched the hive cli from. > > Ashish > > -Original Message- > From: Vijay Kumar Adhikari [mailto:vijay...@gmail.com] > Sent: Tuesday, July 2

Re: Best way to duplicate a table?

2009-07-28 Thread Zheng Shao
If there are multiple partition keys: INSERT OVERWRITE TABLE ABC_COPY PARTITION(partkey1 = "$partkey1", partkey2 = "$partkey2") SELECT `(partkey1|partkey2)?+.+` FROM ABC WHERE partkey1 = "$partkey1" AND partkey2 = "$partkey2"; See https://issues.apache.org/jira/browse/HIVE-420 for details on this. If you already know the name of

Re: Best way to duplicate a table?

2009-07-28 Thread Zheng Shao
You can do something like this: CREATE TABLE ABC_COPY LIKE ABC; SHOW PARTITIONS ABC; for each partition: INSERT OVERWRITE TABLE ABC_COPY PARTITION(partkey = "$partkey") SELECT `(partkey)?+.+` FROM ABC WHERE partkey = "$partkey"; That back-quoted `(partkey)?+.+` is a special regular expre
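
The "for each partition" loop above could be scripted along these lines. This is only a rough sketch, assuming a single partition key and assuming the partition values have already been pulled out of the SHOW PARTITIONS output; the function name copy_statements and the sample values are made up for illustration, and the script only builds the statement text, it does not talk to Hive.

```python
def copy_statements(source, target, partition_key, partition_values):
    """Build the HiveQL statements that copy `source` into `target`,
    one INSERT OVERWRITE per partition (hypothetical helper)."""
    # Create the empty copy with the same schema and partition columns.
    statements = [f"CREATE TABLE {target} LIKE {source};"]
    for value in partition_values:
        # The back-quoted regex selects every column except the partition key.
        statements.append(
            f'INSERT OVERWRITE TABLE {target} '
            f'PARTITION ({partition_key} = "{value}") '
            f'SELECT `({partition_key})?+.+` FROM {source} '
            f'WHERE {partition_key} = "{value}";'
        )
    return statements


if __name__ == "__main__":
    # Sample partition values are placeholders, not taken from the thread.
    for stmt in copy_statements("ABC", "ABC_COPY", "partkey",
                                ["20090727", "20090728"]):
        print(stmt)
```

The printed statements can then be run one at a time in the Hive CLI. Partition values are interpolated as-is, so this is only suitable for values you trust.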

Best way to duplicate a table?

2009-07-28 Thread Jason Michael
I'd like to duplicate a very large, partitioned table in Hive, preserving all data and partitions. What's the most efficient way to do this?

RE: Hive can't see a DFS file

2009-07-28 Thread Ashish Thusoo
Can you put out the contents of /tmp//hive.log From the machine where you launched the hive cli from. Ashish -Original Message- From: Vijay Kumar Adhikari [mailto:vijay...@gmail.com] Sent: Tuesday, July 28, 2009 7:42 AM To: hive-user@hadoop.apache.org Subject: Hive can't see a DFS

Re: Re: bz2 Splits.

2009-07-28 Thread Edward Capriolo
On Tue, Jul 28, 2009 at 11:02 AM, Edward Capriolo wrote: > On Tue, Jul 28, 2009 at 2:22 AM, Zheng Shao wrote: >> Yes we do compress all tables. >> >> Zheng >> >> On Mon, Jul 27, 2009 at 11:08 PM, Saurabh Nanda >> wrote: >>> In our setup, we didn't change io.seqfile.compress.blocksize (1MB) an

Re: Re: bz2 Splits.

2009-07-28 Thread Edward Capriolo
On Tue, Jul 28, 2009 at 2:22 AM, Zheng Shao wrote: > Yes we do compress all tables. > > Zheng > > On Mon, Jul 27, 2009 at 11:08 PM, Saurabh Nanda wrote: >> >>> In our setup, we didn't change io.seqfile.compress.blocksize (1MB) and >>> it's still fairly good. >>> You are free to try 100MB for better

Hive can't see a DFS file

2009-07-28 Thread Vijay Kumar Adhikari
Apparently, Hive can't access the temporary file or something. My SQL queries fail with an IOException, + hive> select 1 from netflow; Total MapReduce jobs = 1 Number of reduce tasks is set to 0 since there's no reduce operator Job Submission failed with exception 'java.io.IOException(cannot find

RE: UPDATE statement in Hive?

2009-07-28 Thread Ashish Thusoo
There is no UPDATE statement at this time. Because a file in Hadoop cannot be updated in place, an UPDATE in Hive, though possible, would just be syntactic sugar for merging the new values into the old data in the table and then rewriting the table with the merged output. This can be achieved by doing an inse
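
To make the merge-and-rewrite idea concrete, here is a small in-memory sketch applied to the first-visit/last-visit use case from the original question. It is not Hive code and not how Hive would implement it; the function name merge_visits, the dict layout, and the sample data are all assumptions made for illustration. The point is only that the "update" produces a whole new table from the old data plus the new values, rather than changing rows in place.

```python
def merge_visits(existing, new_visits):
    """Return a new table mapping user_id -> (first_visit, last_visit).

    `existing` is the old table; `new_visits` is an iterable of
    (user_id, visit_time) pairs. The old table is left untouched,
    mirroring a rewrite of the whole table with the merged output.
    """
    merged = dict(existing)  # start from a copy of the old data
    for user_id, visit_time in new_visits:
        if user_id in merged:
            first, last = merged[user_id]
            # Widen the stored range if the new visit falls outside it.
            merged[user_id] = (min(first, visit_time), max(last, visit_time))
        else:
            # First time this user is seen: both bounds are this visit.
            merged[user_id] = (visit_time, visit_time)
    return merged


if __name__ == "__main__":
    # ISO-formatted dates compare correctly as plain strings.
    old_table = {"u1": ("2009-01-05", "2009-03-01")}
    todays_visits = [("u1", "2009-07-28"), ("u2", "2009-07-28")]
    print(merge_visits(old_table, todays_visits))
```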

UPDATE statement in Hive?

2009-07-28 Thread Saurabh Nanda
Is there an UPDATE statement in Hive? If not, are there any plans for adding support for it in the future? This is why I ask: I want to maintain a table which, against each user ID, stores the first visit & last visit time. This is across the entire year, not a day -- basically to understand how m