RE: merging the size of the reduce output

2010-07-01 Thread John Sichi
Ning is currently out on vacation; I think he'll be back to working on this when he returns. JVS From: Viraj Bhat [vi...@yahoo-inc.com] Sent: Thursday, July 01, 2010 11:40 PM To: hive-user@hadoop.apache.org Subject: RE: merging the size of the reduce outp

RE: merging the size of the reduce output

2010-07-01 Thread Viraj Bhat
Okay, I read that this is a work in progress (https://issues.apache.org/jira/browse/HIVE-1307) to deal with small files when doing dynamic partitioning. There was a suggestion to try hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat for Hadoop 20 when running queries on this p

RE: merging the size of the reduce output

2010-07-01 Thread Viraj Bhat
Hi Yongqiang, I am facing a similar situation. I am using the latest trunk of Hive with dynamic partitioning, and it is a map-only job that converts files from gzip-compressed text (TXT gz) to RC format. The DDL of the task looks similar to: FROM gztable INSERT OVERWRITE TABLE rctab

Re: Schema evolution?

2010-07-01 Thread Yang
Paul: thanks. Currently I do not need this feature from Hive QL; I just need it in the metastore. You said "There exists structures for supporting this in the metastore"; could you please give more details? I suppose the interface to the metastore is basically classes like Table, Partition, but in the Par

RE: Schema evolution?

2010-07-01 Thread Paul Yang
There exists structures for supporting this in the metastore, but that feature isn't in Hive yet. For example, although the metadata for each partition includes its own set of columns, parts of the code in the query processor still read from table-level metadata. Some evolution can occur in the form

Schema evolution?

2010-07-01 Thread Yang
I read in the VLDB Hive paper "Hive - A Warehousing Solution Over a Map-Reduce Framework" that partitions could have different schemas: (section 3.1 MetaStore) " Partition - Each partition can have its own columns and SerDe and storage information. This can be used in the future to support sche

Re: Hive-Hbase Key lookup w/o full scan

2010-07-01 Thread Ray Duong
Thanks John, Can you provide me with some pointers? My team can try to work on it. Our workaround right now is to call the Thrift API from within Hive using a UDF. Thanks, -ray On Thu, Jul 1, 2010 at 1:19 PM, John Sichi wrote: > On Jul 1, 2010, at 10:36 AM, Ray Duong wrote: > > > Is there

Re: Using the same InputFormat class for JOIN?

2010-07-01 Thread John Sichi
Take a look at [Combine]HiveInputFormat; they are what we wrap around your input formats in order to allow Hive to access data from multiple input formats in the same job. JVS On Jul 1, 2010, at 10:16 AM, yan qi wrote: Hi, Namit, Thanks a lot for your reply! I checked the source code. G

Re: Hive-Hbase Key lookup w/o full scan

2010-07-01 Thread John Sichi
On Jul 1, 2010, at 10:36 AM, Ray Duong wrote: > Is there a way to do an HBase key lookup using the Hive-HBase integration > without doing a full scan? > > Since I'm specifying the key='foo' in the WHERE condition, shouldn't it be a > fast lookup? Hi Ray, Pushing down filters to HBase is one of

Hive-Hbase Key lookup w/o full scan

2010-07-01 Thread Ray Duong
Is there a way to do an HBase key lookup using the Hive-HBase integration without doing a full scan? Since I'm specifying the key='foo' in the WHERE condition, shouldn't it be a fast lookup? Thanks, -ray

Re: Using the same InputFormat class for JOIN?

2010-07-01 Thread yan qi
Hi, Namit, Thanks a lot for your reply! I checked the source code. Given a query, (select tmp7.* from tmp7 join tmp2 on (tmp7.c2 = tmp2.c1)), only one MapReduce job is generated. As far as I know, the function setInputFormat would be used to set the job's InputFormat class, in the ExecDr

Re: Using the same InputFormat class for JOIN?

2010-07-01 Thread Namit Jain
That's fine. The two tables can have different input formats. Sent from my iPhone On Jul 1, 2010, at 9:51 AM, "yan qi" wrote: > Hi, > > I have a question about the JOIN operation in Hive. > > For example, I have a query, like > >select tmp7.* from tmp7 join tmp2 on (tmp7.c2 = tmp2.c1); > >

Using the same InputFormat class for JOIN?

2010-07-01 Thread yan qi
Hi, I have a question about the JOIN operation in Hive. For example, I have a query, like select tmp7.* from tmp7 join tmp2 on (tmp7.c2 = tmp2.c1); Clearly, there is a JOIN involved in the statement. 1. tmp2 and tmp7 are two tables. 2. c2 and c1 are columns belonging to tmp7 and