RE: hive query doesn't seem to limit itself to partitions based on the WHERE clause

2010-10-04 Thread Namit Jain
What is your table hourly_fact partitioned on ? From: Marc Limotte [mslimo...@gmail.com] Sent: Friday, October 01, 2010 2:10 PM To: hive-user@hadoop.apache.org Subject: hive query doesn't seem to limit itself to partitions based on the WHERE clause Hi, >

RE: Load data from file header

2010-09-08 Thread Namit Jain
When I insert into T2 from T1, I need to use the store# from T1 as the partition value. Is there a way to achieve this in version 0.4.0 of hive or do I need to install 0.6.0? Thanks for the help. -Original Message- From: Namit Jain [mailto:nj...@facebook.com] Sent: Saturday, September 04, 2010

RE: Load data from file header

2010-09-04 Thread Namit Jain
as textfile; Thanks, Sunil -Original Message- From: Namit Jain [mailto:nj...@facebook.com] Sent: Saturday, September 04, 2010 12:17 AM To: hive-user@hadoop.apache.org Subject: RE: Load data from file header create 2 tables T1 and T2. T1 has the schema of the file - no partitioning column (say

RE: Load data from file header

2010-09-03 Thread Namit Jain
create 2 tables T1 and T2. T1 has the schema of the file - no partitioning column (say c1,c2,store#) T2 is partitioned on (store#) - and the schema is 1 less column (c1, c2 partitioned by store#) load the data into T1 Then insert into T2 partition(store#) select c1,c2,store# from T1 ___
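
A minimal sketch of this pattern in HiveQL, using hypothetical names (store_id stands in for store#, which is not a legal identifier, and the file is assumed tab-delimited):

  CREATE TABLE t1 (c1 STRING, c2 STRING, store_id STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
  CREATE TABLE t2 (c1 STRING, c2 STRING)
    PARTITIONED BY (store_id STRING);

  LOAD DATA LOCAL INPATH '/path/to/datafile.txt' INTO TABLE t1;

  -- on Hive 0.4 this insert is repeated per store value (static partition)
  INSERT OVERWRITE TABLE t2 PARTITION (store_id = 'store42')
  SELECT c1, c2 FROM t1 WHERE store_id = 'store42';

On Hive 0.6 and later the per-store loop collapses into a single dynamic partition insert (see the Dynamic Partition Inserts entries further down), which is what the 0.6.0 question in the follow-up above is about.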

RE: How is Union All optimized in Hive

2010-08-26 Thread Namit Jain
jobs, but when I use 'explain' to see the syntax tree of the hql, I find 3 table scans in the tree, is table_1 really scanned only once? I am not quite familiar with the syntax tree. 2010/8/26 Namit Jain mailto:nj...@facebook.com>> Yes, it is optimized by hive. There will be only 1

RE: How is Union All optimized in Hive

2010-08-25 Thread Namit Jain
Yes, it is optimized by hive. There will be only 1 mr job, even if the columns selected were different. -namit From: Neil Xu [neil.x...@gmail.com] Sent: Wednesday, August 25, 2010 2:40 AM To: hive-user@hadoop.apache.org Subject: How is Union All optimize

RE: question - how to handle OR with HIVe

2010-08-20 Thread Namit Jain
Currently, only equality joins are supported. But, you can rewrite your query as: select substr(CB.EXEC_DATE,1,10), count(CB.ID) from callbacks CB JOIN (select * from pages p where p.page like '%google.com/search%'
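
The quoted rewrite is cut off; its general shape is to push the LIKE (or OR) predicates into a filtered subquery and keep the join itself an equality join. A hedged sketch, with a hypothetical join key (page_id = id):

  SELECT substr(cb.exec_date, 1, 10), count(cb.id)
  FROM callbacks cb
  JOIN (SELECT p.id
        FROM pages p
        WHERE p.page LIKE '%google.com/search%') flt
    ON (cb.page_id = flt.id)
  GROUP BY substr(cb.exec_date, 1, 10);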

RE: Dynamic Partition Inserts

2010-08-14 Thread Namit Jain
set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; You need to set the above parameters. Thanks, -namit From: Namit Jain [nj...@facebook.com] Sent: Saturday, August 14, 2010 10:38 AM To: ; Subject: Re: Dynamic

Re: Dynamic Partition Inserts

2010-08-14 Thread Namit Jain
The partition column needs to be the last column in the select list. Also, you need to set a variable to enable dynamic partition inserts. Sent from my iPhone On Aug 14, 2010, at 9:46 AM, "Luke Crouch" wrote: > I'm trying to do a dynamic partition insert like so: > > FROM ( > FROM ( >
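
Putting both points together (the settings quoted in the reply above, plus the partition column placed last in the select list), a minimal sketch with hypothetical tables src and dest:

  SET hive.exec.dynamic.partition=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;

  -- ds is the partition column of dest and must be the last expression selected
  INSERT OVERWRITE TABLE dest PARTITION (ds)
  SELECT col1, col2, ds FROM src;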

RE: Filter Operator appear twice?

2010-08-13 Thread Namit Jain
Currently, the hive optimizer tries to push the filter up, but probably does not remove the original filter. You can file a jira for that. Thanks, -namit From: Yingyi Bu [buyin...@gmail.com] Sent: Friday, August 13, 2010 7:52 AM To: hive-user@hadoop.apa

RE: How to merge small files

2010-08-09 Thread Namit Jain
if I two parameters both are true? 2010/8/9 Namit Jain mailto:nj...@facebook.com>> That's right From: lei liu [liulei...@gmail.com<mailto:liulei...@gmail.com>] Sent: Sunday, August 08, 2010 7:18 PM To: hive-user@hadoop.apache.org

RE: How to merge small files

2010-08-09 Thread Namit Jain
("set hive.merge.mapfiles=true"); statement.execute("set hive.merge.mapredfiles=true"); The two parementers are both true, right? 2010/8/6 Namit Jain mailto:nj...@facebook.com>> HIVEMERGEMAPFILES("hive.merge.mapfiles", true), HIVEMERGEMAPREDFILES("hi

RE: How to merge small files

2010-08-06 Thread Namit Jain
HIVEMERGEMAPFILES("hive.merge.mapfiles", true), HIVEMERGEMAPREDFILES("hive.merge.mapredfiles", false), Set the above parameters to true before your query. From: lei liu [liulei...@gmail.com] Sent: Thursday, August 05, 2010 8:47 PM To: hive-user@
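
For example, in the session that runs the query (target and source are hypothetical; this is the same pair of parameters the JDBC snippet above sets via statement.execute):

  SET hive.merge.mapfiles=true;      -- merge small files produced by map-only jobs
  SET hive.merge.mapredfiles=true;   -- merge small files produced by map-reduce jobs
  INSERT OVERWRITE TABLE target SELECT col1, col2 FROM source;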

RE: Shud partition order be same in create and insert?

2010-08-06 Thread Namit Jain
The order should be the same. Can you file a jira for this issue ? We should throw an error. Thanks, -namit From: Thiruvel Thirumoolan [thiru...@yahoo-inc.com] Sent: Friday, August 06, 2010 12:29 AM To: hive-user@hadoop.apache.org Subject: Shud partitio

Re: Using the same InputFormat class for JOIN?

2010-07-01 Thread Namit Jain
That's fine The 2 tables can have different inputformats Sent from my iPhone On Jul 1, 2010, at 9:51 AM, "yan qi" wrote: > Hi, > > I have a question about the JOIN operation in Hive. > > For example, I have a query, like > >select tmp7.* from tmp7 join tmp2 on (tmp7.c2 = tmp2.c1); > >

RE: Create Table with Line Terminated other than '\n'

2010-06-11 Thread Namit Jain
I don't think this will work. If the data is already in TextFormat, I think the SequenceFileRecordReader will be used to read the data, which will break. Do you already have the xml file from somewhere, or is it also generated from hive? From: Ashish Thusoo [

RE: set mapred.map.tasks=1 not work

2010-06-09 Thread Namit Jain
use CombineHiveInputFormat check your hive.input.format From: Alex Kozlov [ale...@cloudera.com] Sent: Wednesday, June 09, 2010 9:15 PM To: hive-user@hadoop.apache.org Subject: Re: set mapred.map.tasks=1 not work Hi Wd, Try: hive.merge.mapfiles=true hi

RE: Is hive.exec.scratchdir cleaned up ?

2010-05-11 Thread Namit Jain
The scratch dir has a format -mm-dd etc. You can write a script to delete old data (say older than 2 days) -Original Message- From: Ashwin Agate [mailto:aag...@mochila.com] Sent: Tuesday, May 11, 2010 1:31 PM To: hive-user@hadoop.apache.org Subject: Re: Is hive.exec.scratchdir clean

RE: mapjoin execution plan

2010-04-30 Thread Namit Jain
Right now, mapjoin is not fully optimized - it is the expected behavior. MapJoin writes the results in a temp file, and then the order by is processed off that file. Thanks, -namit -Original Message- From: Sarah Sproehnle [mailto:sa...@cloudera.com] Sent: Friday, April 30, 2010 5:25 PM

RE: Problem about 'like' condition

2010-04-12 Thread Namit Jain
I meant hive/hadoop conf. From: 朱鹤群 [mailto:zhuhe...@baixing.com] Sent: Monday, April 12, 2010 5:59 PM To: hive-user@hadoop.apache.org Subject: Re: Problem about 'like' condition Do you mean hadoop conf of hive DDL? On Apr 13, 2010, 8:39 AM, Namit Jain mailto:nj...@facebook.com>>

RE: Problem about 'like' condition

2010-04-12 Thread Namit Jain
Can you send your conf ? From: 朱鹤群 [mailto:zhuhe...@baixing.com] Sent: Monday, April 12, 2010 5:35 PM To: hive-user@hadoop.apache.org Subject: Re: Problem about 'like' condition Hi namit, No, I'm using TEXTFILE input format. Yes. My files are gzip files. Thanks, Gary 在 2010年4月1

RE: Problem about 'like' condition

2010-04-12 Thread Namit Jain
Are you using CombineHiveInputFormat ? And are you using compressed text files ? Thanks, -namit From: 朱鹤群 [mailto:zhuhe...@baixing.com] Sent: Monday, April 12, 2010 12:53 AM To: hive-user@hadoop.apache.org Subject: Problem about 'like' condition Hi, Has anybody got problem when query in hive

RE: Hive jobs only run with 1 map task

2010-02-23 Thread Namit Jain
single 30 gig tab separated file. It also happens on tables that are split over hundreds of files. On 23 February 2010 19:20, Namit Jain wrote: > What is the size of the input data for the query ? > > Since you are using CombineHiveInputFormat, multiple files can be read by a > s

RE: Hive jobs only run with 1 map task

2010-02-23 Thread Namit Jain
emory=0.5 > hive.exec.reducers.max=107 > hive.hwi.listen.host=0.0.0.0 > hive.exec.compress.intermediate=false > hive.optimize.cp=true > hive.optimize.ppd=true > hive.session.id=tims_201002231907 > hive.merge.mapredfiles=false > > ~Tim. > > On 23 February 2010 19:03, Namit

RE: Hive jobs only run with 1 map task

2010-02-23 Thread Namit Jain
Can you check your input format ? Can you check the value of the parameter : hive.input.format ? Can you send all the parameters ? Thanks, -namit -Original Message- From: Tim Sell [mailto:trs...@gmail.com] Sent: Tuesday, February 23, 2010 11:00 AM To: hive-user@hadoop.apache.org Su

RE: Create table as select

2010-02-12 Thread Namit Jain
Create table as select is not supported in the Hive 0.4 release. It is a new feature in 0.5 -Original Message- From: E. Sammer [mailto:e...@lifeless.net] Sent: Friday, February 12, 2010 4:41 PM To: hive-user@hadoop.apache.org Subject: Create table as select The Hive wiki states the following fo

Re: Best way to move Hive tables+data from one Hadoop cluster to another

2010-02-11 Thread Namit Jain
Is it a one time operation or continuous one ? If it is a one-time operation, the steps suggested below should work. Otherwise, you need to set up a process which will continuously feed the source and apply changes in the destination. Let me know, if this is the case - we have a similar requirem

RE: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

2010-02-02 Thread Namit Jain
hadoop On Mon, Feb 1, 2010 at 5:31 PM, Namit Jain wrote: > I will take a look - > > It will be great if you can file a jira and add a patch for that > > > > From: Roberto Congiu [mailto:roberto.con...@openx.org] > Sent: Monday, February 01, 2010 11:02 AM > To:

RE: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

2010-02-01 Thread Namit Jain
I will take a look - It will be great if you can file a jira and add a patch for that From: Roberto Congiu [mailto:roberto.con...@openx.org] Sent: Monday, February 01, 2010 11:02 AM To: Namit Jain Cc: hive-user@hadoop.apache.org Subject: Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

RE: hive multiple inserts

2010-01-21 Thread Namit Jain
Which version are you using ? The bug mentioned was fixed by: https://issues.apache.org/jira/browse/HIVE-1039 Thanks, -namit -Original Message- From: Min Zhou [mailto:coderp...@gmail.com] Sent: Thursday, January 21, 2010 12:40 AM To: hive-user@hadoop.apache.org Subject: Re: hive multip

Re: CombinedHiveInputFormat combining across tables

2009-12-21 Thread Namit Jain
Thanks David, It would be very useful if you can file jiras and patches for the same. Thanks, -namit On 12/21/09 6:58 PM, "David Lerman" wrote: Thanks Zheng. We're using trunk, r888452. We actually ended up making three changes to CombineHiveInputFormat.java to get it working in our environ

RE: LIKE operator

2009-12-15 Thread Namit Jain
.com] Sent: Tuesday, December 15, 2009 3:45 PM To: hive-user@hadoop.apache.org Subject: RE: LIKE operator Sorry table A has around 3,000,000 records not 200,000. Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947 ____ Fr

RE: LIKE operator

2009-12-15 Thread Namit Jain
____ From: Namit Jain [mailto:nj...@facebook.com] Sent: Tuesday, December 15, 2009 3:27 PM To: hive-user@hadoop.apache.org Subject: RE: LIKE operator I should clarify that this is not the most efficient way of doing it - since we are doing a Cartesian product first (which will go to 1 reducer). We d

RE: LIKE operator

2009-12-15 Thread Namit Jain
the obvious CREATE TABLE AAA AS SELECT with no success. Thanks. Lee Sagi | Data Warehouse Tech Lead & Architect | Work: 650-616-6575 | Cell: 718-930-7947 From: Namit Jain [mailto:nj...@facebook.com] Sent: Tuesday, December 15, 2009 3:12 PM To: hive-

RE: LIKE operator

2009-12-15 Thread Namit Jain
Hive only supports equality joins right now: INSERT OVERWRITE TABLE ZZ SELECT VOTF_REQUEST_ID , THRESHOLD_VALUE , THRESHOLD_MET , BRAND_ID FROM A LEFT OUTER JOIN B ON (A.CLIENT_IP LIKE B.IP) WHERE date_key = '2009121315'; Can be rewritten as: INSERT OVERWRITE TABLE ZZ SELECT VOTF_REQUEST_ID , T
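
The rewritten query is truncated. Going by the clarification higher up in this thread (a Cartesian product that goes to one reducer), its shape is roughly a join with no ON clause and the LIKE moved into the WHERE; a hedged sketch, which reproduces the matching rows rather than the full LEFT OUTER semantics:

  INSERT OVERWRITE TABLE zz
  SELECT a.votf_request_id, a.threshold_value, a.threshold_met, a.brand_id
  FROM a JOIN b                        -- no ON clause: Cartesian product
  WHERE a.client_ip LIKE b.ip
    AND date_key = '2009121315';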

Re: How to define a cube and do drill-down operation on hive?

2009-11-13 Thread Namit Jain
It is not supported currently On 11/13/09 7:42 AM, "Clark Yang (杨卓荦)" wrote: As I know, hive is a data-warehouse infrastructure, does it support some OLAP operation such as drill-down or roll-up?

Re: exception when query on a partitioned table w/o any data

2009-11-10 Thread Namit Jain
If you want to use nonstrict mode, the jira might be : http://issues.apache.org/jira/browse/HIVE-879 On 11/10/09 9:07 PM, "Min Zhou" wrote: Hi, Ning, yep,we are using 0.4.0. Can you show me the jira number? Thanks, Min On Wed, Nov 11, 2009 at 12:44 PM, Ning Zhang wrote: > Min, > > It worke

Re: exception when query on a partitioned table w/o any data

2009-11-10 Thread Namit Jain
I think you should change hive.mapred.mode to strict, the default value On 11/10/09 9:14 PM, "Min Zhou" wrote: strict

Re: exception when query on a partitioned table w/o any data

2009-11-10 Thread Namit Jain
Looks like some configuration problem - this query should have failed at compile time, since it is not referencing any valid partition. Can you send your hive-site.xml ? On 11/10/09 8:01 PM, "Min Zhou" wrote: Hi, guys, hive> create table pokes(foo string, bar int) partitioned by (pt string)

Re: Self join problem

2009-11-10 Thread Namit Jain
suspecting that it might be an issue with the reduce job itself - is there a way to figure out what these jobs are doing exactly? Thanks. On Tue, Nov 3, 2009 at 6:53 PM, Namit Jain <nj...@facebook.com>

RE: Self join problem

2009-11-03 Thread Namit Jain
't spawn more reduce jobs for this query? On Tue, Nov 3, 2009 at 4:47 PM, Namit Jain mailto:nj...@facebook.com>> wrote: Get the join condition in the on condition: insert overwrite table foo1 select m1.id as id_1, m2.id as id_2, count(1), m1.dt f

RE: Self join problem

2009-11-03 Thread Namit Jain
Get the join condition in the on condition: insert overwrite table foo1 select m1.id as id_1, m2.id as id_2, count(1), m1.dt from m1 join m2 on m1.dt=m2.dt where m1.id <> m2.id and m1.id < m2.id group by m1.id
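
The snippet is cut off at the GROUP BY; completing it with the grouping keys implied by the select list gives roughly:

  INSERT OVERWRITE TABLE foo1
  SELECT m1.id AS id_1, m2.id AS id_2, count(1), m1.dt
  FROM m1 JOIN m2 ON (m1.dt = m2.dt)
  WHERE m1.id <> m2.id AND m1.id < m2.id
  GROUP BY m1.id, m2.id, m1.dt;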

Re: join in hive

2009-10-25 Thread Namit Jain
I wanted to add one more thing, Ning has started working on semi-join. So, please co-ordinate with him on that. http://issues.apache.org/jira/browse/HIVE-870 On 10/25/09 9:04 PM, "Zheng Shao" wrote: Mostly correct. 2. Your idea looks interesting but I would say in reality, the percentage

Re: hive problem

2009-10-22 Thread Namit Jain
Which version of hive and hadoop are you using ? On 10/22/09 3:00 PM, "Gang Luo" wrote: Hi everyone, I am new to Hive. Here is a problem. I followed the getting started instructions to setup my hive (as well as hadoop). I can create tables, show tables and select without any where clause.

RE: Hive-74

2009-10-19 Thread Namit Jain
HADOOP-5759 I'm guessing I need to build hadoop from the 0.20 branch for this to work ? Thanks -Matt On Thu, Oct 8, 2009 at 1:26 AM, Namit Jain mailto:nj...@facebook.com>> wrote: Hi Matt, Sorry for the late reply. hive> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInput

Re: hive query question

2009-10-15 Thread Namit Jain
1. Create table as select was added in trunk recently - it is not present in 0.4. 2. Hive does mandate that the schema of all the subqueries of union all is the same. Thanks, -namit On 10/15/09 6:45 AM, "Edward Capriolo" wrote: Arijit, You are obviously a query guru. I think I can answer your qu

Re: Permanent UDF's

2009-10-12 Thread Namit Jain
There are a bunch of jiras on this - but it is not supported currently On 10/12/09 4:28 PM, "Bobby Rullo" wrote: Hey there, Is there a way to permanently register your UDF's so that you don't have to do a "create temporary function ..." at beginning of each session? Another alternative woul

Re: Hive-74

2009-10-07 Thread Namit Jain
8 with > errors > 2009-10-01 10:40:58,622 ERROR ql.Driver (SessionState.java:printError(248)) > - FAILED: Execution Error, return code 2 from > org.apache.hadoop.hive.ql.exec.ExecDriver > > > > On Wed, Sep 30, 2009 at 5:26 PM, Namit Jain wrote: > >> What you are doing

RE: INSERT OVERWRITE restriction

2009-10-06 Thread Namit Jain
Rows loaded into t1) were way off. Doing a count on the file afterwards yielded correct results though. I wonder why? Thanks, Vinay ________ From: Namit Jain To: "hive-user@hadoop.apache.org" Sent: Tuesday, October 6, 2009 3:10:32 PM Subject: RE: INSERT OVERWR

RE: INSERT OVERWRITE restriction

2009-10-06 Thread Namit Jain
Currently, the only way to do this is to select the existing data and do a union: INSERT INTO TABLE t1 partition(p1='x', p2='y') select a, b from t2 union all select * from t1 where p1='x' and p2='y' From: vinay gupta [mailto:vingup2...@yahoo.com] Sent: Tuesday, October 06, 2009 2:43 PM To: h

RE: input20.q unit test failure

2009-10-02 Thread Namit Jain
Check build/ql/tmp/hive.log - it may contain some clues From: Bill Graham [mailto:billgra...@gmail.com] Sent: Friday, October 02, 2009 10:49 AM To: Namit Jain Cc: hive-user@hadoop.apache.org Subject: Re: input20.q unit test failure Thanks Namit for giving it a shot. I just updated again and

RE: input20.q unit test failure

2009-10-02 Thread Namit Jain
I tried the test on the latest version and it ran fine for me. [nj...@dev029 hive4]$ svn info Path: . URL: http://svn.apache.org/repos/asf/hadoop/hive/trunk Repository Root: http://svn.apache.org/repos/asf Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 Revision: 821103 Node Kind: directory

RE: Hive-74

2009-09-30 Thread Namit Jain
What you are doing seems OK ? Can you get the stack trace from /tmp/<username>/hive.log ? -Original Message- From: Matt Pestritto [mailto:m...@pestritto.com] Sent: Wednesday, September 30, 2009 6:51 AM To: hive-...@hadoop.apache.org; hive-user@hadoop.apache.org Subject: Fwd: Hive-74 Including

Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

2009-09-30 Thread Namit Jain
e, shouldn't be difficult since it would be basically mimicking CombineFileInputFormat. Once I add the appropriate logic to the shim, I have to set hive.input.format to org.apache.hadoop.hive.ql.io.CombineHiveInputFormat to have hive actually use it, right ? Roberto 2009/9/29 Namit Jain :

Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

2009-09-29 Thread Namit Jain
to hadoop-0.20 ? That might be simpler. If not, you can definitely implement the shim using MultiFileInputFormat for 0.19 (which should work even with 0.17). Do you need some help in understanding the current shim code ? Thanks, -namit On 9/29/09 10:53 AM, "Namit Jain" wrote: Ju

RE: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

2009-09-29 Thread Namit Jain
Just checked - CombineFileInputFormat and a lot of other related stuff went to hadoop 0.20 So, it would be very difficult to add this for 0.19 From: Namit Jain [mailto:nj...@facebook.com] Sent: Monday, September 28, 2009 10:30 PM To: hive-user@hadoop.apache.org; roberto.con...@openx.org Subject

Re: HIVE-74 and CombineFileInputFormat on pre-0.20 hadoop

2009-09-28 Thread Namit Jain
I am not sure whether CombineFileInputFormat (in hadoop) is available in 0.19 - If it is, we can add it, otherwise it will be very difficult. On 9/28/09 7:06 PM, "Raghu Murthy" wrote: Can we add MultiFileInputFormat as the CombineFileInputFormatShim for hadoop-0.19? On 9/28/09 6:57 PM, "Rober

RE: adding filenames as new columns via Hive

2009-09-16 Thread Namit Jain
I don't think it is a good idea to make it a part of table metadata in any way. What happens if the filename changes ? It will be very difficult to maintain. But, we can definitely add some virtual columns (FILENAME can be one of them to start with - it should not show up in describe, select * etc.

RE: hive queries over tables in different formats

2009-09-16 Thread Namit Jain
Yes, that should be fine From: Abhijit Pol [mailto:a...@rocketfuelinc.com] Sent: Wednesday, September 16, 2009 11:04 AM To: hive-user@hadoop.apache.org Subject: hive queries over tables in different formats does hive support queries over multiple tables stored in different formats? for example

vote for release candidate for hive

2009-09-15 Thread Namit Jain
I have created another release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/ Let me know if it is OK to publish this release candidate. The only change from the previous candidate (https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.

RE: Strange behavior during Hive queries

2009-09-14 Thread Namit Jain
Currently, hive uses 1 mapper per file - does your table have lots of small files ? If yes, it might be a good idea to concatenate them into fewer files From: Ravi Jagannathan [mailto:ravi.jagannat...@nominum.com] Sent: Monday, September 14, 2009 12:17 PM To: Brad Heintz; hive-user@hadoop.apache

RE: Directing Hive to perform Hash Join for small inner tables

2009-09-11 Thread Namit Jain
the data? PhD Candidate CS @ UCSB Santa Barbara, CA 93106, USA http://www.cs.ucsb.edu/~sudipto On Fri, Sep 11, 2009 at 1:29 PM, Namit Jain mailto:nj...@facebook.com>> wrote: Yes, you can specify the list of tables in the hint MAPJOIN(x,y,z) From: Sudipto Das [mailto:

RE: Directing Hive to perform Hash Join for small inner tables

2009-09-11 Thread Namit Jain
aggregation on the JOINed result. Thanks Sudipto PhD Candidate CS @ UCSB Santa Barbara, CA 93106, USA http://www.cs.ucsb.edu/~sudipto On Fri, Sep 11, 2009 at 12:12 PM, Namit Jain mailto:nj...@facebook.com>> wrote: This is some problem in trunk - it runs fine in 0.4 - I will take a look

RE: Directing Hive to perform Hash Join for small inner tables

2009-09-11 Thread Namit Jain
te CS @ UCSB Santa Barbara, CA 93106, USA http://www.cs.ucsb.edu/~sudipto On Thu, Sep 10, 2009 at 6:05 PM, Namit Jain mailto:nj...@facebook.com>> wrote: The data you have sent has 4 columns - insert overwrite table join_result select /*+ MAPJOIN(m)*/ m.mid, m.param, r.rating from d

RE: Directing Hive to perform Hash Join for small inner tables

2009-09-10 Thread Namit Jain
to point to directory in HDFS that contains this file). I hope you can see the same error. I am using Hadoop 0.18.3, and Hive trunk r812721. Thanks Sudipto PhD Candidate CS @ UCSB Santa Barbara, CA 93106, USA http://www.cs.ucsb.edu/~sudipto On Wed, Sep 9, 2009 at 7:12 PM, Namit Jain mailto:nj

RE: enforcing query with partition column

2009-09-10 Thread Namit Jain
. However following query is now not allowed: SELECT a.bar FROM ( SELECT b.bar FROM table1 b WHERE b.partition_key = 1 UNION ALL SELECT c.bar FROM table2 c WHERE c.partition_key = 1 ) a Individual sub queries are allowed here...any suggestions? On Tue, Sep 8, 2009 at 10:22 AM, Namit Jain mailto:nj

RE: Directing Hive to perform Hash Join for small inner tables

2009-09-09 Thread Namit Jain
ay of creating a test data set. I was just taking the first 100 - 1000 rows.. I am attaching all the queries and the explain. Let me know if it helps: Thanks Sudipto PhD Candidate CS @ UCSB Santa Barbara, CA 93106, USA http://www.cs.ucsb.edu/~sudipto On Wed, Sep 9, 2009 at 6:13 PM, Namit

RE: Directing Hive to perform Hash Join for small inner tables

2009-09-09 Thread Namit Jain
, USA http://www.cs.ucsb.edu/~sudipto On Wed, Sep 9, 2009 at 2:03 PM, Namit Jain mailto:nj...@facebook.com>> wrote: You can specify it as a hint in the select list: select /*+ MAPJOIN(b) */ ... from T a JOIN T2 b on ... In the example above, T2 is the small table which can be

RE: Directing Hive to perform Hash Join for small inner tables

2009-09-09 Thread Namit Jain
You can specify it as a hint in the select list: select /*+ MAPJOIN(b) */ ... from T a JOIN T2 b on ... In the example above, T2 is the small table which can be cached in memory From: sudipt...@gmail.com [mailto:sudipt...@gmail.com] On Behalf Of Sudipto Das Sent: Wednesday, September 09,
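
A minimal complete example with hypothetical tables, where dim is the small table to be cached in memory:

  SELECT /*+ MAPJOIN(d) */ f.user_id, d.country_name
  FROM fact f JOIN dim d ON (f.country_id = d.country_id);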

RE: enforcing query with partition column

2009-09-08 Thread Namit Jain
If you run in strict mode, that is enforced. set hive.mapred.mode=strict -namit -Original Message- From: Abhijit Pol [mailto:a...@rocketfuelinc.com] Sent: Tuesday, September 08, 2009 10:07 AM To: hive-user@hadoop.apache.org Subject: enforcing query with partition column Is there a w
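
In strict mode a query against a partitioned table is rejected unless the WHERE clause constrains a partition column; a sketch with a hypothetical table page_views partitioned by ds:

  SET hive.mapred.mode=strict;
  -- rejected: no predicate on the partition column
  --   SELECT count(1) FROM page_views;
  -- accepted: the partition column ds is constrained
  SELECT count(1) FROM page_views WHERE ds = '2009-09-08';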

Re: INSERT OVERWRITE/APPEND TABLE

2009-08-08 Thread Namit Jain
There is no support for INSERT APPEND TABLE right now. The workaround is to select the same table and a union all For eg: If you want to insert append into table T Insert append table T select xxx Is same as: Insert overwrite table T Select xxx Union all Select * from T -namit On 8/8
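
Written out, with hypothetical columns and the union wrapped in a subquery as Hive requires:

  INSERT OVERWRITE TABLE t
  SELECT u.col1, u.col2
  FROM (
    SELECT col1, col2 FROM new_rows   -- the "select xxx" being appended
    UNION ALL
    SELECT col1, col2 FROM t          -- plus everything already in t
  ) u;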

RE: partitions not being created

2009-07-28 Thread Namit Jain
There are no partitions in the table - Can you post the output you get while loading the data ? From: Bill Graham [mailto:billgra...@gmail.com] Sent: Tuesday, July 28, 2009 5:54 PM To: hive-user@hadoop.apache.org Subject: partitions not being created Hi, I'm trying to create a partitioned table

RE: Select distinct and partitions

2009-07-23 Thread Namit Jain
I agree that the last query can be optimized but currently it is not. The workaround is to use a different API show partitions tmp_pageviews; Thanks, -namit -Original Message- From: Neal Richter [mailto:nrich...@gmail.com] Sent: Thursday, July 23, 2009 8:54 AM To: hive-user Subject:

RE: insert into not supported?

2009-07-21 Thread Namit Jain
Currently, hive only supports overwriting the data - appending is not supported From: chen keven [mailto:cke...@gmail.com] Sent: Tuesday, July 21, 2009 2:34 PM To: hive-user@hadoop.apache.org Subject: insert into not supported? I tried to use" insert into " command to insert data in table. Howe

RE: NPEs when using map-side join

2009-07-17 Thread Namit Jain
This should be fixed as part of https://issues.apache.org/jira/browse/HIVE-405 Can you try in the trunk version and let us know if you still see any problems ? From: Jason Michael [mailto:jmich...@videoegg.com] Sent: Tuesday, July 14, 2009 3:01 PM To: hive mailing list Subject: NPEs when usi

Re: Selecting data based on the clustered columns

2009-07-16 Thread Namit Jain
Thu, Jul 16, 2009 at 7:49 PM, Namit Jain wrote: Right now, bucketing information is not used in a lot of places - it is only used in sampling. For eg: If your query was: Select .. From Posts(tablesample 1 out of 256) a; Then only the first bucket will be scanned. Your query can be optimized, but

Re: Selecting data based on the clustered columns

2009-07-16 Thread Namit Jain
Right now, bucketing information is not used in a lot of places - it is only used in sampling. For eg: If your query was: Select .. From Posts(tablesample 1 out of 256) a; Then only the first bucket will be scanned. Your query can be optimized, but currently it is not. Can you file a jira on
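
For reference, the bucket-sampling syntax, assuming posts was created CLUSTERED BY (user_id) INTO 256 BUCKETS:

  SELECT *
  FROM posts TABLESAMPLE(BUCKET 1 OUT OF 256 ON user_id) a;

Because the sample column matches the clustering column, only the first of the 256 buckets is read.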

Re: Hive ignoring my setting for mapred.reduce.tasks

2009-07-14 Thread Namit Jain
For a query with no grouping, hive always uses 1 reducer. Since map-side aggregation will reduce the number of rows, it is not a problem On 7/14/09 10:11 PM, "Saurabh Nanda" wrote: Hi, I have a table with 15,000,000+ rows sitting on a 4-node hadoop cluster with dfs.replication=4. Hive seems

Re: Join optimization for star schemas

2009-07-14 Thread Namit Jain
dimension_table .factkey > Can not sure this can reduce the execution time, and may increase the > execution time (you can do some experiments :) ). > > > On 09-7-14 1:45 PM, "Namit Jain" wrote: >> The large table is only a problem if the number of values for a give

Re: Join optimization for star schemas

2009-07-13 Thread Namit Jain
The large table is only a problem if the number of values for a given key is very large - they are stored in memory. If your dimension tables are small, you can use map-join. That way, no reduction is needed. You need to specify the hint in the select clause and the list of tables which are smal

Re: Query when partition presents.

2009-07-09 Thread Namit Jain
Merged that just now On 7/9/09 7:57 AM, "He Yongqiang" wrote: Hi Namit, I found the jira issue https://issues.apache.org/jira/browse/HIVE-527. On 09-7-9 10:17 PM, "Namit Jain" wrote: If a table is partitioned, we should only allow insertion/loading data in a parti

Re: Query when partition presents.

2009-07-09 Thread Namit Jain
If a table is partitioned, we should only allow insertion/loading data in a partition - not the table. So, load data local inpath \"/Users/char/Documents/workspace/Hive-Clean/data/files/kv1.txt\" into table srcpart ; should fail. I think there was already a jira for that - let me check and othe
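
The load for a partitioned table has to name the target partition explicitly, e.g. (the partition spec here is hypothetical):

  LOAD DATA LOCAL INPATH '/Users/char/Documents/workspace/Hive-Clean/data/files/kv1.txt'
  INTO TABLE srcpart PARTITION (ds = '2009-07-09');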

RE: Combined data more throughput

2009-07-08 Thread Namit Jain
The reason that we need a lot of mappers is because: 1) The input data size is very large. 2) The number of input files is very large. In order to decrease the number of mappers, set mapred.min.split.size to a big number (like 1000000000, i.e. 1GB). The default value is 128MB. If you increase
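
For example, before the query (the value is in bytes; big_table is hypothetical):

  SET mapred.min.split.size=1000000000;   -- roughly 1GB per split
  SELECT count(1) FROM big_table;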

RE: distinct with union all

2009-07-08 Thread Namit Jain
[studenttab, studentcolon, studentusrdef, votertab, reg1459894, textdoc, unicode] Died at generate_data.pl line 145. [nj...@dev029 ~/pigmix]$ From: Rakesh Setty [mailto:serak...@yahoo-inc.com] Sent: Tuesday, July 07, 2009 1:23 PM To: Namit Jain Subject: RE: distinct with union all

RE: Regarding Hive

2009-07-07 Thread Namit Jain
Haritha, Please post your questions on hive-user@hadoop.apache.org<mailto:hive-user@hadoop.apache.org> Thanks, -namit From: Haritha Javvadi [mailto:haritha.javv...@gmail.com] Sent: Tuesday, July 07, 2009 4:29 PM To: Namit Jain Subject: Regarding Hive Hello, This is haritha

RE: distinct with union all

2009-07-07 Thread Namit Jain
Rakesh, Did the queries work for you ? Thanks, -namit From: Namit Jain [mailto:nj...@facebook.com] Sent: Monday, July 06, 2009 2:22 PM To: hive-user@hadoop.apache.org Subject: RE: distinct with union all Similar queries seem to be working for me. 1. Which version of hive are using ? 2

RE: Combined data more throughput

2009-07-06 Thread Namit Jain
- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Monday, July 06, 2009 1:05 PM To: Namit Jain Subject: Combined data more throughput Namit, Sorry, I could not reply to this on list. I took the latest trunk and now my query is failing. Let me know what you think. Table 1: CREATE TABLE ra

RE: Combine data for more throughput

2009-07-06 Thread Namit Jain
so tried making the second table to not be sequence file. I run into the same problem. Any other hints? using /bin/cat is what you meant by identity reducer? Should I just use hadoop and do an identity mapper and identity reduce for this problem? Thank you, Edward On Mon, Jul 6, 2009 at 2:01 P

RE: distinct with union all

2009-07-06 Thread Namit Jain
: RE: distinct with union all Following are the counts select count(distinct user) from Tb1; 976272 select count(1) from Tb1; 1642777 select count(distinct user) from Tb1Debug; 653824 select count(1) from Tb1Debug; 1095933 Thanks, Rakesh From: Namit Jain

RE: distinct with union all

2009-07-06 Thread Namit Jain
from the query select count(distinct user) from Tb1Debug; @Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine. Please let me know what could be the problem. Thanks, Rakesh From: Namit Jain [mailto:nj...@facebook.com

RE: Combine data for more throughput

2009-07-06 Thread Namit Jain
hive.merge.mapfiles is set to true by default. So, in trunk, you should get small output files. Can you do a explain plan and send it if that is not the case ? -Original Message- From: Ashish Thusoo [mailto:athu...@facebook.com] Sent: Monday, July 06, 2009 10:50 AM To: hive-user@hadoop.ap

RE: distinct with union all

2009-07-06 Thread Namit Jain
I am getting duplicate usernames. From: Namit Jain [mailto:nj...@facebook.com] Sent: Thursday, July 02, 2009 1:25 PM To: hive-user@hadoop.apache.org<mailto:hive-user@hadoop.apache.org> Subject: RE: distinct with union all Are you getting duplicate usernam

RE: distinct with union all

2009-07-02 Thread Namit Jain
special characters) for the usernames being extracted. also to debug, try to do a select username, count(1) then group by username -- amr Rakesh Setty wrote: Yes, I am getting duplicate usernames. From: Namit Jain [mailto:nj...@facebook.com] Sent: Thursday, July 02, 2

RE: distinct with union all

2009-07-02 Thread Namit Jain
@hadoop.apache.org Subject: RE: distinct with union all Yes, I am getting duplicate usernames. From: Namit Jain [mailto:nj...@facebook.com] Sent: Thursday, July 02, 2009 1:25 PM To: hive-user@hadoop.apache.org Subject: RE: distinct with union all Are you getting duplicate

RE: distinct with union all

2009-07-02 Thread Namit Jain
Are you getting duplicate usernames ? From: Rakesh Setty [mailto:serak...@yahoo-inc.com] Sent: Thursday, July 02, 2009 12:37 PM To: hive-user@hadoop.apache.org Subject: distinct with union all Hi, I have a query like select distinct username from (select user as username from page_views pv uni
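
The quoted query is cut off at the union; the pattern under discussion is a DISTINCT over a union-all subquery, roughly (the second branch is hypothetical):

  SELECT DISTINCT u.username
  FROM (
    SELECT pv.user AS username FROM page_views pv
    UNION ALL
    SELECT av.user AS username FROM api_views av   -- hypothetical second source
  ) u;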

RE: Set difference in Hive

2009-06-29 Thread Namit Jain
Hive I tried this. Unfortunately, both tables are large. Thanks, Rakesh From: Namit Jain [mailto:nj...@facebook.com] Sent: Monday, June 29, 2009 5:05 PM To: hive-user@hadoop.apache.org Subject: RE: Set difference in Hive The tables can be large - For a given

RE: Set difference in Hive

2009-06-29 Thread Namit Jain
The tables can be large - For a given key, have the table with the largest number of values as the rightmost table. The problem only happens when both tables have keys with a large number of values. Thanks, -namit From: Rakesh Setty [mailto:serak...@yahoo-inc.com] Sent: Monday, June 29, 20
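
For context, the set difference itself (rows of a whose key has no match in b) is usually written as an outer join plus a null check; a sketch with hypothetical tables and key:

  SELECT a.key, a.value
  FROM a LEFT OUTER JOIN b ON (a.key = b.key)
  WHERE b.key IS NULL;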

RE: getting the field types of a query result

2009-06-29 Thread Namit Jain
Had a discussion with Raghu/Zheng offline. Basically, genFileSinkPlan is not passing the type information - that will be added, and then LazySerDe will support serializing into json even if the types are provided From: Ashish Thusoo [mailto:athu...@facebook.com] Sent: Monday, June 29, 2009 11:0

RE: OutOfMemory when doing map-side join

2009-06-15 Thread Namit Jain
Set mapred.child.java.opts to increase mapper memory. From: Namit Jain [mailto:nj...@facebook.com] Sent: Monday, June 15, 2009 3:53 PM To: hive-user@hadoop.apache.org Subject: RE: OutOfMemory when doing map-side join There are multiple things going on. Column pruning is not working with map
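
For example (the 2GB heap is just an illustration):

  SET mapred.child.java.opts=-Xmx2048m;   -- applied to each map/reduce child JVM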

RE: OutOfMemory when doing map-side join

2009-06-15 Thread Namit Jain
nner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) On Mon, Jun 15, 2009 at 1:52 PM, Namit Jain mailto:nj...@facebook.com>> wrote: The problem seems to be in partition pruning - The small table 'application' is partitioned - and probably, there are 20k rows in th
