RE: zip file or tar file consumption

2012-09-26 Thread richin.jain
You are right, Chuck. I thought his question was how to use zip files or other compressed files in Hive tables. Yeah, it seems like you can't do that; see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=gb9yvasr2jl0u3yul2tfgu0...@mail.gmail.com%3E But you can

RE: zip file or tar file consumption

2012-09-26 Thread richin.jain
Hi Manish, if you have your zip file at location /home/manish/zipfile, you can just point your external table at that location, like: CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

RE: Hive custom inputformat error.

2012-09-20 Thread richin.jain
You can always use ADD JAR in your HQL file, or run the same command from the Hive shell. Alternatively, I found this in a previous thread: add the following property to your hive-site.xml: hive.aux.jars.path=file:///home/me/my.jar,file:///home/you/your.jar,file:///home/us/our.jar Hope this helps. Richin
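For reference, the hive-site.xml entry mentioned above would look something like this (the jar paths are the example paths from the message, not real ones):

```xml
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///home/me/my.jar,file:///home/you/your.jar,file:///home/us/our.jar</value>
</property>
```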

RE: Hive on EMR on S3 : Beginner

2012-08-27 Thread richin.jain
Hi Ravi, the idea of using EMR is that you don't have to keep a Hadoop cluster running all the time. So put all your data in S3, spin up an EMR cluster, do your computation, and store the results back in S3. Ideally, data in S3 should not be moved around, and Hive will always read from S3 if you

RE: Hive on EMR on S3 : Beginner

2012-08-24 Thread richin.jain
Hi Ravi, another way of doing this, apart from dynamic partitions: if you can create your directories like the ones below (either manually or via the ETL process that produces the table data), it is pretty easy. s3://ravi/logs/adv_id=123/date=2012-01-01/log.gz s3://ravi/logs/adv_id=456/date=2012-01-02/
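The key=value directory naming above can be sketched in a few lines; a minimal illustration (bucket, column names, and filenames taken from the example paths) of how Hive-style partition paths compose:

```python
# Sketch: build Hive-style partition directory names (adv_id=.../date=...)
# so an external table with PARTITIONED BY (adv_id STRING, dt STRING) can
# pick them up via ALTER TABLE ... ADD PARTITION or a metastore repair.
def partition_path(base, adv_id, date, filename):
    """Compose an S3 key using key=value partition directories."""
    return f"{base}/adv_id={adv_id}/date={date}/{filename}"

keys = [partition_path("s3://ravi/logs", adv, d, "log.gz")
        for adv, d in [("123", "2012-01-01"), ("456", "2012-01-02")]]
for k in keys:
    print(k)
```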

RE: Aggregate Multiple Columns

2012-08-16 Thread richin.jain
Thanks Jan, I was looking for the first one: summing the values from two columns into one number. I did it as sum(col1) + sum(col2), but your solution is more elegant ☺ Regards, Richin
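One caveat worth noting: the two forms are not always interchangeable. A small Python simulation of SQL NULL semantics (illustrative values, not from the thread) shows where sum(col1) + sum(col2) and sum(col1 + col2) can disagree:

```python
# Simulate SQL NULL semantics to show why sum(col1) + sum(col2)
# and sum(col1 + col2) can disagree when NULLs are present.
rows = [(1, 10), (2, None), (3, 30)]  # (col1, col2); None plays the role of NULL

def sql_sum(values):
    """SUM ignores NULLs; it is NULL only when every input is NULL."""
    vals = [v for v in values if v is not None]
    return sum(vals) if vals else None

def null_add(a, b):
    """col1 + col2 is NULL if either operand is NULL."""
    return None if a is None or b is None else a + b

separate = sql_sum(r[0] for r in rows) + sql_sum(r[1] for r in rows)  # 6 + 40
combined = sql_sum(null_add(a, b) for a, b in rows)                   # 11 + 33
print(separate, combined)  # the NULL row drops out entirely in the combined form
```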

RE: Converting rows into dynamic columns in Hive

2012-08-16 Thread richin.jain
You could do it using a Pivot Table in MS Excel. It's under the Insert tab, first option on the left. Richin

Aggregate Multiple Columns

2012-08-16 Thread richin.jain
Hello, is there a way to aggregate multiple columns in Hive? I can do it in two separate queries, but is there something similar to sum(col1, col2)? Thanks, Richin

RE: Nested Select Statements

2012-08-09 Thread richin.jain
Thanks guys, it worked. From: ext Bertrand Dechoux: Basically a cross join. You would have the same issue with SQL. Bertrand

Nested Select Statements

2012-08-09 Thread richin.jain
Hi Hivers, this might be a very basic question for most of you, but I have been stuck on it for quite some time now. I have a table with three columns (describe usage): ts string, id string, metric double. I am trying to do a query like: select ts, id, sum(metric/(select count(*) from usage)) from usage group
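What the intended query computes can be sketched outside Hive. A minimal Python sketch (made-up sample rows, not from the thread) of the per-(ts, id) sum divided by the table's total row count — the quantity the cross-join workaround produces:

```python
# Sketch: per (ts, id), sum(metric) divided by the total row count of
# the table, i.e. what "select count(*) from usage" would supply.
from collections import defaultdict

usage = [("t1", "a", 2.0), ("t1", "a", 4.0), ("t2", "b", 3.0)]  # (ts, id, metric)

total_rows = len(usage)            # the scalar the subquery would return
groups = defaultdict(float)
for ts, id_, metric in usage:      # group by (ts, id), summing the metric
    groups[(ts, id_)] += metric

result = {k: v / total_rows for k, v in groups.items()}
print(result)
```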

RE: Converting rows into dynamic columns in Hive

2012-08-09 Thread richin.jain
Thanks John. Is there a way to do this in Excel, since I am outputting the table data in CSV format (using macros or something else)? Richin

RE: Converting rows into dynamic columns in Hive

2012-08-08 Thread richin.jain
John, what is R? From: ext John Meagher: I don't think having dynamic columns is possible in Hive. I've always ou

RE: Converting rows into dynamic columns in Hive

2012-08-08 Thread richin.jain
Thanks Ashish, that gives me an idea. But I am not sure about the outer select loop; I would have to know all the values in the Beta column beforehand to do a max on each value. Is there a better way? Richin
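The "know the values beforehand" problem is usually solved with two passes: one pass (or query) to discover the distinct Beta values, then a pivot generated from them. A minimal Python sketch of the idea, using the sample rows from the original message:

```python
# Sketch: two-pass pivot — first discover the distinct Beta values,
# then build one column per value, with max() as the cell aggregate.
rows = [("123", "xyz", 1.0), ("123", "abc", 0.5), ("123", "pqr", 1.3)]  # (Alpha, Beta, Gamma)

betas = sorted({beta for _, beta, _ in rows})           # pass 1: column names
pivot = {}
for alpha, beta, gamma in rows:                         # pass 2: fill the cells
    row = pivot.setdefault(alpha, {b: None for b in betas})
    row[beta] = gamma if row[beta] is None else max(row[beta], gamma)

print(betas)   # the dynamic column headers
print(pivot)   # one dict per Alpha, keyed by Beta value
```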

Converting rows into dynamic columns in Hive

2012-08-07 Thread richin.jain
Hi All, the output of one of my queries looks like:

Alpha  Beta  Gamma
123    xyz   1.0
123    abc   0.5
123    pqr   1.3
123

RE: Performance Issues in Hive with S3 and Partitions

2012-07-27 Thread richin.jain
Thanks guys, I am changing my partitions to hold a day's worth of data, which should be good enough for Hive to operate on. Thanks, Richin

RE: Performance Issues in Hive with S3 and Partitions

2012-07-27 Thread richin.jain
Igor, I did not see any major improvement in performance even after setting hive.optimize.s3.query=true, although that is what the AWS team suggested. My problem is that I have too many small files: 3 levels of partitions, 6500+ files, and a single file is < 1 MB. Now I know Hadoop and HDFS are n
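A common mitigation for the small-files problem is consolidating them into fewer, larger files before querying. A minimal Python sketch of planning such batches (the 64 MB target is an assumption, roughly one HDFS block of that era, not a figure from the thread):

```python
# Sketch: bin many small files into roughly target-sized groups so each
# group can be concatenated or re-written into one larger file.
TARGET = 64 * 1024 * 1024  # bytes per consolidated file (tunable assumption)

def plan_batches(files, target=TARGET):
    """files: list of (name, size_bytes). Returns lists of names per batch."""
    batches, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > target:
            batches.append(current)          # flush the full batch
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)              # remainder
    return batches

small = [(f"part-{i}", 900_000) for i in range(10)]   # ten ~0.9 MB files
plan = plan_batches(small, target=3_000_000)
print(plan)
```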

RE: Performance Issues in Hive with S3 and Partitions

2012-07-24 Thread richin.jain
Hi Igor, thanks for the response. Yes, I am using EMR. I will make the changes and let you know if that helps. Richin

Performance Issues in Hive with S3 and Partitions

2012-07-23 Thread richin.jain
Hi, sorry, this is an AWS-specific Hive question. I have two external Hive tables for my custom logs: 1. a flat directory structure on AWS S3, no partitions, and files in bz2-compressed format (a few big files); 2. with 3 levels of partitions on AWS S3 (lots of small uncompressed files). I noticed that

RE: Obvious and not so obvious query optimizations in Hive

2012-06-29 Thread richin.jain
Thanks Bejoy, that is really helpful. From: ext Bejoy KS: Hi Richin, the keys vary based on your queries on t

RE: Obvious and not so obvious query optimizations in Hive

2012-06-28 Thread richin.jain
Bejoy, thanks again. This might be the silliest question, but what are the keys in a Hive query? Is it the fields we pick in the select clause, or the ones we define with the group by clause? Can you tell me what the keys will be for the reducers in my query below? CREATE EXTERNAL TABLE extlog
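As a rough illustration of the answer: for a GROUP BY, the map output key in the MapReduce job Hive generates is the tuple of group-by columns, and equal keys are routed to the same reducer by hash partitioning. A small Python sketch (made-up rows and column names, not the extlog table) of the map side:

```python
# Sketch: in the MapReduce job generated for a GROUP BY, the map output
# key is the tuple of group-by columns; rows with equal keys go to the
# same reducer via hash partitioning.
rows = [("2012-06-28", "a", 1), ("2012-06-28", "b", 2), ("2012-06-29", "a", 3)]

NUM_REDUCERS = 2

def map_phase(rows):
    for ts, id_, metric in rows:
        key = (ts, id_)          # the GROUP BY columns become the shuffle key
        yield key, metric

def partition(key):
    return hash(key) % NUM_REDUCERS   # which reducer receives this key

for key, value in map_phase(rows):
    print(key, value, "-> reducer", partition(key))
```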

RE: Obvious and not so obvious query optimizations in Hive

2012-06-28 Thread richin.jain
Thanks Nitin. Depending on how I design my keys, they might go to one or more reducers, but shouldn't I be seeing empty files for the reducers that did not get any data to reduce (because of the design of the keys)? Or does Hive clean up all the empty files at the end of the query? Richin

RE: Obvious and not so obvious query optimizations in Hive

2012-06-28 Thread richin.jain
Igor, Bejoy - thanks a lot, that helps. I am running the query on an Amazon EMR cluster, and based on the type of instances I pick, the default numbers of mappers and reducers are set. Now I would expect Hive to generate as many output files as there are reducers (since I am not

Obvious and not so obvious query optimizations in Hive

2012-06-27 Thread richin.jain
Hey Hivers, I am trying to understand some of the obvious and not-so-obvious optimizations I could do for a Hive query on an AWS EMR cluster. I know the answer to some of these questions, but I want to know what you guys think and by what factor each affects performance over the other a

RE: hadoop.io.DoubleWritable v/s hive.serde2.io.DoubleWritable

2012-06-12 Thread richin.jain
Hi Edward, sorry if I was not clear. My question is about the difference between DoubleWritable in Hadoop and in Hive; other writables from Hadoop work fine in Hive. The hive.serde2.io types are limited to Double, Byte, Short, and Timestamp. I am using Hive 0.8. Richin

hadoop.io.DoubleWritable v/s hive.serde2.io.DoubleWritable

2012-06-12 Thread richin.jain
Hi guys, I am writing a UDF in Hive to convert a double value to a string, so the evaluate method of my UDF class looks like: import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.Text; //import org.apache.hadoop.io.DoubleWritable; - does not work import org.apache.hadoop.hive.serd