You are right Chuck. I thought his question was how to use zip files or any
compressed files in Hive tables.
Yeah, it seems like you can't do that; see:
http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=gb9yvasr2jl0u3yul2tfgu0...@mail.gmail.com%3E
But you can
Hi Manish,
If you have your zip file at location - /home/manish/zipfile, you can just
point your external table to that location like
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' -- the delimiter character was lost in the archive; ',' is only a placeholder
STORED AS TEXTFILE
LOCATION '/home/manish/zipfile';
You can always put an ADD JAR statement in your HQL file or run
the command in the Hive shell.
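For reference, a minimal sketch of that statement (the jar path is hypothetical):

```sql
-- make the jar available to the current Hive session
ADD JAR /home/me/my.jar;
```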
OR
I found this in a previous thread
Add the following property to your hive-site.xml:
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///home/me/my.jar,file:///home/you/your.jar,file:///home/us/our.jar</value>
</property>
Hope this helps.
Richin
From: ext C
Hi Ravi,
The idea of using EMR is that you don't have to have a Hadoop cluster running
all the time. So put all your data in S3, spin up an EMR cluster, do
computation and store your data back in S3.
In an ideal case data in S3 should not be moved around and Hive will always
read from S3 if you
Hi Ravi,
Another way, apart from dynamic partitioning: if you can create your
directories like below, either manually or via the ETL process you may already
use to produce the table data, it is pretty easy.
s3://ravi/logs/adv_id=123/date=2012-01-01/log.gz
s3://ravi/logs/adv_id=456/date=2012-01-02/
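Under that layout, the table and its partitions might be declared roughly like this (the table name and column are hypothetical; only the S3 paths come from the message above):

```sql
-- external table whose partition columns mirror the directory names
CREATE EXTERNAL TABLE logs (line STRING)
PARTITIONED BY (adv_id STRING, `date` STRING)
LOCATION 's3://ravi/logs/';

-- register one partition explicitly, pointing at its directory
ALTER TABLE logs ADD PARTITION (adv_id='123', `date`='2012-01-01')
LOCATION 's3://ravi/logs/adv_id=123/date=2012-01-01/';
```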
Thanks Jan,
I was looking for the first one, summing the values from two columns into one
number.
I did it as sum(col1) + sum(col2), but your solution is more elegant ☺
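For reference, the single-pass form, with a caveat on NULL handling, which is where the two versions differ (table and column names are illustrative):

```sql
-- SUM(col1 + col2) skips a row entirely if either column is NULL;
-- SUM(col1) + SUM(col2) ignores NULLs per column instead.
SELECT SUM(col1 + col2) FROM t;
```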
Regards,
Richin
From: ext Jan Dolinár [mailto:dolik@gmail.com]
Sent: Thursday, August 16, 2012 12:07 PM
To: user@hive.apac
You could do it using a Pivot Table in MS Excel. It's under the Insert tab, first
option on the left.
Richin
-Original Message-
From: Jain Richin (Nokia-LC/Boston)
Sent: Thursday, August 09, 2012 4:16 PM
To: user@hive.apache.org
Subject: RE: Converting rows into dynamic colums in Hive
Th
Hello,
Is there a way to aggregate multiple columns in Hive?
I can do it in two separate queries but is there something similar to
sum(col1,col2)?
Thanks,
Richin
Thanks Guys, it worked.
From: ext Bertrand Dechoux [mailto:decho...@gmail.com]
Sent: Thursday, August 09, 2012 5:03 PM
To: user@hive.apache.org
Subject: Re: Nested Select Statements
Basically a cross join. You would have the same issue with SQL.
Bertrand
On Thu, Aug 9, 2012 at 10:41 PM, shrikant
Hi Hivers,
This might be a very basic question for most of you but I am stuck at it for
quite some time now. I have a table with three columns :
Describe usage;
ts string
id string
metric double
I am trying to do a query like
Select ts,id,sum(metric/(select count(*) from usage)) from usage group
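Hive of this era does not support scalar subqueries in the select list; the cross-join workaround mentioned in the thread might look like this sketch (whether a bare JOIN without an ON clause is accepted depends on the Hive version and strict-mode settings):

```sql
-- compute the total row count once, join it to every row,
-- then divide inside the aggregate
SELECT u.ts, u.id, SUM(u.metric / t.total)
FROM usage u
JOIN (SELECT COUNT(*) AS total FROM usage) t
GROUP BY u.ts, u.id;
```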
Thanks John.
Is there a way to do this in Excel, since I am outputting the table data in CSV
format (using macros or something else)?
Richin
-Original Message-
From: ext John Meagher [mailto:john.meag...@gmail.com]
Sent: Thursday, August 09, 2012 10:11 AM
To: user@hive.apache.org
Subjec
John,
What is R?
-Original Message-
From: ext John Meagher [mailto:john.meag...@gmail.com]
Sent: Wednesday, August 08, 2012 4:34 PM
To: user@hive.apache.org
Subject: Re: Converting rows into dynamic colums in Hive
I don't think having dynamic columns is possible in Hive. I've always ou
Thanks Ashish, that gives me an idea.
But I am not sure about the outer select loop; I would have to know all the
values in the Beta column beforehand to do a max on each value.
Is there a better way?
Richin
From: ext Ashish Thusoo [mailto:athu...@qubole.com]
Sent: Tuesday, August 07, 2012 5:05 PM
To: user@h
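The approach being discussed, a max over each known Beta value, is usually written with conditional aggregation; a sketch using the sample values from the question (the table name is hypothetical):

```sql
-- one output column per known Beta value; CASE without ELSE yields NULL,
-- which MAX ignores
SELECT Alpha,
       MAX(CASE WHEN Beta = 'xyz' THEN Gamma END) AS xyz,
       MAX(CASE WHEN Beta = 'abc' THEN Gamma END) AS abc,
       MAX(CASE WHEN Beta = 'pqr' THEN Gamma END) AS pqr
FROM results
GROUP BY Alpha;
```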
Hi All,
One of my Query output looks like-
Alpha  Beta  Gamma
123    xyz   1.0
123    abc   0.5
123    pqr   1.3
123
Thanks guys, I am changing my partitions to hold a day's worth of data each,
which should be good enough for Hive to operate on.
Thanks,
Richin
From: ext Bejoy Ks [mailto:bejoy...@yahoo.com]
Sent: Friday, July 27, 2012 3:06 PM
To: user@hive.apache.org
Subject: Re: Performance Issues in Hive with S3 and Par
Igor,
I did not see any major improvement in performance even after setting
"hive.optimize.s3.query=true", although that was suggested by the AWS team.
My problem is that I have too many small files - 3 levels of partitions, 6500+
files, and each file is < 1 MB.
Now I know Hadoop and HDFS are n
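One common mitigation for the many-small-files case is to combine them into larger input splits; a sketch (availability and defaults of these settings vary by Hive/EMR version, and the size value is only illustrative):

```sql
-- combine many small files into fewer map tasks
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- cap combined split size at 128 MB
SET mapred.max.split.size=134217728;
```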
Hi Igor,
Thanks for the response. Yes I am using EMR.
I will make changes and let you know if that helps.
Richin
From: ext Igor Tatarinov [mailto:i...@decide.com]
Sent: Tuesday, July 24, 2012 12:38 AM
To: user@hive.apache.org
Subject: Re: Performance Issues in Hive with S3 and Partitions
Are yo
Hi,
Sorry this is an AWS Hive Specific question. I have two External Hive tables
for my custom logs.
1. Flat directory structure on AWS S3, no partitions, files in bz2 compressed
format (a few big files)
2. With 3 levels of partitions on AWS S3 (lots of small uncompressed files)
I noticed that
Thanks Bejoy, that is really helpful.
From: ext Bejoy KS [mailto:bejoy...@yahoo.com]
Sent: Thursday, June 28, 2012 4:12 PM
To: Jain Richin (Nokia-HR/Boston); user@hive.apache.org
Subject: Re: Obvious and not so obvious query optimzations in Hive
Hi Richin
The Keys vary based on your queries on t
Bejoy, thanks again. This might be the silliest question, but what are the keys
in a Hive query? Are they the fields we pick in the select clause or the ones we
define with the group by clause?
Can you tell me what the keys will be for reducers for my query down below
CREATE EXTERNAL TABLE extlog
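As a general rule of thumb (a simplified sketch, not the full answer for every query plan), the GROUP BY expressions become the map output keys handed to the reducers:

```sql
-- here 'dt' is the reduce key; the table and column are only illustrative
SELECT dt, COUNT(*) FROM extlog GROUP BY dt;
```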
Thanks Nitin.
Depending on how I design my keys, they might go to one or more reducers, but
shouldn't I be seeing empty files for the reducers that did not get any data
to reduce (because of the design of the keys)?
Or does Hive clean up all the empty files at the end of the query?
Richin
From: ext N
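Related to the number-of-output-files question, Hive can merge small job outputs into larger files; a sketch (these settings exist in Hive of this era, but their defaults vary by version and platform):

```sql
-- merge small map-only job outputs
SET hive.merge.mapfiles=true;
-- merge small reducer outputs
SET hive.merge.mapredfiles=true;
```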
Igor, Bejoy - thanks a lot, that helps.
Hey, I am running the query on an Amazon EMR cluster, and based on the type of
instances I pick, a default number of mappers and reducers is set. Now I would
expect Hive to generate as many output files as there are reducers (since I am
not
Hey Hivers,
I am trying to understand some of the obvious and not-so-obvious
optimizations I could do for a Hive query on an AWS EMR cluster. I know the
answer for some of these questions but want to know what you guys think and by
what factor it affects the performance over the other a
Hi Edward,
Sorry if I was not clear. My question is about the difference between
DoubleWritable in Hadoop and in Hive; the other writables from Hadoop work fine
in Hive.
The Hive serde writable types are limited to Double, Byte, Short and Timestamp.
I am using Hive 0.8.
Richin
-Original Message-
From: ext
Hi Guys,
I am writing a UDF in hive to convert a double value to string, so the evaluate
method of my UDF class looks like
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
//import org.apache.hadoop.io.DoubleWritable; - does not work
import org.apache.hadoop.hive.serd
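For context, the usual fix here is to import Hive's own DoubleWritable from the serde2 package rather than Hadoop's; a minimal sketch of such a UDF (the class name is hypothetical):

```java
import org.apache.hadoop.hive.ql.exec.UDF;
// Hive passes primitive doubles to UDFs as this type,
// not as org.apache.hadoop.io.DoubleWritable
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.io.Text;

public final class DoubleToString extends UDF {
  public Text evaluate(DoubleWritable d) {
    if (d == null) {
      return null;
    }
    // render the underlying double as text
    return new Text(String.valueOf(d.get()));
  }
}
```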