Hi Yogesh
Whichever character is the column separator in your input file, you need to
provide that in the FIELDS TERMINATED BY clause.
Also, the common storage formats supported by Hive include:
- Text File
- Sequence File
- RC File, etc.
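For illustration, a minimal sketch of how a storage format is chosen at table-creation time (the table and column names below are hypothetical):

    -- plain text rows, one record per line
    CREATE TABLE demo_text (id INT, name STRING) STORED AS TEXTFILE;
    -- binary key/value container format
    CREATE TABLE demo_seq (id INT, name STRING) STORED AS SEQUENCEFILE;
    -- columnar Record Columnar File format
    CREATE TABLE demo_rc (id INT, name STRING) STORED AS RCFILE;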
Regards
Bejoy KS
Sent from handheld, please excuse typos.
@Kulkarni,
As suggested, I copied the JAR into the specified directory; now it is
complaining with:
java.io.FileNotFoundException: File does not exist:
/tmp/hduser/hive_2012-09-27_11-24-32_966_3932937489091919630/-mr-1/1/emptyFile/
where hduser is the Unix login ID from which I'm executing the
Thanks Ashok :-),
I am not very aware of the storage formats in Hive (I am new to Hive).
Could you please list some of them besides these:
1) space separated --> FIELDS TERMINATED BY " ";
2) Control-A separated --> FIELDS TERMINATED BY '\001'
3) Tab separated --> FIELDS TERMINATED BY '\t'
Hi,
For 1), did you follow the wiki?
https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode
Maybe you didn't provide the right jars.
Did you check that you could connect yourself to the database?
For 2), I don't know which version is supported for Hadoop.
https://cwiki.apache.org/c
'\t'
From: yogesh dhari [yogeshdh...@live.com]
Sent: Thursday, September 27, 2012 10:42 AM
To: hive request
Subject: how to load TAB_SEPRATED file in hive table
Hi all,
I have a file in which records are Tab-separated,
Please suggest me how to upload such f
Hi all,
I have a file in which records are Tab-separated.
Please suggest how to load such a file into a Hive table,
i.e. how to specify:
Create table XYZ (name STRING, roll INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY
Please suggest what should go inside the "" here.
Thanks & Regards
Yogesh Kumar
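Pulling together the answers in this thread, a minimal sketch of the completed statement plus a load (the local file path below is hypothetical):

    CREATE TABLE XYZ (name STRING, roll INT)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- hypothetical local path to the tab-separated file
    LOAD DATA LOCAL INPATH '/home/yogesh/data/records.tsv' INTO TABLE XYZ;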
But it's a map-only job.
At 2012-09-27 05:39:39,"Chen Song" wrote:
As far as I know, the number of files emitted would be determined by the number
of mappers for a map-only job and the number of reducers for a map-reduce job.
So it totally depends on how your query translates into an MR job.
You ca
Manish, the table that has been created for zipped text files should be defined
as a sequence file, for example:
CREATE TABLE my_table_zip (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS SEQUENCEFILE;
After this you can use the regular load command to load these files.
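A sketch of such a load, with a hypothetical path (whether Hive can actually decode the zipped content is a separate question, discussed further down the thread):

    -- hypothetical local path; point this at the file to be loaded
    LOAD DATA LOCAL INPATH '/home/manish/zipped/part1.zip' INTO TABLE my_table_zip;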
Hi,
Can anybody help me with the following issues?
1) I am using hadoop-1.0.3 with hive-0.9.0.
When I start Hive in embedded Derby mode it works fine, but when I start
it in external Derby mode I get the following error. What could be the
issue?
hive> show tables;
FAILED: Error in metadata: javax.jdo
As far as I know, the number of files emitted would be determined by the
number of mappers for a map-only job and the number of reducers for a
map-reduce job.
So it totally depends on how your query translates into an MR job.
You can enforce it by setting the property
*mapred.reduce.tasks=1*
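In HiveQL that could look like this minimal sketch:

    -- force a single reducer, so a job with a reduce phase emits a single output file;
    -- it has no effect on the number of files produced by a map-only job
    SET mapred.reduce.tasks=1;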
Chen
With my limited knowledge of hive, I don't think it is possible to get the
actual value of the argument and I don't think it is or should be designed
to provide that information either. *initialize* is intended only for
decoding the meta structure (type and its associated evaluation mechanism)
of a
I understand your message. But in this situation, I want to do the following:
1) I want to get the value 10 in the initialization stage. I understand your
point that the value will only be available in the evaluate stage, but keep in
mind that this 10 in my example is a constant value. It
Hi ashok,
Thank you very much.
Regards
Abhi
Sent from my iPhone
On Sep 26, 2012, at 1:32 PM, wrote:
> Hello Abhi,
> Hope below information ll help you.
> mapred.reduce.tasks
> Default Value: -1
> Added In: 0.1
> The default number of reduce tasks per job. Typically set to a prime close to
>
Thanks bejoy.
Regards
Abhi
Sent from my iPhone
On Sep 26, 2012, at 1:42 PM, Bejoy KS wrote:
> Hi Abshiek
>
> From the map reduce logs you can see whether the data processed by one
> reducer is much more than that of other reducers. Or in short one reducer
> takes relatively longer time comp
Hi Abshiek
From the map-reduce logs you can see whether the data processed by one reducer
is much more than that of the other reducers, or, in short, whether one reducer
takes relatively longer to complete compared to the others.
Also, adding to my previous mail, one more optimization is possible for group by
if your
Hi Yong
The way GenericUDF works is as follows.
*ObjectInspector initialize(ObjectInspector[] arguments)* is called only
once for each GenericUDF instance used in your Hive query. This phase is for
the preparation steps of the UDF, such as syntax checking and type inference.
*Object evaluate(DeferredObject[
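For context, the usage side in HiveQL typically looks something like the sketch below; the jar path, function name, and class name are hypothetical, and 10 stands for the constant argument discussed above:

    -- hypothetical jar and class
    ADD JAR /tmp/my_udfs.jar;
    CREATE TEMPORARY FUNCTION my_generic_udf AS 'com.example.hive.MyGenericUDF';
    -- the second argument is a constant known at query-compile time
    SELECT my_generic_udf(col1, 10) FROM some_table;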
Hello Abhi,
Hope the information below will help you.
mapred.reduce.tasks
* Default Value: -1
* Added In: 0.1
The default number of reduce tasks per job. Typically set to a prime close to
the number of available hosts. Ignored when mapred.job.tracker is "local".
Hadoop sets this to 1 by default
Hi Bejoy,
Thanks for the reply. How can I detect data skew among the reducers?
Regards
Abhi
Sent from my iPhone
On Sep 26, 2012, at 1:20 PM, Bejoy KS wrote:
> Hi Abshiek
>
> Group by performance can be improved by the following
> 1)enabling map side aggregation. In latest versions it is enabled by
Hi Abshiek
Group by performance can be improved by the following:
1) Enabling map-side aggregation. In the latest versions it is enabled by default:
SET hive.map.aggr = true;
2) Is there data skew observed in some of the reducers?
If so, better performance can be yielded by setting the following property
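A minimal sketch of both settings, where hive.groupby.skewindata is my guess at the property the cut-off line refers to (verify it against your Hive version):

    -- enable map-side (partial) aggregation for GROUP BY
    SET hive.map.aggr = true;
    -- spread skewed GROUP BY keys across an extra MR stage
    SET hive.groupby.skewindata = true;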
Hi all,
I have written a query with a group by clause and it is consuming a lot of time.
Is there any way to optimize this, e.g. some configuration property or something?
Regards
Abhi
Sent from my iPhone
Hi Ashok,
Thanks for the reply. Can you please tell me how many reducers should be
used for 1 GB of intermediate data?
Regards
Abhi
Sent from my iPhone
On Sep 26, 2012, at 12:39 PM, wrote:
> Yes Abshiek,
> By setting below prop. You ll get better result. The no should depends on
Hi Chalcy,
You can go for OAuth for security purposes if you want to provide specific
authorization flows for web applications with Hive.
Even Google relies on it. As for Pentaho and the rest, I have never tried it with
them, but you can give it a try.
Regards
Ashok S.
From: Chalcy Raja [mailto:chalcy.r...@ca
Yes Abshiek,
By setting the below property you will get better results. The number should
depend on your data size.
Regards
Ashok S.
From: Bejoy KS [mailto:bejoy...@yahoo.com]
Sent: 26 September 2012 21:04
To: user@hive.apache.org
Subject: Re: Hive configuration property
Hi Abshiek
Based on my experience you
Hi Richin,
Thanks! Yes, this is what I wanted to understand: how to load a zip file into a
Hive table. Now I'll try this option.
Thank You,
Manish.
Sent from my BlackBerry, pls excuse typo
-Original Message-
From:
Date: Wed, 26 Sep 2012 14:51:39
To:
Reply-To: user@hive.apache.org
Subject
Hive is poor on user authentication and authorization, but all map-reduce
jobs launched by Hive follow the Hadoop file permissions while reading and
writing data. So the best way to enforce security with Hive is to secure your
data in HDFS using appropriate permissions.
Thanks,
Vinod
http://blog.vin
Hive can also be run in embedded mode, as it does not need to be installed
on the Hadoop cluster; it is required only on the client side. Have a look at the
CliDriver and HiveServer classes in the Hive code base to learn how to use Hive
in embedded mode.
Thanks,
Vinod
http://blog.vinodsingh.com/
On Wed, Sep 26, 20
I am using Hive Server with Tableau now and realize that the user comes in as the
"hive" user. Also, after reading more in the emails and elsewhere, I found that Hive
Server is not thread-safe and does not have a way to set up authentication.
How is the connection to Hive handled in Tableau, Microstrategy
Thanks bejoy, I will try that.
Regards
Abhi
Sent from my iPhone
On Sep 26, 2012, at 11:34 AM, Bejoy KS wrote:
> Hi Abshiek
>
> Based on my experience you can always provide the number of reduce tasks
> (mapred.reduce.tasks) based on the data volume your query handles. It can
> yield you be
Hi Abshiek
Based on my experience you can always provide the number of reduce
tasks (mapred.reduce.tasks) based on the data volume your query handles. It
can yield you better performance numbers.
Regards,
Bejoy KS
From: Abhishek
To: "user@hive.apache.org"
You are right Chuck. I thought his question was how to use zip files or any
compressed files in Hive tables.
Yeah, it seems like you can't do that; see:
http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=gb9yvasr2jl0u3yul2tfgu0...@mail.gmail.com%3E
But you can
But TEXTFILE in Hive always has newline as the record delimiter. How could this
possibly work with a zip/tar file that can contain ASCII 10 characters at
random locations, and certainly does not have ASCII 10 at the end of each data
record?
Chuck Connell
Nuance R&D Data Team
Burlington, MA
Fr
Hi,
Can you look up the file names handled by each mapper? You can do so by
looking at a running task in the UI, in the status column. Also, what split
property do you mean? Can you share your job's console output?
Also, the recommended way is to use a splittable format like
Avro, SequenceFiles, indexed LZO, etc.
Hi Manish,
If you have your zip file at location /home/manish/zipfile, you can just
point your external table to that location, for example (assuming ',' as the
field delimiter):
CREATE EXTERNAL TABLE manish_test (field1 STRING, field2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/home/manish/zipfile';
Thanks Bharath, your points make sense. I'll try this "hive.exec.reducers.max"
property.
Regards
Abhi
Sent from my iPhone
On Sep 26, 2012, at 9:23 AM, bharath vissapragada
wrote:
>
> I'm no expert in hive, but here are my 2 cents.
>
> By default hive schedules a reducer per every 1 GB of
I'm no expert in hive, but here are my 2 cents.
By default Hive schedules a reducer for every 1 GB of data (change that
value by modifying *hive.exec.reducers.bytes.per.reducer*). If your input
data is huge, there will be a large number of reducers, which might be
unnecessary. (Sometimes a large nu
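A minimal sketch of tuning these two properties (the values are illustrative only):

    -- bytes of input handled per reducer (roughly 1 GB here)
    SET hive.exec.reducers.bytes.per.reducer=1000000000;
    -- upper bound on the number of reducers for a single job
    SET hive.exec.reducers.max=32;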
Hi Savant,
Got it. But I still need to understand how to load the zip: can I directly use a
zip file in an external table? Can you please help with the load statement?
Sent from my BlackBerry, pls excuse typo
-Original Message-
From: "Savant, Keshav"
Date: Wed, 26 Sep 2012 12:25:38
To: user@
Hi Dilip,
Thanks for your response. Does the Hive API provide anything to connect to Hive
Server without using Thrift calls? We stopped using Thrift because of security
issues in Hive.
Regards
Abhi
Sent from my iPhone
On Sep 26, 2012, at 12:46 AM, Dilip Joseph
wrote:
> You don't necessarily ne
Hi all,
I have a doubt regarding the below properties: is it a good practice to override
them in Hive?
If yes, what are the optimal values for the following properties?
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.m
Another solution would be to use a shell script to do the following:
1. unzip the txt files,
2. one by one, merge those 50 (or N) text files into one text
file,
3. then zip/tar that bigger text file,
4. then that big zip/tar file can be uploaded into Hive.
Keshav C Sava
This could be a problem. Hive uses newline as the record separator, and a ZIP file
will certainly contain newline characters, so I doubt this is possible.
BUT, I would like to hear from anyone who has solved the "newline is always a
record separator" problem, because we ran into it for another type of
comp
As I mentioned, you can copy the jar to your Hadoop cluster at /usr/lib/hive/lib and
then use it directly in HiveQL.
Thank You,
Manish.
Sent from my BlackBerry, pls excuse typo
-Original Message-
From: Manu A
Date: Wed, 26 Sep 2012 15:01:14
To:
Reply-To: user@hive.apache.org
Subject: Re: Custom
Hi Manish,
Thanks, I did the same. But how do I invoke the custom Java map/reduce
functions (com.hive.test.TestMapper), since there is no script, as it is a
jar file? The process looks a bit different from a UDF (which used CREATE
TEMPORARY FUNCTION).
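One pattern sometimes used for this kind of thing, not something confirmed in this thread, is to stream the rows through the class with Hive's TRANSFORM clause; the jar path, table, and output column names below are hypothetical, and it assumes java is on the path of the task nodes:

    -- ship the jar to the task nodes (hypothetical path)
    ADD FILE /home/user/test.jar;
    -- pipe each input row through the custom mapper class
    FROM source_table
    SELECT TRANSFORM (col1, col2)
    USING 'java -cp test.jar com.hive.test.TestMapper'
    AS (out1 STRING, out2 STRING);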
On Wed, Sep 26, 2012 at 12:25 PM, Manish.Bhoge wrote:
Hivers,
I want to understand whether it would be possible to use zip/tar files
directly in Hive. All the files have a similar schema (structure). Say 50 *.txt
files are zipped into a single zip file: can we load data directly from this zip
file, or do we need to unzip them first?
Thanks & Regard