Re: how to load TAB_SEPRATED file in hive table

2012-09-26 Thread Bejoy KS
Hi Yogesh Whichever character is the column separator in your input file, you need to provide that in the FIELDS TERMINATED BY clause. Also, the common storage formats supported by Hive include Text File, Sequence File, RC File, etc. Regards Bejoy KS Sent from handheld, please excuse typos.
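For illustration, the formats Bejoy lists can be sketched in DDL like this (table and column names are hypothetical):

```sql
-- Plain text storage with an explicit field delimiter
CREATE TABLE demo_text (name STRING, roll INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- The same schema in the other common storage formats
CREATE TABLE demo_seq (name STRING, roll INT) STORED AS SEQUENCEFILE;
CREATE TABLE demo_rc  (name STRING, roll INT) STORED AS RCFILE;
```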

Re: Hive query failing

2012-09-26 Thread Sarath
@Kulkarni, As suggested, copied the JAR into the specified directory; now it is complaining as - java.io.FileNotFoundException: File does not exist: /tmp/hduser/hive_2012-09-27_11-24-32_966_3932937489091919630/-mr-1/1/emptyFile where hduser is the unix login ID from where I'm executing the

RE: how to load TAB_SEPRATED file in hive table

2012-09-26 Thread yogesh dhari
Thanks Ashok :-), I am not very aware of the storage formats in Hive (I am new to Hive). Could you please list some of them besides those? 1) space separated --> FIELDS TERMINATED BY " "; 2) Control-A separated --> FIELDS TERMINATED BY '\001' 3) Tab separated -->

Re: issue hive with external derby

2012-09-26 Thread Bertrand Dechoux
Hi, For 1), did you follow the wiki? https://cwiki.apache.org/confluence/display/Hive/HiveDerbyServerMode Maybe you didn't provide the right jars. Did you check that you could connect yourself to the database? For 2), I don't know which version is supported for Hadoop. https://cwiki.apache.org/c
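For reference, a minimal hive-site.xml fragment for Derby network-server (external) mode, per the wiki above, might look like this (host and port are placeholders; derbyclient.jar and derbytools.jar must be on Hive's classpath):

```xml
<configuration>
  <!-- JDBC URL pointing at the external Derby network server (placeholder host/port) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
  </property>
  <!-- Client driver, instead of the embedded org.apache.derby.jdbc.EmbeddedDriver -->
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
  </property>
</configuration>
```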

RE: how to load TAB_SEPRATED file in hive table

2012-09-26 Thread ashok.samal
'\t' From: yogesh dhari [yogeshdh...@live.com] Sent: Thursday, September 27, 2012 10:42 AM To: hive request Subject: how to load TAB_SEPRATED file in hive table Hi all, I have a file in which records are Tab-Seprated, Please suggest me how to upload such f

how to load TAB_SEPRATED file in hive table

2012-09-26 Thread yogesh dhari
Hi all, I have a file in which records are Tab-Separated. Please suggest how to load such a file into a Hive table, i.e. how to specify Create table XYZ (name STRING, roll INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY Please suggest what goes in "" over here. Thanks & Regards Yogesh Kumar
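The answer the thread converges on is '\t'; a complete sketch of the DDL and load (the file path is hypothetical):

```sql
CREATE TABLE xyz (name STRING, roll INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Hypothetical local path to the tab-separated file
LOAD DATA LOCAL INPATH '/home/yogesh/data.tsv' INTO TABLE xyz;
```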

Re:Re: size of RCFile in hive

2012-09-26 Thread 王锋
But it's a map-only job At 2012-09-27 05:39:39,"Chen Song" wrote: As far as I know, the number of files emitted would be determined by the number of mappers for a map-only job and the number of reducers for a map-reduce job. So it totally depends on how your query translates into a MR job. You ca

RE: zip file or tar file cosumption

2012-09-26 Thread Savant, Keshav
Manish, the table that has been created for zipped text files should be defined as a sequence file, for example CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile; After this you can use the regular load command to load these files,
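Spelled out, Keshav's suggestion would be roughly the following (the load path is hypothetical):

```sql
CREATE TABLE my_table_zip (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS SEQUENCEFILE;

-- Hypothetical HDFS path to sequence-file-formatted input
LOAD DATA INPATH '/user/manish/input.seq' INTO TABLE my_table_zip;
```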

issue hive with external derby

2012-09-26 Thread AnilKumar B
Hi, Can anybody help me with the following issues? 1) I am using hadoop-1.0.3 with hive-0.9.0. When I start Hive in embedded Derby mode it works fine, but when I start it in external Derby mode I get the following error. What could be the issue? hive> show tables; FAILED: Error in metadata: javax.jdo

Re: size of RCFile in hive

2012-09-26 Thread Chen Song
As far as I know, the number of files emitted would be determined by the number of mappers for a map-only job and the number of reducers for a map-reduce job. So it totally depends on how your query translates into a MR job. You can enforce it by setting the property *mapred.reduce.tasks=1* Chen
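A sketch of enforcing a single output file (table and column names are hypothetical; note this only helps when the plan actually has a reduce phase):

```sql
SET mapred.reduce.tasks = 1;

-- DISTRIBUTE BY forces a reduce phase, so the single reducer emits one file
INSERT OVERWRITE TABLE rc_output
SELECT * FROM src DISTRIBUTE BY key;
```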

Re: How can I get the constant value from the ObjectInspector in the UDF

2012-09-26 Thread Chen Song
With my limited knowledge of hive, I don't think it is possible to get the actual value of the argument and I don't think it is or should be designed to provide that information either. *initialize* is intended only for decoding the meta structure (type and its associated evaluation mechanism) of a

RE: How can I get the constant value from the ObjectInspector in the UDF

2012-09-26 Thread java8964 java8964
I understand your message. But in this situation, I want to do the following: 1) I want to get the value 10 in the initialization stage. I understand your point that the value will only be available in the evaluate stage, but keep in mind that for this 10 in my example, it is a constant value. It

Re: Hive configuration property

2012-09-26 Thread Abhishek
Hi Ashok, Thank you very much. Regards Abhi Sent from my iPhone On Sep 26, 2012, at 1:32 PM, wrote: > Hello Abhi, > Hope the information below will help you. > mapred.reduce.tasks > Default Value: -1 > Added In: 0.1 > The default number of reduce tasks per job. Typically set to a prime close to >

Re: How to optimize a group by query

2012-09-26 Thread Abhishek
Thanks Bejoy. Regards Abhi Sent from my iPhone On Sep 26, 2012, at 1:42 PM, Bejoy KS wrote: > Hi Abhishek > > From the map reduce logs you can see whether the data processed by one > reducer is much more than that of other reducers. Or in short one reducer > takes relatively longer time comp

Re: How to optimize a group by query

2012-09-26 Thread Bejoy KS
Hi Abhishek From the map reduce logs you can see whether the data processed by one reducer is much more than that of the other reducers, or, in short, whether one reducer takes a relatively longer time to complete compared to the others. Also, to add to my previous mail, one more optimization is possible for group by if your

Re: How can I get the constant value from the ObjectInspector in the UDF

2012-09-26 Thread Chen Song
Hi Yong The way GenericUDF works is as follows. *ObjectInspector initialize(ObjectInspector[] arguments) *is called only once for one GenericUDF instance used in your Hive query. This phase is for preparation steps of UDF, such as syntax check and type inference. *Object evaluate(DeferredObject[
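The lifecycle described above can be sketched with a hypothetical GenericUDF (class name and bodies are illustrative only; the hive-exec jar is assumed on the classpath):

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Hypothetical UDF: initialize() runs once per query, evaluate() once per row.
public class MyExampleUDF extends GenericUDF {
  private ObjectInspector argOI;  // type information, fixed at initialize time

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    // Only type information is available here, not row values.
    argOI = arguments[0];
    return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // Actual argument values are only available here, per row.
    Object value = arguments[0].get();
    return value == null ? null : value.toString();
  }

  @Override
  public String getDisplayString(String[] children) {
    return "my_example_udf(" + children[0] + ")";
  }
}
```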

RE: Hive configuration property

2012-09-26 Thread ashok.samal
Hello Abhi, Hope the information below will help you. mapred.reduce.tasks * Default Value: -1 * Added In: 0.1 The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default

Re: How to optimize a group by query

2012-09-26 Thread Abhishek
Hi Bejoy, Thanks for the reply. How can I detect data skew among the reducers? Regards Abhi Sent from my iPhone On Sep 26, 2012, at 1:20 PM, Bejoy KS wrote: > Hi Abhishek > > Group by performance can be improved by the following > 1) enabling map side aggregation. In latest versions it is enabled by

Re: How to optimize a group by query

2012-09-26 Thread Bejoy KS
Hi Abhishek Group by performance can be improved by the following: 1) enabling map side aggregation. In the latest versions it is enabled by default: SET hive.map.aggr = true; 2) Is there a data skew observed in some of the reducers? If so, better performance can be yielded by setting the following prop
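The two settings Bejoy refers to, as a sketch (the query and table are hypothetical):

```sql
-- 1) Map-side aggregation (enabled by default in recent versions)
SET hive.map.aggr = true;
-- 2) Two-stage aggregation that spreads skewed group-by keys across reducers
SET hive.groupby.skewindata = true;

SELECT dept, COUNT(*) FROM employees GROUP BY dept;
```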

How to optimize a group by query

2012-09-26 Thread Abhishek
Hi all, I have written a query with a group by clause and it is consuming a lot of time. Is there any way to optimize this, e.g. some configuration property or similar? Regards Abhi Sent from my iPhone

Re: Hive configuration property

2012-09-26 Thread Abhishek
Hi Ashok, Thanks for the reply. Can you please tell me how many reducers should be used for 1 GB of intermediate data? Regards Abhi Sent from my iPhone On Sep 26, 2012, at 12:39 PM, wrote: > Yes Abhishek, > By setting the below properties you will get better results. The number should depend on

RE: hive server security/authentication

2012-09-26 Thread ashok.samal
Hi Chalcy, You can go for OAuth for security purposes if you want to provide specific authorization flows for web applications with Hive. Even Google relies on this. But as for Pentaho and the rest, I have never tried them; you can give it a try. Regards Ashok S. From: Chalcy Raja [mailto:chalcy.r...@ca

RE: Hive configuration property

2012-09-26 Thread ashok.samal
Yes Abhishek, By setting the below properties you will get better results. The number should depend on your data size. Regards Ashok S. From: Bejoy KS [mailto:bejoy...@yahoo.com] Sent: 26 September 2012 21:04 To: user@hive.apache.org Subject: Re: Hive configuration property Hi Abhishek Based on my experience you

Re: zip file or tar file cosumption

2012-09-26 Thread Manish Bhoge
Hi Richin, Thanks! Yes, this is what I wanted to understand: how to load a zip file into a Hive table. Now I'll try this option. Thank You, Manish. Sent from my BlackBerry, pls excuse typo -Original Message- From: Date: Wed, 26 Sep 2012 14:51:39 To: Reply-To: user@hive.apache.org Subject

Re: hive server security/authentication

2012-09-26 Thread विनोद सिंह
Hive is poor on user authentication and authorization, but all map reduce jobs launched by Hive follow the Hadoop file permissions while reading and writing data. So the best way to enforce security using Hive is to secure your data in HDFS using appropriate permissions. Thanks, Vinod http://blog.vin
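As a sketch of that approach (the paths, users, and groups are hypothetical, and these commands require a running Hadoop cluster):

```shell
# Restrict a database directory in the warehouse to its owning group
hadoop fs -chown hiveadmin:analysts /user/hive/warehouse/mydb.db
hadoop fs -chmod -R 750 /user/hive/warehouse/mydb.db
```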

Re: How connect to hive server without using jdbc

2012-09-26 Thread विनोद सिंह
Hive can also be run in embedded mode, as it does not need to be installed on the Hadoop cluster; it is required only on the client side. Have a look at the CliDriver and HiveServer classes in the Hive code base to learn how to use Hive in embedded mode. Thanks, Vinod http://blog.vinodsingh.com/ On Wed, Sep 26, 20

hive server security/authentication

2012-09-26 Thread Chalcy Raja
Using the Hive server with Tableau now, and I realize that the user comes in as the "hive" user. Also, after reading more in the emails and elsewhere, I found that the Hive server is not thread safe and does not have a way to set up authentication. How is the connection to Hive handled in Tableau, Microstrategy

Re: Hive configuration property

2012-09-26 Thread Abhishek
Thanks Bejoy, I will try that. Regards Abhi Sent from my iPhone On Sep 26, 2012, at 11:34 AM, Bejoy KS wrote: > Hi Abhishek > > Based on my experience you can always provide the number of reduce tasks > (mapred.reduce.tasks) based on the data volume your query handles. It can > yield you be

Re: Hive configuration property

2012-09-26 Thread Bejoy KS
Hi Abhishek Based on my experience you can always provide the number of reduce tasks (mapred.reduce.tasks) based on the data volume your query handles. It can yield you better performance numbers. Regards, Bejoy KS From: Abhishek To: "user@hive.apache.org"

RE: zip file or tar file cosumption

2012-09-26 Thread richin.jain
You are right, Chuck. I thought his question was how to use zip files or any compressed files in Hive tables. Yeah, it seems you can't do that; see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=gb9yvasr2jl0u3yul2tfgu0...@mail.gmail.com%3E But you can

RE: zip file or tar file cosumption

2012-09-26 Thread Connell, Chuck
But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record? Chuck Connell Nuance R&D Data Team Burlington, MA Fr

Re: Hive File Sizes, Merging, and Splits

2012-09-26 Thread Ruslan Al-Fakikh
Hi, Can you look up the file names of each mapper? You can do so by looking at a running task UI in the status column. Also what split property do you mean? Can you give your job's console output? Also, the best recommended way is to use a splittable format like Avro, Seq files, indexed LZO, etc.

RE: zip file or tar file cosumption

2012-09-26 Thread richin.jain
Hi Manish, If you have your zip file at location - /home/manish/zipfile, you can just point your external table to that location like CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY STORED AS TEXTFILE LOCATION '/home/manish/zipfile';
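The DDL above was cut off before the delimiter; a completed version, assuming a comma-separated layout, would be:

```sql
CREATE EXTERNAL TABLE manish_test (field1 STRING, field2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','  -- assumed delimiter; truncated in the original mail
STORED AS TEXTFILE
LOCATION '/home/manish/zipfile';
```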

Re: Hive configuration property

2012-09-26 Thread Abhishek
Thanks Bharath, your points make sense. I'll try this "hive.exec.reducers.max" property. Regards Abhi Sent from my iPhone On Sep 26, 2012, at 9:23 AM, bharath vissapragada wrote: > > I'm no expert in hive, but here are my 2 cents. > > By default hive schedules a reducer per every 1 GB of

Re: Hive configuration property

2012-09-26 Thread bharath vissapragada
I'm no expert in Hive, but here are my 2 cents. By default Hive schedules a reducer for every 1 GB of data (change that value by modifying *hive.exec.reducers.bytes.per.reducer*). If your input data is huge, there will be a large number of reducers, which might be unnecessary. ( Sometimes large nu
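Both knobs discussed in this thread, as a sketch (the values are illustrative):

```sql
-- Input bytes handled by each reducer (default 1 GB)
SET hive.exec.reducers.bytes.per.reducer = 1000000000;
-- Hard cap on the number of reducers a query may use
SET hive.exec.reducers.max = 32;
```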

Re: zip file or tar file cosumption

2012-09-26 Thread Manish Bhoge
Hi Savant, Got it. But I still need to understand how to load a zip. Can I directly use a zip file in an external table? Can you please help with the load statement? Sent from my BlackBerry, pls excuse typo -Original Message- From: "Savant, Keshav" Date: Wed, 26 Sep 2012 12:25:38 To: user@

Re: How connect to hive server without using jdbc

2012-09-26 Thread Abhishek
Hi Dilip, Thanks for your response. Does the Hive API provide anything to connect to the Hive server without using a Thrift call? We stopped using Thrift because of security issues in Hive. Regards Abhi Sent from my iPhone On Sep 26, 2012, at 12:46 AM, Dilip Joseph wrote: > You don't necessarily ne

Hive configuration property

2012-09-26 Thread Abhishek
Hi all, I have a doubt regarding the below properties: is it a good practice to override them in Hive, and if yes, what are the optimal values for them? set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.m

RE: zip file or tar file cosumption

2012-09-26 Thread Savant, Keshav
Another solution would be: using a shell script, do the following: 1. unzip the txt files, 2. one by one, merge those 50 (or N) text files into one text file, 3. then zip/tar that bigger text file, 4. then that big zip/tar file can be uploaded into Hive. Keshav C Sava
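The steps above can be sketched as follows, substituting gzip for zip in step 3, since Hive reads gzip-compressed text files transparently while zip archives are not directly readable (file names and contents are illustrative; the hadoop/hive commands are shown commented because they need a cluster):

```shell
mkdir -p unzipped
# Stand-ins for the N unzipped text files (step 1)
printf 'a,1\nb,2\n' > unzipped/part1.txt
printf 'c,3\n'      > unzipped/part2.txt

# Step 2: merge the individual text files into one
cat unzipped/*.txt > merged.txt

# Step 3: compress the merged file (gzip, not zip)
gzip -f merged.txt

# Step 4: upload and load into Hive (hypothetical table/path):
#   hadoop fs -put merged.txt.gz /user/hive/staging/
#   hive -e "LOAD DATA INPATH '/user/hive/staging/merged.txt.gz' INTO TABLE my_table;"
gunzip -c merged.txt.gz
```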

RE: zip file or tar file cosumption

2012-09-26 Thread Connell, Chuck
This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters, so I doubt this is possible. BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of comp

Re: Custom MR scripts using java in Hive

2012-09-26 Thread Manish Bhoge
As I mentioned, you can copy the jar to your Hadoop cluster at /usr/lib/hive/lib and then use it directly in HiveQL. Thank You, Manish. Sent from my BlackBerry, pls excuse typo -Original Message- From: Manu A Date: Wed, 26 Sep 2012 15:01:14 To: Reply-To: user@hive.apache.org Subject: Re: Custom

Re: Custom MR scripts using java in Hive

2012-09-26 Thread Manu A
Hi Manish, Thanks, I did the same. But how do I invoke the custom Java map/reduce functions (com.hive.test.TestMapper), since there is no script, as it is a jar file? The process looks a bit different from a UDF (where I used CREATE TEMPORARY FUNCTION). On Wed, Sep 26, 2012 at 12:25 PM, Manish.Bhoge wrote:
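One common pattern for this is ADD JAR plus SELECT TRANSFORM (a sketch; the jar path and table are hypothetical, and it assumes com.hive.test.TestMapper has a main() that reads rows from stdin and writes tab-separated rows to stdout):

```sql
ADD JAR /usr/lib/hive/lib/custom-mr.jar;  -- hypothetical jar location

-- The jar's class is invoked as an external command over each row
FROM my_input_table
SELECT TRANSFORM (key, value)
USING 'java -cp custom-mr.jar com.hive.test.TestMapper'
AS (out_key, out_value);
```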

zip file or tar file cosumption

2012-09-26 Thread Manish . Bhoge
Hivers, I want to understand whether it would be possible to utilize zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or should we unzip first? Thanks & Regard