RE: Partition performance

2013-07-04 Thread Peter Marron
Hi, Just to check that I understand this problem, my reading suggests that the overhead of many partitions is currently unavoidable. Specifically this means that any query on a table that has, let’s say, 10,000 partitions will be significantly slower (than on an un-partitioned table with the

Elastic MapReduce Hive Avro SerDe

2013-07-04 Thread Dan Filimon
Hi! I'm working on a few Avro MapReduce jobs whose output will end up on S3 to be processed by Hive. Amazon's latest Hive version [1] is 0.8.1 but Avro support was added in 0.9.1. I can only find the haivvreo project [2] that supports 0.7. Is this my only option? Thanks! [1]

RE: Partition performance

2013-07-04 Thread Peter Marron
Sorry, just caught up with the last couple of days’ email and I feel that this question has already been answered fairly comprehensively. Apologies. Z From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com] Sent: 04 July 2013 08:37 To: user@hive.apache.org Subject: RE: Partition

How Can I store the Hive query result in one file ?

2013-07-04 Thread Matouk IFTISSEN
Hello Hive users, Is there a way to store the Hive query result (SELECT *.) in a single, specific file (with a given file name), similar to (INSERT OVERWRITE LOCAL DIRECTORY '/directory_path_name/')? Thanks for your answers

RE: metastore security issue

2013-07-04 Thread Shunichi Otsuka
One setting was missing: hive.metastore.authorization.storage.checks=true. This solves the problem. -Original Message- From: Shunichi Otsuka [mailto:sots...@yahoo-corp.jp] Sent: Thursday, July 04, 2013 2:28 PM To: user@hive.apache.org Subject: metastore security issue I am trying to

RE: Experience of Hive local mode execution style

2013-07-04 Thread Guillaume Allain
Local mode really helps with those little delays. It definitely helps for small data sets. But my concerns are about consistency of results with distributed modes and some requests that fail only when it is triggered (see my description below). From: Edward

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Nitin Pawar
Will hive -e 'query' > filename or hive -f query.q > filename do? Do you specifically want it to write into a named file on HDFS only? On Thu, Jul 4, 2013 at 3:12 PM, Matouk IFTISSEN matouk.iftis...@ysance.com wrote: Hello Hive users, Is there a manner to store the Hive query result (SELECT

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Bertrand Dechoux
The question is what the volume of your output is. There is one file per output task (map or reduce) because that way each can write it independently and in parallel. That's how MapReduce works. And except by forcing the number of tasks to 1, there is no certain way to have one output file. But

Re: Elastic MapReduce Hive Avro SerDe

2013-07-04 Thread Ruslan Al-Fakikh
Hi. My guess is that you can try to look it up in their docs or mailing lists (Amazon EMR). IIRC, CDH had the patch for Avro+Hive before it was included in Hive itself, so Amazon EMR can have similar patches... Ruslan On Thu, Jul 4, 2013 at 12:20 PM, Dan Filimon

Hortonworks HDP 1.3 vs. HDP 1.1

2013-07-04 Thread Kumar Chinnakali
Hi Hive Team, Currently I am developing and testing Hive queries in HDP 1.1 with Hadoop 1.0.3 and Hive 0.9.0. However, it seems that my production environment is going to be upgraded to HDP 1.3 in the near future. Will it have an impact with respect to design or optimization? Please suggest. Regards, Kumar

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Michael Malak
I have found that for output larger than a few GB, redirecting stdout results in an incomplete file.  For very large output, I do CREATE TABLE MYTABLE AS SELECT ... and then copy the resulting HDFS files directly out of  /user/hive/warehouse. From: Bertrand

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Matouk IFTISSEN
Thanks for your responses. Effectively, Bertrand's answer makes this possible: the Hive properties below force the job to write the Hive result to one file without specifying the name (_0): set hive.exec.reducers.max = 1; set mapred.reduce.tasks = 1; For Nitin, I want to store

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Nitin Pawar
The one I mentioned does not work on HDFS files. It's just one way to write stdout to a file. I am not sure if Hive allows named files for output, and the above settings will make your query run really slowly if you have a large dataset. If you are really specific about having a filename then for now

Re: Experience of Hive local mode execution style

2013-07-04 Thread Edward Capriolo
Since you are launching locally you have to account for this. 1) If multiple jobs are running they become a burden on the local memory of the system 2) Your local parameters like java heap size Xmx or mapred.child.java.opts may be getting applied locally, if you are doing distinct queries they may

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Edward Capriolo
Normally if you use set mapred.reduce.tasks=1 you get one output file. You can also look at hive.merge.mapfiles, mapred.reduce.tasks, and hive.merge.mapredfiles. You can also use a separate tool: https://github.com/edwardcapriolo/filecrush On Thu, Jul 4, 2013 at 6:38 AM, Nitin Pawar
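A minimal sketch of how the settings named in this message might be collected into a Hive script. The output directory `/tmp/hive_single_out` and the `customers` table are placeholders chosen for illustration, and the merge setting for map-reduce jobs is written here as `hive.merge.mapredfiles`, which is how it is spelled in Hive's configuration. Whether this yields exactly one file depends on the query plan (a map-only query does not use the reducer setting), so treat it as a starting point, not a guarantee.

```shell
# Write a Hive script that requests a single reducer and enables
# merging of small output files. Nothing here contacts a cluster.
script=$(mktemp)
cat > "$script" <<'EOF'
SET mapred.reduce.tasks=1;
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
-- Placeholder directory and table name (assumptions for this sketch).
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive_single_out'
SELECT * FROM customers;
EOF

# Show the script and the command that would run it.
cat "$script"
echo "Run with: hive -f $script"
```

As noted elsewhere in the thread, forcing a single reducer can make a query over a large dataset very slow.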

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop
hive> set hive.io.output.fileformat=CSVTextFile; hive> insert overwrite local directory '/usr/home/hadoop/da1/' select * from customers; *** customers is a Hive table From: Edward Capriolo edlinuxg...@gmail.com To: user@hive.apache.org user@hive.apache.org

Re: How Can I store the Hive query result in one file ?

2013-07-04 Thread Raj Hadoop
Adding to that - Multiple files can be concatenated from the directory. Example: cat 0-0 00-1 0-2 > final From: Raj Hadoop hadoop...@yahoo.com To: user@hive.apache.org user@hive.apache.org; matouk.iftis...@ysance.com
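The concatenation step described in this message can be sketched as a small shell function. The function name `merge_parts` and the demo file names are hypothetical; real Hive part files have their own names, and the demo uses stand-in files so that it runs without Hive.

```shell
# Concatenate every regular file in an output directory into one named file.
# Usage: merge_parts <output_dir> <final_file>
# Keep <final_file> outside <output_dir> so it is not appended to itself.
merge_parts() {
    out_dir=$1
    final_file=$2
    : > "$final_file"                 # create or truncate the target
    # The glob expands in sorted order and skips hidden files.
    for part in "$out_dir"/*; do
        [ -f "$part" ] || continue    # ignore subdirectories and empty globs
        cat "$part" >> "$final_file"
    done
}

# Demo with stand-in part files (made-up names and rows).
demo_dir=$(mktemp -d)
printf '1,alice\n' > "$demo_dir/part_a"
printf '2,bob\n'   > "$demo_dir/part_b"
merge_parts "$demo_dir" "$demo_dir.csv"
cat "$demo_dir.csv"
```

Because each task writes its file independently, the order of rows across part files carries no meaning unless the query itself sorts the output.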

Re: Hortonworks HDP 1.3 vs. HDP 1.1

2013-07-04 Thread Owen O'Malley
For HDP specific questions, you should use the Hortonworks lists: http://hortonworks.com/community/forums/forum/hive/ Your question is about the difference between Hive 0.9 and Hive 0.11. The big additions are: Decimal type; ORC files; Analytics functions - cube, roll up; Windowing functions