Converting array<tinyint> in Flume avro default output to string in Hive

2013-11-13 Thread Deepak Subhramanian
Hi, I am using Flume to generate events in the default Flume Avro output format. Bytes in the Avro schema are stored as array<tinyint> in Hive when I use the AvroSerDe for Hive. How do I convert array<tinyint> to string to read the Flume body data? I am using Hive version 0.10. CREATE external TABLE
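
As a minimal sketch of the conversion the question asks about, assuming the Flume event body is UTF-8 text: Hive's tinyint is a signed 8-bit type, so byte values above 127 show up as negative numbers in the array<tinyint> column. The Python below runs outside Hive and is not a Hive UDF. The function name `tinyint_array_to_str` is hypothetical. It maps the signed values back to unsigned bytes and decodes them.

```python
def tinyint_array_to_str(values, encoding="utf-8"):
    """Decode signed 8-bit ints (Hive tinyint range -128..127) into text.

    Assumes the original bytes were text in `encoding` (UTF-8 by default).
    Illustration only; this is not a Hive UDF.
    """
    # `& 0xFF` maps each signed value to its unsigned byte (e.g. -61 -> 195);
    # without it, bytes() raises ValueError for negative numbers.
    raw = bytes(v & 0xFF for v in values)
    return raw.decode(encoding)


# "héllo" in UTF-8 is 68 C3 A9 6C 6C 6F; C3 and A9 read as signed are -61 and -87.
print(tinyint_array_to_str([104, -61, -87, 108, 108, 111]))  # prints héllo
```

The `& 0xFF` mask is the key design point: it undoes the signed interpretation without changing the underlying bit pattern.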

Compression for a HDFS text file - Hive External Partition Table

2013-11-13 Thread Raj Hadoop
Hi, 1) My requirement is to load a file (a tar.gz file that contains multiple tab-separated values files, one of which is the main file with huge data, about 10 GB per day) into an externally partitioned Hive table. 2) What I am doing is I have automated the process by extracting

Re: ORC Tuning - Examples?

2013-11-13 Thread Yin Huai
Hi John, Here is my experience with the stripe size. For a given table, when the stripe size is increased, the size of a column in a stripe increases, which means the ORC reader can read a column from disk more efficiently because the reader can sequentially read more data (assuming the
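
The stripe-size trade-off Yin describes can be put into rough numbers. This Python sketch uses a deliberately simplified model, with these assumptions: every stripe is full, a column takes the same fraction of every stripe, and compression and index/footer overhead are ignored. The 10 GB table, the two stripe sizes, and the 5% column share are hypothetical values, not ORC defaults.

```python
import math

MB = 1024 * 1024


def column_chunk_layout(table_bytes, stripe_bytes, column_fraction):
    """Return (num_stripes, bytes_per_column_chunk) under a simplified model.

    Assumptions (not ORC's exact behaviour): stripes are full, the column
    occupies the same fraction of every stripe, no compression or overhead.
    """
    num_stripes = math.ceil(table_bytes / stripe_bytes)
    # Within a stripe a column's data is stored together, so this is roughly
    # the size of one sequential read for that column.
    chunk = stripe_bytes * column_fraction
    return num_stripes, chunk


for stripe in (64 * MB, 256 * MB):
    n, chunk = column_chunk_layout(10 * 1024 * MB, stripe, 0.05)
    print(f"stripe={stripe // MB} MB -> {n} stripes, ~{chunk / MB:.1f} MB per column read")
```

Quadrupling the stripe size quarters the number of stripes and makes each per-column read four times larger, which is the sequential-read benefit described in the message. The costs of larger stripes are not modeled here.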

Re: ORC Tuning - Examples?

2013-11-13 Thread John Omernik
Yin - Fantastic! That is exactly the type of explanation of settings I'd like to see. More than just what it does, but the tradeoffs, and how things are applied in the real world. Have you played with the stride length at all? On Wed, Nov 13, 2013 at 1:13 PM, Yin Huai huaiyin@gmail.com

Hive skewed tables

2013-11-13 Thread Rajesh Balamohan
Hi All, I have the following skewed table addresses_1: select id, count(*) c from addresses_1 group by id order by c desc limit 10; 1426246531554806 198477395958492 102641838220181 138947865211331 156483436193429 96411677179771 210082076168033 800174765152421

Re: Hive skewed tables

2013-11-13 Thread Nitin Pawar
In my understanding, when you say it is scanning the entire dataset, it is looking at all your partitions because your data has been partitioned by the date column. A skewed table is a table where separate files are created for each of your skewed keys in every partition. So for your query
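
To illustrate the layout Nitin describes, here is a conceptual Python model of one partition of a skewed table: one file per skewed key value plus one file for all remaining values. It is not Hive's implementation. The names `layout_partition`, `files_to_read`, and `__default__` are hypothetical.

```python
from collections import defaultdict

DEFAULT = "__default__"  # hypothetical label for the file holding non-skewed keys


def layout_partition(rows, key, skewed_values):
    """Split one partition's rows into files: one per skewed value, one default."""
    files = defaultdict(list)
    for row in rows:
        value = row[key]
        files[value if value in skewed_values else DEFAULT].append(row)
    return dict(files)


def files_to_read(files, skewed_values, wanted):
    """Files a query filtering on `key = wanted` needs in this model."""
    if wanted in skewed_values:
        return [wanted] if wanted in files else []
    return [DEFAULT] if DEFAULT in files else []


rows = [{"id": 1}, {"id": 1}, {"id": 1}, {"id": 2}, {"id": 3}]
files = layout_partition(rows, "id", {1})
print(files_to_read(files, {1}, 1))  # [1]
print(files_to_read(files, {1}, 3))  # ['__default__']
```

In this model a filter on a skewed value touches only that value's file, while a query with no filter on the skewed column still reads every file in the partition.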

Re: ORC Tuning - Examples?

2013-11-13 Thread Yin Huai
Hi John, I have not played with the stride length. Based on my understanding of the code, since the stride length determines the number of rows between index entries, if you decrease the stride length, you can get more fine-grained indexes which can potentially help you to skip more unnecessary
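
Yin's point about stride length can be made concrete with a toy row index. The sketch below assumes, for illustration, that each index entry keeps the min and max of one group of `stride` consecutive rows, and that a reader skips any group whose range cannot contain the searched value. The function names and the sorted sample column are hypothetical.

```python
def row_group_index(values, stride):
    """One (min, max, row_count) entry per `stride` consecutive rows."""
    index = []
    for start in range(0, len(values), stride):
        group = values[start:start + stride]
        index.append((min(group), max(group), len(group)))
    return index


def rows_to_scan(index, target):
    """Rows in groups whose [min, max] range may contain `target`."""
    return sum(count for lo, hi, count in index if lo <= target <= hi)


values = list(range(1000))  # hypothetical column, already sorted
for stride in (100, 10):
    idx = row_group_index(values, stride)
    print(f"stride={stride}: {len(idx)} index entries, {rows_to_scan(idx, 500)} rows scanned")
```

The smaller stride scans 10 rows instead of 100 but stores ten times as many index entries. The benefit also depends on data ordering: if values are shuffled, most groups span a wide min/max range and little can be skipped.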

Re: Hive skewed tables

2013-11-13 Thread Rajesh Balamohan
Thanks Nitin. I have only one partition in this table for testing. I thought that within the partition it would scan only certain files based on the skewed fields. However, it is scanning all the data within the partition. On Nov 14, 2013 9:38 AM, Nitin Pawar nitinpawar...@gmail.com wrote: In my

config hive authorization (hive with kerberos and remote metastore)

2013-11-13 Thread david1990...@163.com
Hive is configured with a remote metastore and Kerberos, and it works fine. But now I want to configure Hive authorization, so I modified hive-site.xml like this: <property> <name>hive.security.authorization.enabled</name> <value>true</value> <description>enable or disable the hive client

Re: config hive authorization (hive with kerberos and remote metastore)

2013-11-13 Thread Mikhail Antonov
Did you try connecting from the beeline console? Also, that happens on the default database; what happens if you try to create a new database? -Mikhail 2013/11/13, david1990...@163.com david1990...@163.com: Hive is configured with remote metastore and kerberos, and it works fine. But now, I

Re: Hive skewed tables

2013-11-13 Thread Nitin Pawar
How did you check that it is looking at all the files inside the partition? If you want to further limit the files that are accessed, you can bucket them as well. That way you really don't have to worry about which data is skewed and can let the framework handle it. On Thu, Nov 14, 2013 at 11:16 AM, Rajesh
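
Bucketing, as Nitin suggests, assigns each row to one of a fixed number of files by hashing the bucketing column. The Python sketch below assumes a 32-bit int column whose hash code is the value itself, and computes the bucket as `(hash & Integer.MAX_VALUE) % num_buckets`. This Java-style hash-mod scheme is an assumption made for illustration, not a quote of Hive's source. The function names are hypothetical.

```python
INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def bucket_for(key, num_buckets):
    """Bucket index for a 32-bit int key; the key is used as its own hash (assumption)."""
    # Masking clears the sign bit so negative keys still give a non-negative bucket.
    return (key & INT_MAX) % num_buckets


def bucket_files(keys, num_buckets):
    """Group keys into per-bucket 'files' (lists) the way a bucketed write would."""
    files = {b: [] for b in range(num_buckets)}
    for key in keys:
        files[bucket_for(key, num_buckets)].append(key)
    return files


files = bucket_files([1, 2, 3, 4, 5, -5], 4)
print(files)             # {0: [4], 1: [1, 5], 2: [2], 3: [3, -5]}
print(bucket_for(5, 4))  # 1: a lookup on key 5 only needs bucket file 1
```

Because the bucket is a pure function of the key, a lookup on one key value can be limited to a single bucket file without knowing in advance which values are skewed.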

Re: Re: config hive authorization (hive with kerberos and remote metastore)

2013-11-13 Thread david1990...@163.com
Whatever privileges I grant to user hadoop, I cannot do 'select ' even if I change the database or use beeline. Has anyone configured Hive authorization successfully with a remote metastore? From: Mikhail Antonov Date: 2013-11-14 13:57 To: user Subject: Re: config hive authorization (hive with kerberos and

Running Hive 0.12.0 with Hadoop 2.2

2013-11-13 Thread Bill Q
Hi, does Hive 0.12 run on Hadoop 2.2? If it does, is there anything special about installing or running Hive? Will Hive start an M/R ApplicationMaster by itself, or should I start one for it? Many thanks. Bill