ORC file question

2014-02-10 Thread Avrilia Floratou
Hi all, I'm running a query that scans a file stored in ORC format and extracts some columns. My file is about 92 GB, uncompressed. I kept the default stripe size. The MapReduce job generates 363 map tasks. I have noticed that the first 180 map tasks finish in 3 seconds each and after they
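
As a rough sanity check on the figures quoted in this message, the average input per map task can be estimated from the file size and the task count. This is only a sketch: it assumes "92 GB" means binary gigabytes and that the input is spread evenly across tasks, neither of which the message confirms.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Figures taken from the message; binary units are an assumption.
file_size_bytes = 92 * GIB
num_map_tasks = 363

# Average bytes handed to each map task if the input were split evenly.
avg_bytes_per_task = file_size_bytes / num_map_tasks
avg_mib_per_task = avg_bytes_per_task / MIB

print(f"average input per map task: {avg_mib_per_task:.1f} MiB")
```

An even split is an idealization; the actual split boundaries depend on the input format in use.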

Re: ORC file question

2014-02-10 Thread Avrilia Floratou
). This happens when the input format is set to HiveInputFormat. Thanks Prasanth Jayachandran On Feb 10, 2014, at 12:49 AM, Avrilia Floratou avrilia.flora...@gmail.com wrote: Hi all, I'm running a query that scans a file stored in ORC format and extracts some columns. My file is about 92 GB

Re: ORC file question

2014-02-10 Thread Avrilia Floratou
) What query are you using? Thanks Prasanth Jayachandran On Feb 10, 2014, at 1:26 PM, Avrilia Floratou avrilia.flora...@gmail.com wrote: Hi Prasanth, No, it's not a partitioned table. The table consists of only one file (91.7 GB). When I created the table I loaded data from a text table

Re: ORC file question

2014-02-10 Thread Avrilia Floratou
-6326. I will try to reproduce your scenario and see if I hit a similar issue. Thanks Prasanth Jayachandran On Feb 10, 2014, at 1:46 PM, Avrilia Floratou avrilia.flora...@gmail.com wrote: Hi Prasanth, Here are the answers to your questions: 1) Yes I have set both set hive.optimize.ppd=true

Map-side join memory limit is too low

2014-01-31 Thread Avrilia Floratou
Hi, I'm running Hive 0.12 on YARN and I'm trying to convert a common join into a map join. My map join fails, and from the logs I can see that the memory limit is very low: Starting to launch local task to process map join; maximum memory = 514523136 How can I increase the maximum memory?
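
The log line reports the limit in bytes, which is hard to read at a glance. A minimal conversion sketch, using only the number quoted in the message:

```python
MIB = 1024 ** 2

# Value copied from the log line quoted in the message (bytes).
max_memory_bytes = 514523136

# Convert to mebibytes for readability.
max_memory_mib = max_memory_bytes / MIB

print(f"maximum memory: {max_memory_mib:.1f} MiB")
```

So the local task is limited to roughly half a gigabyte.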

ORC file tuning

2013-12-29 Thread Avrilia Floratou
Hi all, I'm using Hive 0.12 and running some experiments with the ORC file format. The HDFS block size is 128MB and I was wondering what the best stripe size to use is. The default one (250MB) is larger than the block size. Is each stripe splittable, or in this case will each map task have to access data
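
To illustrate the concern, the sketch below lists which HDFS blocks each stripe would overlap for the sizes mentioned in the message. It assumes stripes are packed back to back starting at offset 0, with no file header or padding; that is a simplification for illustration, not a description of the actual ORC layout, and `blocks_touched` is a hypothetical helper.

```python
MIB = 1024 ** 2


def blocks_touched(stripe_index, stripe_size, block_size):
    """Return the indices of the HDFS blocks that one stripe overlaps.

    Assumption: stripes are contiguous from offset 0 with no header
    or padding between them (a simplification for illustration).
    """
    start = stripe_index * stripe_size
    end = start + stripe_size - 1  # offset of the stripe's last byte
    return list(range(start // block_size, end // block_size + 1))


stripe_size = 250 * MIB  # default stripe size quoted in the message
block_size = 128 * MIB   # HDFS block size quoted in the message

for i in range(3):
    print(f"stripe {i} overlaps blocks {blocks_touched(i, stripe_size, block_size)}")
```

Under these assumptions every stripe spans at least two blocks, so a task reading a whole stripe cannot find all of its data in a single block.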

Question on correlation optimizer

2013-12-10 Thread Avrilia Floratou
Hi, I'm running TPC-H query 21 on Hive 0.12 and have enabled hive.optimize.correlation. I could see the effect of the correlation optimizer on query 17, but when running query 21 I don't actually see the optimizer being used. I used the publicly available TPC-H queries for Hive and merged all the

Re: Question on correlation optimizer

2013-12-10 Thread Avrilia Floratou
plan. Also, another kind of case that the correlation optimizer does not optimize right now is when a table is used in multiple MR jobs but the rows of this table are shuffled in different ways. Thanks, Yin On Tue, Dec 10, 2013 at 8:05 PM, Avrilia Floratou avrilia.flora...@gmail.com wrote

Predicate pushdown/indexing on ORC file

2013-11-07 Thread Avrilia Floratou
Hi all, I'm using Hive 0.12. I have a file that contains 10 integer columns stored in ORC format. The ORC file is zlib compressed and indexing is enabled. I'm running a simple select count(*) with a predicate of the form (col1 = 0 OR col2 = 0, etc.). The predicate touches all 10 columns but its

Hive-Hadoop compatibility

2012-05-09 Thread Avrilia Floratou
Hi all, Does anyone know if Hive 0.7 or 0.8 can work with Hadoop 0.21.0 or 0.22.0? Thanks, Avrilia

RCFile and Hadoop Counters

2012-01-31 Thread Avrilia Floratou
Hi, I have a question about the Hadoop counters when RCFile is used. I have 16 TB of (uncompressed) data stored in compressed RCFile format. The size of the compressed RCFile is approximately 3 TB. I ran a simple scan query on this table. Each split is 256 MB (HDFS block size). From the
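
For orientation, the sizes quoted in this message imply a compression ratio and a rough split count. The sketch assumes binary units and treats "approximately 3 TB" as exactly 3 TiB, so the results are estimates only.

```python
TIB = 1024 ** 4
MIB = 1024 ** 2

# Sizes quoted in the message; exact values and binary units are assumptions.
uncompressed_bytes = 16 * TIB
compressed_bytes = 3 * TIB
split_bytes = 256 * MIB

# Ratio of raw data size to on-disk size.
compression_ratio = uncompressed_bytes / compressed_bytes

# Ceiling division: a final partial split still needs its own task.
num_splits = -(-compressed_bytes // split_bytes)

print(f"compression ratio: {compression_ratio:.2f}x")
print(f"estimated splits:  {num_splits}")
```

When comparing counter values against these numbers, it helps to check whether a given counter reports compressed or uncompressed bytes.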

Re: Question on bucketed map join

2012-01-24 Thread Avrilia Floratou
mapside join a bucketed map side join would be triggered. Hope it helps!.. Regards Bejoy.K.S From: Avrilia Floratou flora...@cs.wisc.edu To: user@hive.apache.org Sent: Thursday, January 19, 2012 9:23 PM Subject: Question on bucketed map join Hi, I have two tables with 8 buckets each

Question on bucketed map join

2012-01-19 Thread Avrilia Floratou
Hi, I have two tables with 8 buckets each on the same key and want to join them. I ran explain extended and got the plan produced by Hive, which shows that a map-side join is a possible plan. I then set the hive.optimize.bucketmapjoin option to true in my script and reran the explain extended

Problem with query on bucketed table

2011-10-09 Thread Avrilia Floratou
Hello, I have a question regarding the execution of some queries on bucketed tables. I've created a compressed bucketed table using the following statement: create external table partRC (P_PARTKEY BIGINT, P_NAME STRING, P_MFGR STRING, P_BRAND STRING, P_TYPE STRING, P_SIZE INT, P_CONTAINER

Re: Problem with query on bucketed table

2011-10-09 Thread Avrilia Floratou
table. In Partitions use Dynamic Partitions to load data from the source table into the target table on partitions on the fly. Hope it helps!.. Regards Bejoy.K.S From: Avrilia Floratou flora...@cs.wisc.edu To: user@hive.apache.org Sent: Monday, October

Indexing

2011-10-07 Thread Avrilia Floratou
Hi, I'd like to know what the current status of indexing in Hive is. What I've found so far is that the user has to manually set the index table for each query. Something like this: insert overwrite directory /tmp/index_result select `_bucketname`