Hi all,
I'm running a query that scans a file stored in ORC format and extracts
some columns. My file is about 92 GB, uncompressed. I kept the default
stripe size. The MapReduce job generates 363 map tasks.
I have noticed that the first 180 map tasks finish in 3 secs (each) and
after they ...
... This happens when the input format is
set to HiveInputFormat.
Thanks
Prasanth Jayachandran
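For context, the input format in question is a session-level setting; a minimal sketch, assuming Hive 0.12 class names:

```sql
-- Hive 0.12 default: combines small files/splits into fewer map tasks.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- The non-combining format discussed above: one map task per split.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```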
On Feb 10, 2014, at 12:49 AM, Avrilia Floratou avrilia.flora...@gmail.com
wrote:
Hi all,
I'm running a query that scans a file stored in ORC format and extracts
some columns. My file is about 92 GB ...
...) What query are you using?
Thanks
Prasanth Jayachandran
On Feb 10, 2014, at 1:26 PM, Avrilia Floratou avrilia.flora...@gmail.com
wrote:
Hi Prasanth,
No, it's not a partitioned table. The table consists of only one file
(91.7 GB). When I created the table I loaded data from a text table ...
...-6326. I will try to reproduce
your scenario and see if I hit a similar issue.
Thanks
Prasanth Jayachandran
On Feb 10, 2014, at 1:46 PM, Avrilia Floratou avrilia.flora...@gmail.com
wrote:
Hi Prasanth,
Here are the answers to your questions:
1) Yes, I have set both set hive.optimize.ppd=true ...
Hi,
I'm running Hive 0.12 on YARN and I'm trying to convert a common join into
a map join. My map join fails,
and from the logs I can see that the memory limit is very low:
Starting to launch local task to process map join; maximum memory =
514523136
How can I increase the maximum memory?
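A sketch of the knobs commonly adjusted for this in the Hive 0.12 era; the specific properties are an assumption on my part, not something stated in the thread:

```sql
-- Fraction of the local task's heap a map join may fill before aborting.
SET hive.mapjoin.localtask.max.memory.usage=0.9;
-- Heap (in MB) Hive gives tasks run in local mode; the "maximum memory"
-- in the log is the heap of this local child JVM.
SET hive.mapred.local.mem=2048;
```

Raising HADOOP_HEAPSIZE in the environment before starting the Hive CLI is another common route, since the local task inherits the client-side JVM settings.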
Hi all,
I'm using Hive 0.12 and running some experiments with the ORC file. The
hdfs block size is 128MB and I was wondering what is the best stripe size
to use. The default one (250MB) is larger than the block size. Is each
stripe splittable, or in this case will each map task have to access data ...
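One common resolution is to set the stripe size to the block size at table-creation time, so no stripe straddles a block boundary (a 250 MB stripe on 128 MB blocks necessarily spans at least two blocks, forcing some remote reads). A sketch, with a hypothetical table:

```sql
-- 134217728 bytes = 128 MB, matching the HDFS block size from the question.
CREATE TABLE t_orc (c1 INT, c2 STRING)
STORED AS ORC
TBLPROPERTIES ("orc.stripe.size"="134217728");
```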
Hi,
I'm running TPC-H query 21 on Hive 0.12 and have enabled
hive.optimize.correlation.
I could see the effect of the correlation optimizer on query 17, but when
running query 21 I don't actually see the optimizer being used. I used the
publicly available TPC-H queries for Hive and merged all the ...
... plan. Also,
another kind of case that the correlation optimizer does not optimize
right now is when a table is used in multiple MR jobs but the rows of this
table are shuffled in different ways.
Thanks,
Yin
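The flag discussed above is session-scoped; a minimal sketch of the shape the optimizer does handle (the same table shuffled on the same key by consecutive jobs, as in the TPC-H Q17 pattern), with table and column names assumed:

```sql
SET hive.optimize.correlation=true;
-- Both the aggregation and the join shuffle lineitem on the SAME key
-- (l_partkey), so the two MR jobs are candidates for merging:
SELECT l.l_partkey, l.l_quantity
FROM lineitem l
JOIN (SELECT l_partkey, AVG(l_quantity) AS avg_qty
      FROM lineitem GROUP BY l_partkey) t
  ON l.l_partkey = t.l_partkey
WHERE l.l_quantity < t.avg_qty;
```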
On Tue, Dec 10, 2013 at 8:05 PM, Avrilia Floratou
avrilia.flora...@gmail.com wrote
Hi all,
I'm using Hive 0.12. I have a file that contains 10 integer columns stored in
ORC format. The ORC file is zlib compressed and indexing is enabled.
I'm running a simple select count(*) with a predicate of the form (col1 = 0
OR col2 = 0, etc.). The predicate touches all 10 columns but its ...
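For reference, a sketch of the settings that gate ORC predicate pushdown in Hive 0.12, plus the query shape from the message (table and column names assumed):

```sql
-- Both must be on for the ORC reader to consult its row-group indexes:
SET hive.optimize.ppd=true;
SET hive.optimize.index.filter=true;

-- With an OR across columns, a row group can be skipped only when the
-- indexes rule out a match on every branch of the predicate:
SELECT count(*) FROM t
WHERE col1 = 0 OR col2 = 0;
```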
Hi all,
Does anyone know if Hive 0.7 or 0.8 can work with Hadoop 0.21.0 or 0.22.0?
Thanks,
Avrilia
Hi,
I have a question related to the hadoop counters when RCFile is used.
I have 16TB of (uncompressed) data stored in compressed RCFile format. The size
of the compressed RCFile is approximately 3 TB.
I ran a simple scan query on this table. Each split is 256 MB (HDFS block
size).
From the ...
... map-side join, a
bucketed map-side join would be triggered.
Hope it helps!..
Regards
Bejoy.K.S
From: Avrilia Floratou flora...@cs.wisc.edu
To: user@hive.apache.org
Sent: Thursday, January 19, 2012 9:23 PM
Subject: Question on bucketed map join
Hi,
I have two tables with 8 buckets each ...
Hi,
I have two tables with 8 buckets each on the same key and want to join them.
I ran explain extended and get the plan produced by HIVE which shows that a
map-side join is a possible plan.
I then set in my script the hive.optimize.bucketmapjoin option to true and
reran the explain extended
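A sketch of the full recipe for a bucketed map join in this setup, with table names assumed; both tables must be bucketed into the same number of buckets on the join key:

```sql
SET hive.optimize.bucketmapjoin=true;
-- With 8 buckets on the join key in both tables, each mapper loads only
-- the single matching bucket of the small table b, not the whole table:
SELECT /*+ MAPJOIN(b) */ a.key, b.val
FROM a JOIN b ON a.key = b.key;
```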
Hello,
I have a question regarding the execution of some queries on bucketed tables.
I've created a compressed bucketed table using the following statement:
create external table partRC (P_PARTKEY BIGINT, P_NAME STRING, P_MFGR
STRING, P_BRAND STRING, P_TYPE STRING, P_SIZE INT, P_CONTAINER ...
... table. In
this case, use Dynamic Partitions to load data from the source table into
the target table, creating the partitions on the fly.
Hope it helps!..
Regards
Bejoy.K.S
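The dynamic-partition load Bejoy refers to can be sketched as follows (table and column names are assumptions):

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Hive creates one partition per distinct p_mfgr value on the fly:
INSERT OVERWRITE TABLE part_bucketed PARTITION (p_mfgr)
SELECT p_partkey, p_name, p_mfgr FROM part_staging;
```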
From: Avrilia Floratou flora...@cs.wisc.edu
To: user@hive.apache.org
Sent: Monday, October
Hi,
I'd like to know what's the current status of indexing in Hive. What I've
found so far is that the user has to manually set the index table for each
query. Something like this:
insert overwrite directory /tmp/index_result select `_bucketname` ...
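What's being described matches the manual compact-index workflow of that era; a sketch, with index, table, and column names assumed:

```sql
CREATE INDEX t_idx ON TABLE t (col1)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
ALTER INDEX t_idx ON t REBUILD;

-- Manually materialize the matching blocks, then point the scan at them:
INSERT OVERWRITE DIRECTORY '/tmp/index_result'
SELECT `_bucketname`, `_offsets` FROM default__t_t_idx__ WHERE col1 = 10;
SET hive.index.compact.file=/tmp/index_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
```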