Hello,
I have seen some blog posts saying that indexing is not recommended and that we
can use the ORC format instead. Can you please provide a suggestion?
I could not find any official statement.
Kind Regards,
Sachit Murarka
To: user@hive.apache.org
Subject: RE: Hive indexing optimization
I've attached the output. Thanks.
B
Subject: Re: Hive indexing optimization
From: jpullokka
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Thank you,
B
Subject: Re: Hive indexing optimization
To: user@hive.apache.org
Subject: RE: Hive indexing optimization
Here is the explain output:
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Reducer 2 - Map 1 (SIMPLE_EDGE), Map 3 (SIMPLE_EDGE)
Vertices:
Map 1
Map Operator Tree
I've attached the output. Thanks.
B
Subject: Re: Hive indexing optimization
From: jpullokka...@hortonworks.com
To: user@hive.apache.org
Date: Mon, 29 Jun 2015 19:17:44 +
Could you post explain extended output?
From: Bennie Leo tben...@hotmail.com
“SELECT StartIp, EndIp, Country FROM ipv4geotable” should have been
rewritten as a scan against the index table.
Bitmap indexes seem to support inequalities.
Post the explain plan.
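As a rough illustration of why a bitmap index can answer range predicates, the Python sketch below keeps one bitset per distinct column value and ORs together the bitsets of every value inside the requested range. It is a conceptual model only, with hypothetical function names; Hive's own BITMAP index stores compressed bitmaps in a separate index table and differs in detail.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to a Python int used as a bitset of row positions."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row  # set bit `row` in this value's bitmap
    return dict(index)


def rows_in_range(index, low, high):
    """Return row positions whose value v satisfies low <= v <= high."""
    bits = 0
    for value, bitmap in index.items():
        if low <= value <= high:
            bits |= bitmap  # union of the bitmaps of all matching values
    # Decode the combined bitset back into row numbers
    return [row for row in range(bits.bit_length()) if (bits >> row) & 1]


if __name__ == "__main__":
    column = [5, 3, 9, 3, 7, 1]
    idx = build_bitmap_index(column)
    print(rows_in_range(idx, 3, 7))  # [0, 1, 3, 4]
```

The range test only touches the index entries (one per distinct value), not the rows themselves, which is what makes bitmaps attractive for low-cardinality columns.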
On 6/26/15, 8:56 PM, Gopal Vijayaraghavan gop...@apache.org wrote:
Hi,
Hive indexes won't really help you
I don't know how I could include this within my current query.
Cheers,
B
Subject: Re: Hive indexing optimization
From: jpullokka...@hortonworks.com
To: user@hive.apache.org
Date: Fri, 26 Jun 2015 01:27:21 +
Set hive.optimize.index.filter=true;
Thanks
John
From: Bennie Leo
Hi,
Hive indexes won't really help you speed up that query right now, because
of the plan it generates due to the = clauses.
CREATE TABLE ipv4table
AS
SELECT logon.IP, ipv4.Country
FROM
(SELECT * FROM logontable WHERE isIpv4(IP)) logon
LEFT OUTER JOIN
(SELECT StartIp, EndIp, Country FROM
Hi,
I am attempting to optimize a query using indexing. My current query converts
an ipv4 address to a country using a geolocation table. However, the
geolocation table is fairly large and the query takes an impractical amount of
time. I have created indexes and set the binary search
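The lookup described here (mapping an IPv4 address to a country through a table of StartIp/EndIp ranges) is a range search. A minimal Python sketch, assuming the ranges are non-overlapping, converts each dotted address to a 32-bit integer, sorts the ranges by start address, and uses `bisect` to find the single candidate range in O(log n) instead of comparing the address against every row. The class name and sample ranges are hypothetical; the real ipv4geotable layout is not shown in the thread.

```python
import bisect
import ipaddress


def ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


class RangeLookup:
    """Binary search over sorted, non-overlapping [start, end] IPv4 ranges."""

    def __init__(self, rows):
        # rows: iterable of (start_ip, end_ip, country) tuples (hypothetical data)
        parsed = sorted((ip_to_int(s), ip_to_int(e), c) for s, e, c in rows)
        self.starts = [start for start, _, _ in parsed]
        self.rows = parsed

    def country(self, ip):
        value = ip_to_int(ip)
        # Index of the last range whose start is <= value
        pos = bisect.bisect_right(self.starts, value) - 1
        if pos >= 0:
            _, end, country = self.rows[pos]
            if value <= end:
                return country
        return None  # no matching range, like a NULL from the LEFT OUTER JOIN


if __name__ == "__main__":
    geo = RangeLookup([
        ("1.0.0.0", "1.0.0.255", "AU"),
        ("1.0.1.0", "1.0.3.255", "CN"),
        ("8.8.8.0", "8.8.8.255", "US"),
    ])
    print(geo.country("1.0.2.17"))  # CN
    print(geo.country("5.5.5.5"))   # None
```

Storing StartIp and EndIp as integers rather than strings is what makes the comparison and the sort order correct; string comparison of dotted quads does not order addresses numerically.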
To: user@hive.apache.org
Subject: Hive indexing optimization
Hello,
Is it possible to create an index on table stored as ORC and compressed as
Snappy?
Does it make sense? I am wondering whether Hive indexing is a mature feature.
Thanks,
Alain
Hi,
For large tables, it takes a lot of time to load the indexes into the index
table. Is there any way we can reduce the index load time?
CREATE TABLE SE_TX_SUMMARY (COUNTY string, BLOCKGROUPID string, GROUPING_ID
int) PARTITIONED BY (EXPOSED_TIME int) row format delimited fields
terminated by
Hi Guys,
We have a Hive 0.12 ORC table that is partitioned on year, month, day, hour
and is bucketed by one column.
So far so good: we are seeing good speedups compared to the
non-ORC format.
Now we want to add an index on another commonly used column. My
question was -
Hi all,
I'm using hive-12. I have a file that contains 10 integer columns stored in
ORC format. The ORC file is zlib compressed and indexing is enabled.
I'm running a simple select count(*) with a predicate of the form (Col1 = 0 OR
col2 = 0 etc). The predicate touches all 10 columns, but its selectivity is 0
(none
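ORC files carry their own lightweight indexes: min/max statistics per stripe and per row group (10,000 rows by default), and Hive can use them to skip row groups when hive.optimize.index.filter=true. The Python sketch below models only the min/max test, with hypothetical names and made-up statistics. It shows why an OR predicate like the one above prunes poorly: a row group can be skipped only if every disjunct is ruled out, so even a predicate with selectivity 0 can force most groups to be read.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Min/max statistics for one column within one row group."""
    minimum: int
    maximum: int

    def may_contain(self, value):
        # An equality predicate can only match if value lies within [min, max]
        return self.minimum <= value <= self.maximum


def can_skip_or_equals(row_group, predicate):
    """Decide whether a row group can be skipped for `c1 = v1 OR c2 = v2 OR ...`.

    row_group: dict mapping column name -> ColumnStats
    predicate: dict mapping column name -> value compared with '='
    The group is skipped only if no disjunct can possibly match.
    """
    return not any(row_group[col].may_contain(val) for col, val in predicate.items())


def count_groups_read(row_groups, predicate):
    """Number of row groups the reader would still have to scan."""
    return sum(1 for rg in row_groups if not can_skip_or_equals(rg, predicate))


if __name__ == "__main__":
    groups = [
        {"col1": ColumnStats(1, 100), "col2": ColumnStats(5, 50)},
        {"col1": ColumnStats(-10, 10), "col2": ColumnStats(20, 30)},
        {"col1": ColumnStats(200, 300), "col2": ColumnStats(-5, 5)},
    ]
    # Even if no row equals 0, groups 1 and 2 must be read: their ranges include 0
    print(count_groups_read(groups, {"col1": 0, "col2": 0}))  # 2
```

With an AND predicate the rule flips: a group can be skipped as soon as any single conjunct is ruled out, which is why AND filters usually benefit far more from these statistics.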
Hi,
I am new to Hive, and am trying to set up an index on a Hive table to
improve query performance.
I am presently using the CDH 4.2 Hadoop distribution, which ships with
Hive 0.10, so from what I have read table index support should be
available.
What I am seeing though is that when I go and
The stub of an Indexing user doc in the Hive wiki's Language Manual now
includes some simple examples, adapted from the test suite.
Would someone who uses Hive indexes please review it and make any necessary
corrections or additions? For example, I omitted examples of indexes on
partitioned tables
I am playing with Hive indexing and a little discouraged by the gap between
the potential seen and the amount of documentation around indexing. I am
running Hive 0.9 and started playing with indexing as follows:
I have a table logs that has a bunch of fields but for this, let's say
three
I do not have answers to any of your questions, but I appreciate you raising
them. My team is very interested in Hive indexing as well, so I look forward to
this discussion.
Chuck Connell
Nuance R&D Data Team
Burlington, MA
From: John Omernik [mailto:j...@omernik.com]
Sent: Thursday, July 26
I have written a custom index handler and wanted to test it. However, Hive
is not using it.
So I tested with a simple table, pokes (foo int, bar string), which comes with
the Hive distribution for testing purposes.
Then I created a compact index and set
hive.optimize.index.filter=true;
However, upon
I am currently using Hive 0.7.1 and creating indexes on columns used in the
WHERE clause. However, when I run EXPLAIN I do not see the index being
used. The syntax that I am using to build the index is as follows:
CREATE INDEX x ON TABLE t(j)
AS
Hi,
I'd like to know the current status of indexing in Hive. What I've
found so far is that the user has to manually set the index table for each
query, something like this:
insert overwrite directory /tmp/index_result select `_bucketname
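The manual workflow above relies on what a COMPACT index stores: for each indexed value, the data file it appears in (`_bucketname`) and the offsets of the blocks containing it. The Python sketch below models only that idea, with hypothetical file names and offsets: it builds value → {file: offsets} and then lists which blocks a query for one value would need to read, which is the information the query is pointed at.

```python
from collections import defaultdict


def build_compact_index(blocks):
    """Build value -> {bucket file: sorted block offsets}.

    blocks: iterable of (bucket_name, block_offset, values_in_block) tuples.
    """
    index = defaultdict(lambda: defaultdict(set))
    for bucket_name, offset, values in blocks:
        for value in values:
            index[value][bucket_name].add(offset)
    # Freeze into plain dicts with sorted offset lists (one list per file)
    return {
        value: {bucket: sorted(offsets) for bucket, offsets in buckets.items()}
        for value, buckets in index.items()
    }


def blocks_to_read(index, value):
    """Return (bucket, offset) pairs a query for `col = value` would need to scan."""
    buckets = index.get(value, {})
    return sorted((bucket, off) for bucket, offsets in buckets.items() for off in offsets)


if __name__ == "__main__":
    data = [
        ("part-00000", 0, ["a", "b"]),
        ("part-00000", 4096, ["b", "c"]),
        ("part-00001", 0, ["a"]),
    ]
    idx = build_compact_index(data)
    print(blocks_to_read(idx, "a"))  # [('part-00000', 0), ('part-00001', 0)]
```

The index only helps if the reader can jump to those offsets, so its benefit depends on the storage format allowing reads that start mid-file.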
On a side note, I'm looking at adding indexes to our Hive tables as well. Is
there a performance/space trade-off comparison or any metrics?
thx!
On Wed, Aug 3, 2011 at 10:52 AM, Siddharth Ramanan
siddharth.rama...@gmail.com wrote:
Hi all,
I have used compact index for my table and the
Hi,
can indexes work on gzipped files?
The index gets built without errors using
ALTER INDEX syslog_index ON syslog PARTITION(dt='2011-08-03') REBUILD;
but when querying, no results are returned (and no errors reported). The
query should be correct because with plaintext files it works.
Unfortunately it does not, because a .gz file cannot be split.
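The reason given in this reply can be seen directly: a gzip file is one continuous compressed stream, and Hadoop's gzip codec is not splittable, so a reader cannot begin decompressing at an arbitrary byte offset. The Python sketch below uses only the standard `gzip` and `zlib` modules; the helper name and sample log lines are hypothetical. Decompressing from offset 0 succeeds, while starting from the middle of the file fails because there is no gzip header there.

```python
import gzip
import zlib


def can_decompress_from(blob, offset):
    """Try to decompress a gzip blob starting at `offset`; report success."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # BadGzipFile is an OSError subclass; EOFError/zlib.error cover corrupt data
        return False


if __name__ == "__main__":
    lines = "".join(f"2011-08-03 host{i} message {i}\n" for i in range(10000))
    blob = gzip.compress(lines.encode("utf-8"))
    print(can_decompress_from(blob, 0))               # True: start of stream
    print(can_decompress_from(blob, len(blob) // 2))  # False: no gzip header mid-stream
```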
2011/8/3 Martin Konicek martin.koni...@gmail.com:
Hi,
I have a table with close to a billion rows. I am trying to
create an index for the table; when I run the ALTER command, I always end up
with MapReduce jobs that fail with errors. The same runs fine for small tables,
though. I also notice that the number of reducers is set to 24, even if set
Hi all,
I tried to index an LZO file but got the following error while indexing
it:
java.lang.ClassCastException:
com.hadoop.compression.lzo.LzopCodec$LzopDecompressor cannot be cast to
com.hadoop.compression.lzo.LzopDecompressor