Hi,
I am still having problems building my index.
In the hope of finding someone who can help,
I'll go through all the steps that I tried.
1) First I load my data into hive.
hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
Loading data to table default.score
Deleted hdfs://localhost/data/warehouse/score
OK
Time taken: 7.817 seconds
2) Then I try to create the index
hive> CREATE INDEX bigIndex
> ON TABLE score(Ath_Seq_Num)
> AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
FAILED: Error in metadata: java.lang.RuntimeException: Please specify deferred
rebuild using " WITH DEFERRED REBUILD ".
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask
hive>
3) OK, so it suggests that I use "WITH DEFERRED REBUILD", and so I do:
hive>
>
> CREATE INDEX bigIndex
> ON TABLE score(Ath_Seq_Num)
> AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> WITH DEFERRED REBUILD;
OK
Time taken: 0.603 seconds
4) Now, to build the index I assume that I use ALTER INDEX as follows:
hive> ALTER INDEX bigIndex ON score REBUILD;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 138
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201210311448_0001, Tracking URL =
http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job
-Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
Hadoop job information for Stage-1: number of mappers: 511; number of reducers:
138
2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%
5) This all looks promising, and after increasing my heap size to get the
Map/Reduce job to complete, I get this an hour later:
2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU
4135.47 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470
msec
Ended Job = job_201210311448_0001
Loading data to table default.default__score_bigindex__
Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
Invalid alter operation: Unable to alter index.
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask
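For reference, the heap increase mentioned in step 5 would typically be done
through the standard Hadoop 1.x environment knobs; a minimal sketch, with
purely illustrative values:

```shell
# Illustrative values only; set these before launching the Hive CLI.
# HADOOP_HEAPSIZE (usually in conf/hadoop-env.sh) sizes the Hadoop daemon JVMs, in MB.
export HADOOP_HEAPSIZE=2048
# HADOOP_CLIENT_OPTS sizes client-side JVMs such as the Hive CLI itself.
export HADOOP_CLIENT_OPTS="-Xmx2048m"
# Each map/reduce task JVM is sized separately via mapred.child.java.opts,
# e.g. from the Hive prompt:  hive> SET mapred.child.java.opts=-Xmx1024m;
```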
So what have I done wrong, and what do I need to do to get this index to build
successfully?
Any help appreciated.
Peter Marron
From: Peter Marron [mailto:[email protected]]
Sent: 24 October 2012 13:27
To: [email protected]
Subject: RE: Indexes
Hi Shreepadma,
Thanks for this. Looks exactly like the information I need.
I was going to reply once I had tried it all out, but I'm having
problems creating the index (I'm getting an OutOfMemoryError
at the moment). So I thought I had better reply now to say thank you.
Peter Marron
From: Shreepadma Venugopalan [mailto:[email protected]]
Sent: 23 October 2012 19:49
To: [email protected]<mailto:[email protected]>
Subject: Re: Indexes
Hi Peter,
Indexing support was added to Hive in 0.7, and in 0.8 the query compiler was
enhanced to optimize some classes of queries (certain group-bys and joins) using
indexes. Assuming you are using the built-in index handler, you need to do the
following _after_ you have created and rebuilt the index:
SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
You will then notice a speed-up for a query of the form:
select count(*) from tab where indexed_col = some_val
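Putting the thread's steps together, a minimal end-to-end sketch (the table and
column names are taken from earlier in the thread; the index-result path and
the literal in the final query are placeholders):

```sql
-- Create a compact index; this handler requires deferred rebuild.
CREATE INDEX bigIndex
ON TABLE score(Ath_Seq_Num)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

-- Populate the index.
ALTER INDEX bigIndex ON score REBUILD;

-- Point the compiler at the index data before querying.
SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- A query of the shape that can benefit (42 is a placeholder value).
SELECT COUNT(*) FROM score WHERE Ath_Seq_Num = 42;
```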
Thanks,
Shreepadma
On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron
<[email protected]<mailto:[email protected]>>
wrote:
Hi,
I'm very much a Hive newbie but I've been looking at HIVE-417 and this page in
particular:
http://cwiki.apache.org/confluence/display/Hive/IndexDev
Using this information I've been able to create an index (using Hive 0.8.1)
and when I look at the contents it all looks very promising indeed.
However on the same page there's this comment:
"...This document currently only covers index creation and maintenance. A
follow-on will explain how indexes are used to optimize queries (building on
FilterPushdownDev <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)...."
However, I can't find the "follow-on" that tells me how to exploit the index
I've created to "optimize" subsequent queries.
Now I've been told that I can create and use indexes with the current
release of Hive _without_ writing and developing any Java code of my own.
Is this true? If so, how?
Any help appreciated.
Peter Marron.