Hi Peter,

Can you attach the execution logs? What exception do you see in them?
Thanks,
Shreepadma

On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

> Hi,
>
> I am still having problems building my index. In an attempt to find
> someone who can help me, I'll go through all the steps that I try.
>
> 1) First I load my data into Hive.
>
> hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
> Loading data to table default.score
> Deleted hdfs://localhost/data/warehouse/score
> OK
> Time taken: 7.817 seconds
>
> 2) Then I try to create the index:
>
> hive> CREATE INDEX bigIndex
>     > ON TABLE score(Ath_Seq_Num)
>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
> FAILED: Error in metadata: java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ".
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
> 3) OK, so it suggests that I use "DEFERRED REBUILD", and so I do:
>
> hive> CREATE INDEX bigIndex
>     > ON TABLE score(Ath_Seq_Num)
>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
>     > WITH DEFERRED REBUILD;
> OK
> Time taken: 0.603 seconds
>
> 4) Now, to build the index I assume that I use ALTER INDEX as follows:
>
> hive> ALTER INDEX bigIndex ON score REBUILD;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified.
> Estimated from input data size: 138
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201210311448_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
> Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
> Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138
> 2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%
>
> 5) This all looks promising, and after increasing my heap size to get the
> Map/Reduce job to complete, I get this an hour later:
>
> 2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4135.47 sec
> MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470 msec
> Ended Job = job_201210311448_0001
> Loading data to table default.default__score_bigindex__
> Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
> Invalid alter operation: Unable to alter index.
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
> So what have I done wrong, and what do I do to get this index to build
> successfully?
>
> Any help appreciated.
>
> Peter Marron
>
> From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
> Sent: 24 October 2012 13:27
> To: user@hive.apache.org
> Subject: RE: Indexes
>
> Hi Shreepadma,
>
> Thanks for this.
> It looks exactly like the information I need. I was going to reply when I
> had tried it all out, but I'm having problems creating the index at the
> moment (I'm getting an OutOfMemoryError). So I thought that I had better
> reply now to say thank you.
>
> Peter Marron
>
> From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
> Sent: 23 October 2012 19:49
> To: user@hive.apache.org
> Subject: Re: Indexes
>
> Hi Peter,
>
> Indexing support was added to Hive in 0.7, and in 0.8 the query compiler
> was enhanced to optimize some classes of queries (certain group-bys and
> joins) using indexes. Assuming you are using the built-in index handler,
> you need to do the following _after_ you have created and rebuilt the
> index:
>
> SET hive.index.compact.file='/tmp/index_result';
> SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
>
> You will then notice a speed-up for a query of the form:
>
> select count(*) from tab where indexed_col = some_val
>
> Thanks,
> Shreepadma
>
> On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
>
> Hi,
>
> I'm very much a Hive newbie, but I've been looking at HIVE-417 and this
> page in particular:
>
> http://cwiki.apache.org/confluence/display/Hive/IndexDev
>
> Using this information I've been able to create an index (using Hive
> 0.8.1), and when I look at the contents it all looks very promising
> indeed. However, on the same page there's this comment:
>
> "...This document currently only covers index creation and maintenance.
> A follow-on will explain how indexes are used to optimize queries
> (building on FilterPushdownDev
> <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)..."
>
> However, I can't find the "follow-on" which tells me how to exploit the
> index that I've created to "optimize" subsequent queries. Now, I've been
> told that I can create and use indexes with the current release of Hive
> _without_ writing and developing any Java code of my own. Is this true?
> If so, how?
>
> Any help appreciated.
>
> Peter Marron.
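[Editor's note: pulling the pieces of this thread together, the end-to-end sequence being attempted is sketched below. Table, column, and index names are taken from the thread; the '/tmp/index_result' path follows Shreepadma's example, and the literal in the final query is a placeholder — adjust all of these for your own setup.]

```sql
-- Create a compact index; the handler requires deferred rebuild.
CREATE INDEX bigIndex
  ON TABLE score(Ath_Seq_Num)
  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  WITH DEFERRED REBUILD;

-- Populate the index (this launches a MapReduce job).
ALTER INDEX bigIndex ON score REBUILD;

-- After a successful rebuild, point queries at the index result
-- and switch the input format, as described earlier in the thread.
SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- A query of this shape should then benefit from the index
-- (the value 12345 is a placeholder).
SELECT COUNT(*) FROM score WHERE Ath_Seq_Num = 12345;
```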