Hi Peter,

Can you attach the execution logs? What exception do you see in them?
Thanks,
Shreepadma

On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

> Hi,
>
> I am still having problems building my index. In an attempt to find
> someone who can help me, I'll go through all the steps that I try.
>
> 1) First I load my data into Hive.
>
> hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
> Loading data to table default.score
> Deleted hdfs://localhost/data/warehouse/score
> OK
> Time taken: 7.817 seconds
>
> 2) Then I try to create the index:
>
> hive> CREATE INDEX bigIndex
>     > ON TABLE score(Ath_Seq_Num)
>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
> FAILED: Error in metadata: java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ".
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
> 3) OK, so it suggests that I use "DEFERRED REBUILD", and so I do:
>
> hive> CREATE INDEX bigIndex
>     > ON TABLE score(Ath_Seq_Num)
>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
>     > WITH DEFERRED REBUILD;
> OK
> Time taken: 0.603 seconds
>
> 4) Now, to build the index I assume that I use ALTER INDEX as follows:
>
> hive> ALTER INDEX bigIndex ON score REBUILD;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified.
> Estimated from input data size: 138
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201210311448_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
> Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
> Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138
> 2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%
>
> 5) This all looks promising, and after increasing my heap size to get the
> Map/Reduce job to complete, I get this an hour later:
>
> 2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4135.47 sec
> MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470 msec
> Ended Job = job_201210311448_0001
> Loading data to table default.default__score_bigindex__
> Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
> Invalid alter operation: Unable to alter index.
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
> So what have I done wrong, and what do I do to get this index to build
> successfully?
>
> Any help appreciated.
>
> Peter Marron
>
> From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
> Sent: 24 October 2012 13:27
> To: user@hive.apache.org
> Subject: RE: Indexes
>
> Hi Shreepadma,
>
> Thanks for this.
> It looks exactly like the information I need. I was going to reply when I
> had tried it all out, but I'm having problems creating the index at the
> moment (I'm getting an OutOfMemoryError). So I thought that I had better
> reply now to say thank you.
>
> Peter Marron
>
> From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
> Sent: 23 October 2012 19:49
> To: user@hive.apache.org
> Subject: Re: Indexes
>
> Hi Peter,
>
> Indexing support was added to Hive in 0.7, and in 0.8 the query compiler
> was enhanced to optimize some classes of queries (certain group-bys and
> joins) using indexes. Assuming you are using the built-in index handler,
> you need to do the following _after_ you have created and rebuilt the
> index:
>
> SET hive.index.compact.file='/tmp/index_result';
> SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
>
> You will then notice a speed-up for a query of the form:
>
> select count(*) from tab where indexed_col = some_val
>
> Thanks,
> Shreepadma
>
> On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
>
> Hi,
>
> I'm very much a Hive newbie, but I've been looking at HIVE-417 and this
> page in particular:
>
> http://cwiki.apache.org/confluence/display/Hive/IndexDev
>
> Using this information I've been able to create an index (using Hive
> 0.8.1), and when I look at the contents it all looks very promising
> indeed. However, on the same page there's this comment:
>
> "...This document currently only covers index creation and maintenance.
> A follow-on will explain how indexes are used to optimize queries
> (building on FilterPushdownDev
> <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)..."
>
> However, I can't find the "follow-on" which tells me how to exploit the
> index that I've created to "optimize" subsequent queries. Now, I've been
> told that I can create and use indexes with the current release of Hive
> _without_ writing and developing any Java code of my own. Is this true?
> If so, how?
>
> Any help appreciated.
>
> Peter Marron.
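[Editor's note: pulling the pieces of this thread together, the end-to-end sequence being attempted is sketched below. Table, column, and index names are taken from the thread; the '/tmp/index_result' path follows Shreepadma's example, and the literal in the final query is a placeholder — adjust all of these for your own setup.]

```sql
-- Create a compact index; the handler requires deferred rebuild.
CREATE INDEX bigIndex
  ON TABLE score(Ath_Seq_Num)
  AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
  WITH DEFERRED REBUILD;

-- Populate the index (this launches a MapReduce job).
ALTER INDEX bigIndex ON score REBUILD;

-- After a successful rebuild, point queries at the index result
-- and switch the input format, as described earlier in the thread.
SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- A query of this shape should then benefit from the index
-- (the value 12345 is a placeholder).
SELECT COUNT(*) FROM score WHERE Ath_Seq_Num = 12345;
```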