Hi Peter,

From the execution log:

java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:186)
        at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
        at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

it appears that the error is due to the Derby classes not being found. Can you check whether the Derby jars are present?

Thanks,
Shreepadma

On Wed, Oct 31, 2012 at 12:52 PM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

Hi Shreepadma,

Happy to attach the logs; not quite sure which one is going to be most useful. Please find attached one which contained an error of some sort.
Not sure if it's related to the index error.

I found the file in this location:

/data/hadoop/logs/userlogs/job_201210311448_0001/attempt_201210311448_0001_r_000137_0/syslog

so maybe that will help you locate any other file that you might want to see.

Thanks for your efforts.

Peter Marron

From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
Sent: 31 October 2012 18:38
To: user@hive.apache.org
Subject: Re: Creating Indexes

Hi Peter,

Can you attach the execution logs? What is the exception that you see in the execution logs?

Thanks,
Shreepadma

On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

Hi,

I am still having problems building my index. In an attempt to find someone who can help me, I'll go through all the steps that I try.

1) First I load my data into Hive:

hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
Loading data to table default.score
Deleted hdfs://localhost/data/warehouse/score
OK
Time taken: 7.817 seconds

2) Then I try to create the index:

hive> CREATE INDEX bigIndex
    > ON TABLE score(Ath_Seq_Num)
    > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
FAILED: Error in metadata: java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ".
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

3) OK, so it suggests that I use "WITH DEFERRED REBUILD", and so I do:

hive> CREATE INDEX bigIndex
    > ON TABLE score(Ath_Seq_Num)
    > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > WITH DEFERRED REBUILD;
OK
Time taken: 0.603 seconds

4) Now, to build the index I assume that I use ALTER INDEX as follows:

hive> ALTER INDEX bigIndex ON score REBUILD;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 138
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201210311448_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138
2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%

5) This all looks promising, and after increasing my heap size to get the MapReduce job to complete, I get this an hour later:

2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4135.47 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470 msec
Ended Job = job_201210311448_0001
Loading data to table default.default__score_bigindex__
Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
Invalid alter operation: Unable to alter index.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

So what have I done wrong, and what do I need to do to get this index to build successfully?

Any help appreciated.

Peter Marron

From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
Sent: 24 October 2012 13:27
To: user@hive.apache.org
Subject: RE: Indexes

Hi Shreepadma,

Thanks for this. It looks exactly like the information I need. I was going to reply when I had tried it all out, but I'm having problems creating the index at the moment (I'm getting an OutOfMemoryError). So I thought that I had better reply now to say thank you.

Peter Marron

From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
Sent: 23 October 2012 19:49
To: user@hive.apache.org
Subject: Re: Indexes

Hi Peter,

Indexing support was added to Hive in 0.7, and in 0.8 the query compiler was enhanced to optimize some classes of queries (certain group-bys and joins) using indexes.
Assuming you are using the built-in index handler, you need to do the following _after_ you have created and rebuilt the index:

SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

You will then notice a speed-up for a query of the form:

select count(*) from tab where indexed_col = some_val

Thanks,
Shreepadma

On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

Hi,

I'm very much a Hive newbie, but I've been looking at HIVE-417 and this page in particular:

http://cwiki.apache.org/confluence/display/Hive/IndexDev

Using this information I've been able to create an index (using Hive 0.8.1), and when I look at the contents it all looks very promising indeed. However, on the same page there's this comment:

"…This document currently only covers index creation and maintenance. A follow-on will explain how indexes are used to optimize queries (building on FilterPushdownDev <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)…"

However, I can't find the "follow-on" which tells me how to exploit the index that I've created to "optimize" subsequent queries. Now, I've been told that I can create and use indexes with the current release of Hive _without_ writing and developing any Java code of my own. Is this true? If so, how?

Any help appreciated.

Peter Marron.
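[Editor's note] Pulling together the steps scattered through the thread, the whole compact-index lifecycle on Hive 0.8.x looks roughly like the sketch below. The table, column, index name, and handler class are the ones Peter used; the '/tmp/index_result' path is the illustrative one from Shreepadma's reply (in practice it should point at the actual index result location), and the literal 12345 is just a placeholder value for the indexed column.

```sql
-- Create the index; Hive 0.8 requires the deferred-rebuild clause.
CREATE INDEX bigIndex
ON TABLE score(Ath_Seq_Num)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

-- Populate the index (runs a MapReduce job over the base table).
ALTER INDEX bigIndex ON score REBUILD;

-- Point query execution at the index result and the compact-index
-- input format, then query on the indexed column.
SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

SELECT COUNT(*) FROM score WHERE Ath_Seq_Num = 12345;  -- placeholder value
```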
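[Editor's note] On the ClassNotFoundException itself: the JDBCStatsPublisher frame in the stack trace is Hive's automatic table-statistics publisher, which in this era of Hive defaulted to an embedded Derby database, so the reducer JVMs need the Derby driver jar on their classpath. A possible workaround, which is an assumption on my part and not something confirmed in the thread, is to disable automatic stats gathering so the Derby driver is never loaded during the rebuild:

```sql
-- Workaround sketch (assumption, not confirmed in the thread): skip
-- stats publishing so the Derby driver is never needed.
SET hive.stats.autogather=false;

ALTER INDEX bigIndex ON score REBUILD;
```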