That's what I would expect, too, but it appears a reducer task wanted to update some SQL statistics and to do that it wanted to load the derby jar.
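The stack trace later in this thread confirms that it is Hive's JDBC stats publisher (`JDBCStatsPublisher.connect`) that tries to load the Derby driver inside the reducer. One possible workaround, a sketch only and untested here, is to turn off automatic stats gathering for the rebuild so the reducers never touch the Derby driver:

```sql
-- Untested sketch: disable automatic stats collection so the reducer
-- tasks do not need the Derby JDBC driver on their classpath.
SET hive.stats.autogather=false;

-- Then retry the rebuild (index/table names from this thread):
ALTER INDEX bigIndex ON score REBUILD;
```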
On Thu, Nov 1, 2012 at 8:09 AM, Bejoy KS <bejoy...@yahoo.com> wrote:

> AFAIK you don't need any hive jars on the cluster. The hive jars are just
> required on the client node.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ------------------------------
> *From:* Dean Wampler <dean.wamp...@thinkbiganalytics.com>
> *Date:* Thu, 1 Nov 2012 08:01:51 -0500
> *To:* <user@hive.apache.org>
> *ReplyTo:* user@hive.apache.org
> *Subject:* Re: Creating Indexes
>
> It looks like you're using Derby with a real cluster, not just a single
> machine in local or pseudo-distributed mode. I haven't tried this myself,
> but the derby jar is probably not on the machine that ran the reducer
> task that failed.
>
> dean
>
> On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron <
> peter.mar...@trilliumsoftware.com> wrote:
>
>> Hi Shreepadma,
>>
>> I agree that the error looks odd. However I can't believe that I would
>> have got this far with Hive if there was no derby jar. Nevertheless I
>> checked.
>>
>> Here is a directory listing of the Hive install:
>>
>> pmarron@pmarron-ubuntu:/data/hive/lib$ ls
>> ant-contrib-1.0b3.jar          commons-pool-1.5.4.jar                hive-common-0.8.1.jar         hive-shims-0.8.1.jar  mockito-all-1.8.2.jar
>> antlr-2.7.7.jar                datanucleus-connectionpool-2.0.3.jar  hive-contrib-0.8.1.jar        javaewah-0.3.jar      php
>> antlr-3.0.1.jar                datanucleus-core-2.0.3.jar            hive_contrib.jar              jdo2-api-2.3-ec.jar   py
>> antlr-runtime-3.0.1.jar        datanucleus-enhancer-2.0.3.jar        hive-exec-0.8.1.jar           jline-0.9.94.jar      slf4j-api-1.6.1.jar
>> asm-3.1.jar                    datanucleus-rdbms-2.0.3.jar           hive-hbase-handler-0.8.1.jar  json-20090211.jar     slf4j-log4j12-1.6.1.jar
>> commons-cli-1.2.jar            *derby-10.4.2.0.jar*                  hive-hwi-0.8.1.jar            junit-4.10.jar        stringtemplate-3.1-b1.jar
>> commons-codec-1.3.jar          guava-r06.jar                         hive-hwi-0.8.1.war            libfb303-0.7.0.jar    velocity-1.5.jar
>> commons-collections-3.2.1.jar  hbase-0.89.0-SNAPSHOT.jar             hive-jdbc-0.8.1.jar           libfb303.jar          zookeeper-3.3.1.jar
>> commons-dbcp-1.4.jar           hbase-0.89.0-SNAPSHOT-tests.jar       hive-metastore-0.8.1.jar      libthrift-0.7.0.jar
>> commons-lang-2.4.jar           hive-anttasks-0.8.1.jar               hive-pdk-0.8.1.jar            libthrift.jar
>> commons-logging-1.0.4.jar      hive-builtins-0.8.1.jar               hive-serde-0.8.1.jar          log4j-1.2.15.jar
>> commons-logging-api-1.0.4.jar  hive-cli-0.8.1.jar                    hive-service-0.8.1.jar        log4j-1.2.16.jar
>>
>> Also I found a derby.log in my home directory which I have attached.
>>
>> Regards,
>>
>> Z
>>
>> *From:* Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
>> *Sent:* 31 October 2012 21:58
>> *To:* user@hive.apache.org
>> *Subject:* Re: Creating Indexes
>>
>> Hi Peter,
>>
>> From the execution log,
>>
>> java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:186)
>>         at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68)
>>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778)
>>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723)
>>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
>>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>>         at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
>>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>> it appears that the error is due to the derby classes not being found.
>> Can you check if the derby jars are present?
>>
>> Thanks,
>> Shreepadma
>>
>> On Wed, Oct 31, 2012 at 12:52 PM, Peter Marron <
>> peter.mar...@trilliumsoftware.com> wrote:
>>
>> Hi Shreepadma
>>
>> Happy to attach the logs, not quite sure which one is going to
>> be most useful. Please find attached one which contained an
>> error of some sort. Not sure if it's related or not to the index error.
>> Found the file in this location:
>>
>> /data/hadoop/logs/userlogs/job_201210311448_0001/attempt_201210311448_0001_r_000137_0/syslog
>>
>> so maybe that will help you locate any other file that you might want
>> to see.
>>
>> Thanks for your efforts.
>>
>> Peter Marron
>>
>> *From:* Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
>> *Sent:* 31 October 2012 18:38
>> *To:* user@hive.apache.org
>> *Subject:* Re: Creating Indexes
>>
>> Hi Peter,
>>
>> Can you attach the execution logs?
>> What is the exception that you see in the execution logs?
>>
>> Thanks,
>> Shreepadma
>>
>> On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <
>> peter.mar...@trilliumsoftware.com> wrote:
>>
>> Hi,
>>
>> I am still having problems building my index.
>> In an attempt to find someone who can help me,
>> I'll go through all the steps that I try.
>>
>> 1) First I load my data into hive.
>>
>> hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
>> Loading data to table default.score
>> Deleted hdfs://localhost/data/warehouse/score
>> OK
>> Time taken: 7.817 seconds
>>
>> 2) Then I try to create the index:
>>
>> hive> CREATE INDEX bigIndex
>>     > ON TABLE score(Ath_Seq_Num)
>>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
>> FAILED: Error in metadata: java.lang.RuntimeException: Please specify
>> deferred rebuild using " WITH DEFERRED REBUILD ".
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>> hive>
>>
>> 3) OK, so it suggests that I use "WITH DEFERRED REBUILD", and so I do:
>>
>> hive> CREATE INDEX bigIndex
>>     > ON TABLE score(Ath_Seq_Num)
>>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
>>     > WITH DEFERRED REBUILD;
>> OK
>> Time taken: 0.603 seconds
>>
>> 4) Now, to build the index I assume that I use ALTER INDEX as follows:
>>
>> hive> ALTER INDEX bigIndex ON score REBUILD;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks not specified.
>> Estimated from input data size: 138
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> Starting Job = job_201210311448_0001, Tracking URL =
>> http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
>> Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job
>> -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
>> Hadoop job information for Stage-1: number of mappers: 511; number of
>> reducers: 138
>> 2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%
>>
>> 5) This all looks promising, and after increasing my heap size to
>> get the Map/Reduce job to complete, I get this an hour later:
>>
>> 2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative
>> CPU 4135.47 sec
>> MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55
>> seconds 470 msec
>> Ended Job = job_201210311448_0001
>> Loading data to table default.default__score_bigindex__
>> Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
>> Invalid alter operation: Unable to alter index.
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>> So what have I done wrong, and what do I need to do to get this index
>> to build successfully?
>>
>> Any help appreciated.
>>
>> Peter Marron
>>
>> *From:* Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
>> *Sent:* 24 October 2012 13:27
>> *To:* user@hive.apache.org
>> *Subject:* RE: Indexes
>>
>> Hi Shreepadma,
>>
>> Thanks for this.
>> Looks exactly like the information I need.
>> I was going to reply when I had tried it all out, but I'm having
>> problems creating the index at the moment (I'm getting an
>> OutOfMemoryError). So I thought that I had better reply now to say
>> thank you.
>>
>> Peter Marron
>>
>> *From:* Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
>> *Sent:* 23 October 2012 19:49
>> *To:* user@hive.apache.org
>> *Subject:* Re: Indexes
>>
>> Hi Peter,
>>
>> Indexing support was added to Hive in 0.7, and in 0.8 the query
>> compiler was enhanced to optimize some classes of queries (certain
>> group bys and joins) using indexes. Assuming you are using the built-in
>> index handler, you need to do the following _after_ you have created
>> and rebuilt the index:
>>
>> SET hive.index.compact.file='/tmp/index_result';
>> SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
>>
>> You will then notice a speed-up for a query of the form:
>>
>> select count(*) from tab where indexed_col = some_val
>>
>> Thanks,
>> Shreepadma
>>
>> On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <
>> peter.mar...@trilliumsoftware.com> wrote:
>>
>> Hi,
>>
>> I'm very much a Hive newbie but I've been looking at HIVE-417 and this
>> page in particular:
>>
>> http://cwiki.apache.org/confluence/display/Hive/IndexDev
>>
>> Using this information I've been able to create an index (using Hive
>> 0.8.1), and when I look at the contents it all looks very promising
>> indeed. However on the same page there's this comment:
>>
>> "...This document currently only covers index creation and maintenance.
>> A follow-on will explain how indexes are used to optimize queries
>> (building on FilterPushdownDev
>> <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)..."
>>
>> However I can't find the "follow-on" which tells me how to exploit the
>> index that I've created to "optimize" subsequent queries.
>> Now I've been told that I can create and use indexes with the current
>> release of Hive _without_ writing and developing any Java code of my
>> own. Is this true? If so, how?
>>
>> Any help appreciated.
>>
>> Peter Marron.
>
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
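
Putting together the steps discussed in this thread, the end-to-end compact-index workflow against Hive 0.8.x looks roughly like the following. This is a sketch only: the table and column names are the poster's, the index file path is the arbitrary example from Shreepadma's message, and the literal in the final query is a placeholder.

```sql
-- Create the index with a deferred build, then build it.
CREATE INDEX bigIndex
ON TABLE score(Ath_Seq_Num)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

ALTER INDEX bigIndex ON score REBUILD;

-- Point the compiler at the index result and swap in the compact-index
-- input format so that selective queries can use the index.
SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- A query of this shape should then benefit from the index
-- (12345 is a placeholder value).
SELECT COUNT(*) FROM score WHERE Ath_Seq_Num = 12345;
```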