Hi Peter,

From the execution log:

java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:186)
        at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
        at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

it appears that the error is due to the Derby classes not being found. Can you check whether the Derby jars are present?

Thanks,
Shreepadma

On Wed, Oct 31, 2012 at 12:52 PM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

Hi Shreepadma,

Happy to attach the logs; not quite sure which one is going to be most useful. Please find attached one which contained an error of some sort.
Not sure if it's related to the index error.

I found the file in this location:

/data/hadoop/logs/userlogs/job_201210311448_0001/attempt_201210311448_0001_r_000137_0/syslog

so maybe that will help you locate any other file that you might want to see.

Thanks for your efforts.

Peter Marron

From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
Sent: 31 October 2012 18:38
To: user@hive.apache.org
Subject: Re: Creating Indexes

Hi Peter,

Can you attach the execution logs? What is the exception that you see in the execution logs?

Thanks,
Shreepadma

On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

Hi,

I am still having problems building my index. In an attempt to find someone who can help me, I'll go through all the steps that I try.

1) First I load my data into Hive:

hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
Loading data to table default.score
Deleted hdfs://localhost/data/warehouse/score
OK
Time taken: 7.817 seconds

2) Then I try to create the index:

hive> CREATE INDEX bigIndex
    > ON TABLE score(Ath_Seq_Num)
    > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
FAILED: Error in metadata: java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ".
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

3) OK, so it suggests that I use "WITH DEFERRED REBUILD", and so I do:

hive> CREATE INDEX bigIndex
    > ON TABLE score(Ath_Seq_Num)
    > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
    > WITH DEFERRED REBUILD;
OK
Time taken: 0.603 seconds

4) Now, to build the index I assume that I use ALTER INDEX as follows:

hive> ALTER INDEX bigIndex ON score REBUILD;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 138
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201210311448_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138
2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%

5) This all looks promising, and after increasing my heap size to get the MapReduce job to complete, I get this an hour later:

2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4135.47 sec
MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470 msec
Ended Job = job_201210311448_0001
Loading data to table default.default__score_bigindex__
Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
Invalid alter operation: Unable to alter index.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

So what have I done wrong, and what do I need to do to get this index to build successfully?

Any help appreciated.

Peter Marron

From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
Sent: 24 October 2012 13:27
To: user@hive.apache.org
Subject: RE: Indexes

Hi Shreepadma,

Thanks for this. It looks exactly like the information I need. I was going to reply when I had tried it all out, but I'm having problems creating the index at the moment (I'm getting an OutOfMemoryError). So I thought that I had better reply now to say thank you.

Peter Marron

From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
Sent: 23 October 2012 19:49
To: user@hive.apache.org
Subject: Re: Indexes

Hi Peter,

Indexing support was added to Hive in 0.7, and in 0.8 the query compiler was enhanced to optimize some classes of queries (certain group-bys and joins) using indexes.
Assuming you are using the built-in index handler, you need to do the following _after_ you have created and rebuilt the index:

SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

You will then notice a speed-up for a query of the form:

select count(*) from tab where indexed_col = some_val

Thanks,
Shreepadma

On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

Hi,

I'm very much a Hive newbie, but I've been looking at HIVE-417 and this page in particular:

http://cwiki.apache.org/confluence/display/Hive/IndexDev

Using this information I've been able to create an index (using Hive 0.8.1), and when I look at the contents it all looks very promising indeed. However, on the same page there's this comment:

"…This document currently only covers index creation and maintenance. A follow-on will explain how indexes are used to optimize queries (building on FilterPushdownDev <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)…"

However, I can't find the "follow-on" which tells me how to exploit the index that I've created to "optimize" subsequent queries. Now, I've been told that I can create and use indexes with the current release of Hive _without_ writing and developing any Java code of my own. Is this true? If so, how?

Any help appreciated.

Peter Marron.
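[Editor's note] Pulling together the steps scattered through the thread, the whole compact-index lifecycle on Hive 0.8.x looks roughly like the sketch below. The table, column, index name, and handler class are the ones Peter used; the '/tmp/index_result' path is the illustrative one from Shreepadma's reply (in practice it should point at the actual index result location), and the literal 12345 is just a placeholder value for the indexed column.

```sql
-- Create the index; Hive 0.8 requires the deferred-rebuild clause.
CREATE INDEX bigIndex
ON TABLE score(Ath_Seq_Num)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

-- Populate the index (runs a MapReduce job over the base table).
ALTER INDEX bigIndex ON score REBUILD;

-- Point query execution at the index result and the compact-index
-- input format, then query on the indexed column.
SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

SELECT COUNT(*) FROM score WHERE Ath_Seq_Num = 12345;  -- placeholder value
```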
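[Editor's note] On the ClassNotFoundException itself: the JDBCStatsPublisher frame in the stack trace is Hive's automatic table-statistics publisher, which in this era of Hive defaulted to an embedded Derby database, so the reducer JVMs need the Derby driver jar on their classpath. A possible workaround, which is an assumption on my part and not something confirmed in the thread, is to disable automatic stats gathering so the Derby driver is never loaded during the rebuild:

```sql
-- Workaround sketch (assumption, not confirmed in the thread): skip
-- stats publishing so the Derby driver is never needed.
SET hive.stats.autogather=false;

ALTER INDEX bigIndex ON score REBUILD;
```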