That's what I would expect, too, but it appears a reducer task wanted to update some SQL statistics and to do that it wanted to load the derby jar.
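The stack trace later in this thread confirms that it is Hive's JDBC stats publisher (`JDBCStatsPublisher.connect`) that tries to load the Derby driver inside the reducer. One possible workaround, a sketch only and untested here, is to turn off automatic stats gathering for the rebuild so the reducers never touch the Derby driver:

```sql
-- Untested sketch: disable automatic stats collection so the reducer
-- tasks do not need the Derby JDBC driver on their classpath.
SET hive.stats.autogather=false;

-- Then retry the rebuild (index/table names from this thread):
ALTER INDEX bigIndex ON score REBUILD;
```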
On Thu, Nov 1, 2012 at 8:09 AM, Bejoy KS <bejoy...@yahoo.com> wrote:

> AFAIK you don't need any hive jars on the cluster. The hive jars are just
> required on the client node.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ------------------------------
> *From:* Dean Wampler <dean.wamp...@thinkbiganalytics.com>
> *Date:* Thu, 1 Nov 2012 08:01:51 -0500
> *To:* <user@hive.apache.org>
> *ReplyTo:* user@hive.apache.org
> *Subject:* Re: Creating Indexes
>
> It looks like you're using Derby with a real cluster, not just a single
> machine in local or pseudo-distributed mode. I haven't tried this myself,
> but the derby jar is probably not on the machine that ran the reducer
> task that failed.
>
> dean
>
> On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron <
> peter.mar...@trilliumsoftware.com> wrote:
>
>> Hi Shreepadma,
>>
>> I agree that the error looks odd. However I can't believe that I would
>> have got this far with Hive if there was no derby jar. Nevertheless I
>> checked.
>>
>> Here is a directory listing of the Hive install:
>>
>> pmarron@pmarron-ubuntu:/data/hive/lib$ ls
>> ant-contrib-1.0b3.jar          commons-pool-1.5.4.jar                hive-common-0.8.1.jar         hive-shims-0.8.1.jar  mockito-all-1.8.2.jar
>> antlr-2.7.7.jar                datanucleus-connectionpool-2.0.3.jar  hive-contrib-0.8.1.jar        javaewah-0.3.jar      php
>> antlr-3.0.1.jar                datanucleus-core-2.0.3.jar            hive_contrib.jar              jdo2-api-2.3-ec.jar   py
>> antlr-runtime-3.0.1.jar        datanucleus-enhancer-2.0.3.jar        hive-exec-0.8.1.jar           jline-0.9.94.jar      slf4j-api-1.6.1.jar
>> asm-3.1.jar                    datanucleus-rdbms-2.0.3.jar           hive-hbase-handler-0.8.1.jar  json-20090211.jar     slf4j-log4j12-1.6.1.jar
>> commons-cli-1.2.jar            *derby-10.4.2.0.jar*                  hive-hwi-0.8.1.jar            junit-4.10.jar        stringtemplate-3.1-b1.jar
>> commons-codec-1.3.jar          guava-r06.jar                         hive-hwi-0.8.1.war            libfb303-0.7.0.jar    velocity-1.5.jar
>> commons-collections-3.2.1.jar  hbase-0.89.0-SNAPSHOT.jar             hive-jdbc-0.8.1.jar           libfb303.jar          zookeeper-3.3.1.jar
>> commons-dbcp-1.4.jar           hbase-0.89.0-SNAPSHOT-tests.jar       hive-metastore-0.8.1.jar      libthrift-0.7.0.jar
>> commons-lang-2.4.jar           hive-anttasks-0.8.1.jar               hive-pdk-0.8.1.jar            libthrift.jar
>> commons-logging-1.0.4.jar      hive-builtins-0.8.1.jar               hive-serde-0.8.1.jar          log4j-1.2.15.jar
>> commons-logging-api-1.0.4.jar  hive-cli-0.8.1.jar                    hive-service-0.8.1.jar        log4j-1.2.16.jar
>>
>> Also I found a derby.log in my home directory which I have attached.
>>
>> Regards,
>>
>> Z
>>
>> *From:* Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
>> *Sent:* 31 October 2012 21:58
>> *To:* user@hive.apache.org
>> *Subject:* Re: Creating Indexes
>>
>> Hi Peter,
>>
>> From the execution log,
>>
>> java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:186)
>>         at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68)
>>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778)
>>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723)
>>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
>>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>>         at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
>>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>> it appears that the error is due to the derby classes not being found.
>> Can you check if the derby jars are present?
>>
>> Thanks,
>> Shreepadma
>>
>> On Wed, Oct 31, 2012 at 12:52 PM, Peter Marron <
>> peter.mar...@trilliumsoftware.com> wrote:
>>
>> Hi Shreepadma
>>
>> Happy to attach the logs, not quite sure which one is going to
>> be most useful. Please find attached one which contained an
>> error of some sort. Not sure if it's related or not to the index error.
>> Found the file in this location:
>>
>> /data/hadoop/logs/userlogs/job_201210311448_0001/attempt_201210311448_0001_r_000137_0/syslog
>>
>> so maybe that will help you locate any other file that you might want
>> to see.
>>
>> Thanks for your efforts.
>>
>> Peter Marron
>>
>> *From:* Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
>> *Sent:* 31 October 2012 18:38
>> *To:* user@hive.apache.org
>> *Subject:* Re: Creating Indexes
>>
>> Hi Peter,
>>
>> Can you attach the execution logs?
>> What is the exception that you see in the execution logs?
>>
>> Thanks,
>> Shreepadma
>>
>> On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <
>> peter.mar...@trilliumsoftware.com> wrote:
>>
>> Hi,
>>
>> I am still having problems building my index.
>> In an attempt to find someone who can help me,
>> I'll go through all the steps that I try.
>>
>> 1) First I load my data into hive.
>>
>> hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
>> Loading data to table default.score
>> Deleted hdfs://localhost/data/warehouse/score
>> OK
>> Time taken: 7.817 seconds
>>
>> 2) Then I try to create the index:
>>
>> hive> CREATE INDEX bigIndex
>>     > ON TABLE score(Ath_Seq_Num)
>>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
>> FAILED: Error in metadata: java.lang.RuntimeException: Please specify
>> deferred rebuild using " WITH DEFERRED REBUILD ".
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>> hive>
>>
>> 3) OK, so it suggests that I use "WITH DEFERRED REBUILD", and so I do:
>>
>> hive> CREATE INDEX bigIndex
>>     > ON TABLE score(Ath_Seq_Num)
>>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
>>     > WITH DEFERRED REBUILD;
>> OK
>> Time taken: 0.603 seconds
>>
>> 4) Now, to build the index I assume that I use ALTER INDEX as follows:
>>
>> hive> ALTER INDEX bigIndex ON score REBUILD;
>> Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks not specified.
>> Estimated from input data size: 138
>> In order to change the average load for a reducer (in bytes):
>>   set hive.exec.reducers.bytes.per.reducer=<number>
>> In order to limit the maximum number of reducers:
>>   set hive.exec.reducers.max=<number>
>> In order to set a constant number of reducers:
>>   set mapred.reduce.tasks=<number>
>> Starting Job = job_201210311448_0001, Tracking URL =
>> http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
>> Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job
>> -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
>> Hadoop job information for Stage-1: number of mappers: 511; number of
>> reducers: 138
>> 2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%
>>
>> 5) This all looks promising, and after increasing my heap size to
>> get the Map/Reduce job to complete, I get this an hour later:
>>
>> 2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative
>> CPU 4135.47 sec
>> MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55
>> seconds 470 msec
>> Ended Job = job_201210311448_0001
>> Loading data to table default.default__score_bigindex__
>> Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
>> Invalid alter operation: Unable to alter index.
>> FAILED: Execution Error, return code 1 from
>> org.apache.hadoop.hive.ql.exec.DDLTask
>>
>> So what have I done wrong, and what do I need to do to get this index
>> to build successfully?
>>
>> Any help appreciated.
>>
>> Peter Marron
>>
>> *From:* Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
>> *Sent:* 24 October 2012 13:27
>> *To:* user@hive.apache.org
>> *Subject:* RE: Indexes
>>
>> Hi Shreepadma,
>>
>> Thanks for this.
>> Looks exactly like the information I need.
>> I was going to reply when I had tried it all out, but I'm having
>> problems creating the index at the moment (I'm getting an
>> OutOfMemoryError). So I thought that I had better reply now to say
>> thank you.
>>
>> Peter Marron
>>
>> *From:* Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
>> *Sent:* 23 October 2012 19:49
>> *To:* user@hive.apache.org
>> *Subject:* Re: Indexes
>>
>> Hi Peter,
>>
>> Indexing support was added to Hive in 0.7, and in 0.8 the query
>> compiler was enhanced to optimize some classes of queries (certain
>> group bys and joins) using indexes. Assuming you are using the built-in
>> index handler, you need to do the following _after_ you have created
>> and rebuilt the index:
>>
>> SET hive.index.compact.file='/tmp/index_result';
>> SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
>>
>> You will then notice a speed-up for a query of the form:
>>
>> select count(*) from tab where indexed_col = some_val
>>
>> Thanks,
>> Shreepadma
>>
>> On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <
>> peter.mar...@trilliumsoftware.com> wrote:
>>
>> Hi,
>>
>> I'm very much a Hive newbie but I've been looking at HIVE-417 and this
>> page in particular:
>>
>> http://cwiki.apache.org/confluence/display/Hive/IndexDev
>>
>> Using this information I've been able to create an index (using Hive
>> 0.8.1), and when I look at the contents it all looks very promising
>> indeed. However on the same page there's this comment:
>>
>> "...This document currently only covers index creation and maintenance.
>> A follow-on will explain how indexes are used to optimize queries
>> (building on FilterPushdownDev
>> <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)..."
>>
>> However I can't find the "follow-on" which tells me how to exploit the
>> index that I've created to "optimize" subsequent queries.
>> Now I've been told that I can create and use indexes with the current
>> release of Hive _without_ writing and developing any Java code of my
>> own. Is this true? If so, how?
>>
>> Any help appreciated.
>>
>> Peter Marron.
>
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330

--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
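
Putting together the steps discussed in this thread, the end-to-end compact-index workflow against Hive 0.8.x looks roughly like the following. This is a sketch only: the table and column names are the poster's, the index file path is the arbitrary example from Shreepadma's message, and the literal in the final query is a placeholder.

```sql
-- Create the index with a deferred build, then build it.
CREATE INDEX bigIndex
ON TABLE score(Ath_Seq_Num)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

ALTER INDEX bigIndex ON score REBUILD;

-- Point the compiler at the index result and swap in the compact-index
-- input format so that selective queries can use the index.
SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- A query of this shape should then benefit from the index
-- (12345 is a placeholder value).
SELECT COUNT(*) FROM score WHERE Ath_Seq_Num = 12345;
```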