You can still use the mr execution engine to maintain the index. It is true that the ORC and Parquet formats provide min/max indexes and bloom filters, but you need to sort your data appropriately to get the performance benefit from them. Alternatively, you can create redundant tables sorted in different orders. The "traditional" indexes can still make sense for data that is not in ORC or Parquet format. Keep in mind that for warehouse scenarios Hive offers many other optimization methods.
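
To illustrate why sorting matters for the min/max indexes, here is a small Python sketch. It is only a simplified model and not how the ORC or Parquet readers are actually implemented: the function names (build_stripes, stripes_to_read), the stripe size and the data are all made up for illustration. It splits a column into fixed-size chunks, keeps min/max per chunk, and counts how many chunks a range predicate still has to read.

```python
import random


def build_stripes(values, stripe_size):
    """Split values into fixed-size chunks and record min/max for each chunk."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append({"min": min(chunk), "max": max(chunk)})
    return stripes


def stripes_to_read(stripes, lo, hi):
    """Count chunks whose [min, max] range overlaps the predicate lo <= x <= hi."""
    return sum(1 for s in stripes if s["max"] >= lo and s["min"] <= hi)


random.seed(0)  # fixed seed so the shuffled layout is reproducible
values = list(range(100_000))
random.shuffle(values)  # hypothetical unsorted load order

unsorted_stripes = build_stripes(values, 10_000)
sorted_stripes = build_stripes(sorted(values), 10_000)

# Same range predicate against both layouts.
unsorted_reads = stripes_to_read(unsorted_stripes, 42_000, 42_999)
sorted_reads = stripes_to_read(sorted_stripes, 42_000, 42_999)

print("chunks read, unsorted:", unsorted_reads)
print("chunks read, sorted:  ", sorted_reads)
```

With sorted data the predicate falls into a single chunk, so the others can be skipped. With shuffled data nearly every chunk spans almost the whole value range, so the min/max statistics rule out little or nothing.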
> On 05 Jan 2016, at 19:17, Ting(Goden) Yao <t...@pivotal.io> wrote:
>
> Hi,
>
> We hit an issue when doing Hive testing to rebuild index on Tez.
> We were told by our Hadoop distro vendor that it's not recommended (or should
> avoid) using index with Hive.
>
> But I don't see an official message on Hive wiki or documentation.
> Can someone confirm that so we'll ask our users to avoid indexing.
>
> Thanks.
> -Goden
>
> ==Exceptions (if you're interested in details) ==
> Exception:
>
> 2015-12-08 22:55:30,263 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher: Error in dispatcher thread
> org.apache.tez.dag.api.TezUncheckedException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
>     at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:80)
>     at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:98)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:137)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:114)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:3943)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3900(VertexImpl.java:180)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2956)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2906)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2887)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1556)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:179)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1764)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1750)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:69)
>     ... 20 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.tez.DynamicPartitionPruner.initialize(DynamicPartitionPruner.java:154)
>     at org.apache.hadoop.hive.ql.exec.tez.DynamicPartitionPruner.<init>(DynamicPartitionPruner.java:110)
>     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:95)
>     ... 25 more
> 2015-12-08 22:55:30,266 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Can't handle Invalid event V_START on vertex Map 1 with vertexId vertex_1449613300943_0002_1_00 at current state NEW
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_START at NEW
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1556)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:179)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1764)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1750)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-12-08 22:55:30,267 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Invalid event V_INTERNAL_ERROR on Vert