Yes, we tried MR and it works fine, so it's more likely a Tez issue. Thanks for your comments.
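
For reference, a minimal sketch of the workaround discussed in this thread: switch the session to the mr engine only for the index rebuild, then switch back to Tez. The table name `sales` and the index name `idx_sales_customer` are hypothetical placeholders, not names from this thread, and the exact behavior may differ by Hive version.

```sql
-- Hypothetical names: table `sales`, index `idx_sales_customer`.
-- Run the index rebuild on MapReduce instead of Tez.
SET hive.execution.engine=mr;
ALTER INDEX idx_sales_customer ON sales REBUILD;

-- Switch back to Tez for regular queries in the same session.
SET hive.execution.engine=tez;
```
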
On Tue, Jan 5, 2016 at 11:58 AM Jörn Franke <jornfra...@gmail.com> wrote:

> You can still use execution Engine mr for maintaining the index. Indeed
> with the ORC or parquet format there are min/max indexes and bloom filters,
> but you need to sort your data appropriately to benefit from performance.
> Alternatively you can create redundant tables sorted in different order.
> The "traditional" indexes can still make sense for data not in Orc or
> parquet format.
> Keep in mind that for warehouse scenarios there are many other
> optimization methods in Hive.
>
> On 05 Jan 2016, at 19:17, Ting(Goden) Yao <t...@pivotal.io> wrote:
>
> Hi,
>
> We hit an issue when doing Hive testing to rebuild index on Tez.
> We were told by our Hadoop distro vendor that it's not recommended (or
> should avoid) using index with Hive.
>
> But I don't see an official message on Hive wiki
> <https://cwiki.apache.org/confluence/display/Hive/IndexDev> or
> documentation.
> Can someone confirm that so we'll ask our users to avoid indexing.
>
> Thanks.
> -Goden
>
> ==Exceptions (if you're interested in details) ==
>
> Exception:
>
> 2015-12-08 22:55:30,263 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher: Error in dispatcher thread
> org.apache.tez.dag.api.TezUncheckedException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
>     at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:80)
>     at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:98)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:137)
>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:114)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:3943)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3900(VertexImpl.java:180)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2956)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2906)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2887)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1556)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:179)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1764)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1750)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:69)
>     ... 20 more
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.exec.tez.DynamicPartitionPruner.initialize(DynamicPartitionPruner.java:154)
>     at org.apache.hadoop.hive.ql.exec.tez.DynamicPartitionPruner.<init>(DynamicPartitionPruner.java:110)
>     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:95)
>     ... 25 more
> 2015-12-08 22:55:30,266 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Can't handle Invalid event V_START on vertex Map 1 with vertexId vertex_1449613300943_0002_1_00 at current state NEW
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_START at NEW
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1556)
>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:179)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1764)
>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1750)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-12-08 22:55:30,267 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Invalid event V_INTERNAL_ERROR on Vert