I am just getting started with the Tez code, so bear with me; I might be wrong here.
In the WordCount example, neither the parallelism nor the VertexLocationHints are specified when the Tokenizer vertex is created. My guess is that these values are populated at runtime by the InputInitializer. However, I do not want them populated at runtime; I want to specify them while creating the DAG itself. When I do that, I get the exception mentioned in my previous mail. What should I do so that the locations of the Tokenizer vertex's tasks are not derived from the HDFS splits but can be configured arbitrarily when the DAG is created?

Raajay

On Thu, Sep 10, 2015 at 12:01 AM, Jianfeng (Jeff) Zhang <[email protected]> wrote:

>
> Actually Tokenizer vertex should already have the VertexLocationHints from
> the hdfs file split info at runtime. Did you see any unexpected behavior ?
>
> Best Regard,
> Jeff Zhang
>
> From: Raajay <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, September 10, 2015 at 12:35 PM
> To: "[email protected]" <[email protected]>
> Subject: Error of setting vertex location hints
>
> In the WordCount example, I am trying to fix the location of map tasks by
> providing "VertexLocationHints" to the "tokenizer" vertex.
>
> However, the application fails with an exception (stacktrace below). I
> guess it is because, the vertex manager expects the parallelism to be -1,
> so that it can compute it.
>
> What minimal modification to the example would avoid invoking the
> VertexManager and allow me use my own customized VertexLocationHint ?
>
> Thanks
> Raajay
>
> DAG diagnostics: [Vertex failed, vertexName=Tokenizer,
> vertexId=vertex_1441839249749_0017_1_00, diagnostics=[Vertex
> vertex_1441839249749_0017_1_00 [Tokenizer] killed/failed due
> to:AM_USERCODE_FAILURE, Exception in VertexManager,
> vertex:vertex_1441839249749_0017_1_00 [Tokenizer],
> java.lang.IllegalStateException: Parallelism for the vertex should be set
> to -1 if the InputInitializer is setting parallelism, VertexName: Tokenizer
>     at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>     at org.apache.tez.dag.app.dag.impl.RootInputVertexManager.onRootVertexInitialized(RootInputVertexManager.java:60)
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventRootInputInitialized.invoke(VertexManager.java:610)
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:631)
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:626)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:626)
>     at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:615)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> ], Vertex killed, vertexName=Summation,
> vertexId=vertex_1441839249749_0017_1_01, diagnostics=[Vertex received Kill
> in INITED state., Vertex vertex_1441839249749_0017_1_01 [Summation]
> killed/failed due to:null], DAG did not succeed due to VERTEX_FAILURE.
> failedVertices:1 killedVertices:1]
> DAG did not succeed
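To make the constraint in the stack trace concrete, here is a small, self-contained Java sketch. It does not use the Tez API: `VertexSpec`, `TaskHint`, and `validate` are hypothetical names that only model the check `RootInputVertexManager` appears to enforce, namely that a vertex whose InputInitializer sets parallelism must be created with parallelism -1. The second rule, that an explicitly set parallelism needs exactly one task location hint per task, is an assumption about what DAG-level validation expects, not something confirmed in this thread.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the consistency rules discussed in this thread.
 * These are NOT Tez classes; they only illustrate the checks.
 */
public class VertexHintCheck {

    /** Marker for "let the InputInitializer decide", mirroring the -1 in the error. */
    static final int UNSET = -1;

    /** Hypothetical per-task location hint: preferred hosts and racks. */
    record TaskHint(Set<String> hosts, Set<String> racks) {}

    /** Hypothetical client-side description of a vertex. */
    record VertexSpec(String name, int parallelism, List<TaskHint> hints,
                      boolean initializerSetsParallelism) {}

    /** Returns null if the spec is consistent, otherwise an error message. */
    static String validate(VertexSpec v) {
        if (v.initializerSetsParallelism()) {
            // Rule seen in the stack trace: the initializer owns parallelism,
            // so the DAG author must leave it at -1.
            if (v.parallelism() != UNSET) {
                return "Parallelism for the vertex should be set to -1 if the "
                        + "InputInitializer is setting parallelism, VertexName: " + v.name();
            }
            return null;
        }
        // No initializer sets parallelism, so the DAG author must supply it.
        if (v.parallelism() == UNSET) {
            return "Vertex " + v.name() + " needs an explicit parallelism";
        }
        // Assumed rule: exactly one location hint per task.
        if (v.hints() != null && v.hints().size() != v.parallelism()) {
            return "Vertex " + v.name() + " has " + v.parallelism()
                    + " tasks but " + v.hints().size() + " location hints";
        }
        return null;
    }

    public static void main(String[] args) {
        List<TaskHint> hints = List.of(
                new TaskHint(Set.of("host1"), Set.of()),
                new TaskHint(Set.of("host2"), Set.of()));

        // Roughly the failing setup: fixed parallelism plus an initializer that also sets it.
        System.out.println(validate(new VertexSpec("Tokenizer", 2, hints, true)));
        // Fixed parallelism and hints, with no parallelism-setting initializer.
        System.out.println(validate(new VertexSpec("Tokenizer", 2, hints, false)));
    }
}
```

The model treats -1 as "owned by the initializer", which is how the error message reads: either leave parallelism at -1 and let the initializer derive tasks and hints from the splits, or make sure nothing in the AM tries to set parallelism before fixing both values by hand.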
