Alright… once again…

So I saw that all the TezConfiguration fields are annotated with a Scope like AM, DAG, VERTEX, etc…
So here is what I intend to do:
- The TezConfiguration for TezClient.create() will simply contain all properties from my main conf object
- For DAG and VERTEX I use the #setConf() method to forward all properties with the corresponding scope from my main conf object
- For the edgeBuilder I use the #setAdditionalConfiguration() method to forward all properties from my main conf object

So does this strategy make sense to you, or am I missing something or getting it wrong?

A couple more questions:
- Regarding your comment on InputInitializers and OutputCommitters… I don’t see any possibility to set properties on those. I’m using the user payload to transfer the conf values which are needed. Am I missing something here?
- What about the TezRuntimeConfiguration values, do I need to do anything special with those?

best
Johannes

> On 14 Sep 2015, at 20:42, Siddharth Seth <[email protected]> wrote:
>
> For Edges, the approach that you took with
> edgeBuilder.setAdditionalConfiguration will work to set relevant Tez
> properties for an edge. You should be able to iterate through properties and
> set the config on the edge - and the relevant ones will be set. (Compression
> has a specific API which you could use, but using setAdditionalConfiguration
> will also work).
> Typically, additional Hadoop properties are also required for Edges - things
> like the list of compression codecs. edgeConfigs.setAdditionalConfiguration
> does take care of allowing these properties through.
>
> The TezClient needs to be provided a config - which is then made available to
> the AM. There's not much filtering involved here, and you could set tez.* for
> this configuration instance. An attempt will be made to pick up
> YarnConfiguration to connect to the cluster.
>
> The same applies for InputInitializers and OutputCommitters. Typically (and
> unfortunately), you'll end up setting all configs.
>
> dag.setConf, and vertex.setConf should not be used - I've opened a jira to
> add docs for these.
>
> How do you get the Hadoop configs in this case? Is that part of the
> Configuration like object?
>
> On Mon, Sep 14, 2015 at 9:47 AM, Johannes Zillmann <[email protected]> wrote:
> Ok,
>
> found it. The
>
> edgeBuilder.setAdditionalConfiguration(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>
> does work for me!
>
> So let me describe my use case a little bit...
> Basically I have one Configuration like object on the client side. This is
> assembled from multiple sources and is the only way a user can set custom Tez
> properties (we do not use tez-site.xml in any way).
> Then I’m building my DAG with its vertices and edges programmatically.
> Now, do you have any recommendation for me on how to route the right Tez
> properties effectively to the corresponding Tez components? (By Tez
> components I mean things like vertex properties, dag properties, AM properties, edge
> properties, etc.)
>
> Should I simply set all tez.* properties on every component or is there a
> smarter way?
> And what components/properties might I be missing?
>
> Any help appreciated!
> Johannes
>
>> On 14 Sep 2015, at 16:57, Johannes Zillmann <[email protected]> wrote:
>>
>> Hey guys,
>>
>> question. How do I enable tez.runtime.compress programmatically?
>> When I set this property in the tez-site.xml it is picked up correctly.
>> But all other options I tried:
>> - dag.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>> - mapVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>> - reduceVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>>
>> do not have any effect! (Checking the log output of the Shuffle class)
>>
>> Johannes
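
For reference, the "forward every tez.* property" idea discussed in this thread could look roughly like the sketch below. It is only a minimal illustration under assumptions: a plain Map stands in for the client-side Configuration like object, and the class name TezPropertyRouting, the helper selectByPrefix and the key my.app.setting are hypothetical names invented for the example. It uses only the JDK, so the actual call on the edge builder appears only as a comment. Note that Siddharth's reply also points out that edges typically need some Hadoop properties (for example the compression codec list), so filtering on the tez. prefix alone may be too narrow.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tez: picks the properties to forward to a component.
final class TezPropertyRouting {
    private TezPropertyRouting() {}

    /** Returns a sorted copy of all entries whose key starts with the given prefix. */
    static Map<String, String> selectByPrefix(Map<String, String> conf, String prefix) {
        Map<String, String> selected = new TreeMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                selected.put(entry.getKey(), entry.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Stand-in for the client-side "Configuration like object" (assumption).
        Map<String, String> mainConf = new TreeMap<>();
        mainConf.put("tez.runtime.compress", "true");
        mainConf.put("my.app.setting", "42"); // hypothetical non-Tez key

        Map<String, String> tezProps = selectByPrefix(mainConf, "tez.");

        // With Tez on the classpath, each entry would be forwarded with the call
        // confirmed in this thread:
        //   edgeBuilder.setAdditionalConfiguration(key, value);
        tezProps.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same selected map could also be copied into the TezConfiguration handed to TezClient.create(), since according to the reply above there is not much filtering on that path.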
