I'd skip the second step: "For DAG, VERTEX i use the #setConf() method to forward *all properties with the corresponding scope* from my main conf object". This won't help anything at the moment. Other than that, this should work.
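A rough sketch of the routing that remains once the setConf() step is dropped. It assumes the main conf can be viewed as a flat key/value map; `ConfRouting`, `allProperties` and `withPrefix` are hypothetical names, the sample keys are only illustrative, and plain maps stand in for the real Configuration objects so that nothing here depends on the Tez API itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tez. Plain maps stand in for Hadoop
// Configuration objects so the sketch compiles without Tez on the classpath.
class ConfRouting {

    // Copy of every entry in the main conf. Intended for the config given to
    // TezClient.create() and, entry by entry, for
    // edgeBuilder.setAdditionalConfiguration(key, value).
    static Map<String, String> allProperties(Map<String, String> mainConf) {
        return new LinkedHashMap<>(mainConf);
    }

    // Optional narrowing by key prefix, e.g. "tez." for the client config.
    static Map<String, String> withPrefix(Map<String, String> mainConf, String prefix) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mainConf.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                selected.put(entry.getKey(), entry.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> mainConf = new LinkedHashMap<>();
        mainConf.put("tez.runtime.compress", "true");
        mainConf.put("io.compression.codecs", "org.apache.hadoop.io.compress.SnappyCodec");
        mainConf.put("my.app.setting", "42"); // illustrative non-Tez key

        // Edges get everything, since Hadoop keys such as the codec list matter too.
        System.out.println("client/edge keys: " + allProperties(mainConf).keySet());
        // The client config could be narrowed to tez.* if desired.
        System.out.println("tez.* keys only:  " + withPrefix(mainConf, "tez.").keySet());
    }
}
```

In a real client, each entry of the first map would be passed to setAdditionalConfiguration on the edge builder, which does the filtering on its side; nothing would be passed to dag.setConf or vertex.setConf.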
InputInitializers and OutputCommitters (as well as Processors, Inputs, Outputs) have a user payload field. If you are using FileInputFormat / FileOutputFormat based Inputs and Outputs, a payload is set up for the initializer / committer. It contains a Configuration instance (and some more information) serialized to bytes. This Configuration instance would require some of the properties as well.

Regarding the TezRuntimeConfiguration values - these are used when configuring the standard Edges, and setAdditionalConfiguration will take care of propagating the appropriate config parameters for a specific edge.

On Tue, Sep 15, 2015 at 3:52 AM, Johannes Zillmann <[email protected]> wrote:

> Alright… once again…
>
> So i saw that all the TezConfiguration fields are annotated with a Scope
> like AM, DAG, VERTEX, etc…
> So here is what i intend to do:
> - The TezConfiguration for TezClient.create() will simply contain *all
> properties* from my main conf object
> - For DAG, VERTEX i use the #setConf() method to forward *all properties
> with the corresponding scope* from my main conf object
> - For the edgeBuilder i use the #setAdditionalConfiguration() method to
> forward *all properties* from my main conf object
>
> So does this strategy make sense to you or am i missing something or
> getting it wrong ?
>
> Couple of more questions:
> - Regarding your comment on InputInitializers and OutputCommitters… I
> don’t see any possibility to set properties on that. I’m using the user
> payload to transfer conf values which are needed. Do i miss something here ?
> - What about the TezRuntimeConfiguration values, do i need to do anything
> special with that ?
>
> best
> Johannes
>
>
> On 14 Sep 2015, at 20:42, Siddharth Seth <[email protected]> wrote:
>
> For Edges, the approach that you took with
> edgeBuilder.setAdditionalConfiguration will work to set relevant Tez
> properties for an edge. You should be able to iterate through properties
> and set the config on the edge - and the relevant ones will be set.
> (Compression has a specific API which you could use, but using
> setAdditionalConfiguration will also work).
> Typically, additional Hadoop properties are also required for Edges -
> things like the list of compression codecs.
> edgeConfigs.setAdditionalConfiguration does take care of allowing these
> properties through.
>
> The TezClient needs to be provided a config - which is then made available
> to the AM. There's not much filtering involved here, and you could set
> tez.* for this configuration instance. An attempt will be made to pick up
> YarnConfiguration to connect to the cluster.
>
> The same applies for InputInitializers and OutputCommitters. Typically
> (and unfortunately), you'll end up setting all configs.
>
> dag.setConf, and vertex.setConf should not be used - I've opened a jira to
> add docs for these.
>
> How do you get the Hadoop configs in this case ? Is that part of the
> Configuration like object ?
>
>
> On Mon, Sep 14, 2015 at 9:47 AM, Johannes Zillmann <[email protected]> wrote:
>
>> Ok,
>>
>> found it. The
>>
>> edgeBuilder.setAdditionalConfiguration(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>> does work for me!
>>
>> So let me describe my use case a little bit...
>> Basically i have one Configuration like object on the client side. This
>> is assembled from multiple sources and is the only way a user can set custom Tez
>> properties (we do not use tez-site.xml at all).
>> Then i’m building my DAG with its vertices and edges programmatically.
>> Now, do you have any recommendation for me how to route the right Tez
>> properties effectively to the corresponding Tez components ? (with tez
>> components i mean like vertex properties, dag properties, AM properties,
>> edge properties, etc..)
>>
>> Should i simply set all tez.* properties to any component or is there a
>> smarter way ?
>> And what components/properties might i be missing ?
>>
>> Any help appreciated!
>> Johannes
>>
>>
>> On 14 Sep 2015, at 16:57, Johannes Zillmann <[email protected]>
>> wrote:
>>
>> Hey guys,
>>
>> question. How do i enable tez.runtime.compress programmatically ?
>> When i set this property in the tez-site.xml it is picked up correctly.
>> But all other options i tried:
>> - dag.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>> - mapVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>> - reduceVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>>
>> do not have any effect! (Checking the log output of the Shuffle class)
>>
>> Johannes
