Alright… once again…

So I saw that all the TezConfiguration fields are annotated with a Scope such as 
AM, DAG, VERTEX, etc.
So here is what I intend to do:
- The TezConfiguration for TezClient.create() will simply contain all 
properties from my main conf object
- For DAG and VERTEX I use the #setConf() method to forward all properties with 
the corresponding scope from my main conf object
- For the edgeBuilder I use the #setAdditionalConfiguration() method to forward 
all properties from my main conf object

Does this strategy make sense to you, or am I missing something or getting it 
wrong?
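
To make the scope-based forwarding concrete, here is a rough, dependency-free sketch of the filtering step only. Everything in it is an assumption for illustration: the Scope enum is a stand-in for the annotation values on the TezConfiguration fields, the key-to-scope table is something I would have to build myself (for example by reading those annotations), and ScopedConfRouter and select are made-up names, not Tez API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Tez.
class ScopedConfRouter {

    // Stand-in for the scope values seen on the TezConfiguration fields.
    enum Scope { AM, DAG, VERTEX }

    /**
     * Returns the entries of conf whose key is registered under the wanted
     * scope. Keys missing from the scope table are skipped.
     */
    static Map<String, String> select(Map<String, String> conf,
                                      Map<String, Set<Scope>> scopes,
                                      Scope wanted) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            Set<Scope> keyScopes = scopes.get(entry.getKey());
            if (keyScopes != null && keyScopes.contains(wanted)) {
                selected.put(entry.getKey(), entry.getValue());
            }
        }
        return selected;
    }
}
```

The result of select(conf, scopes, Scope.VERTEX) would then be what gets forwarded entry by entry to a vertex, and likewise for the DAG.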

A couple more questions:
- Regarding your comment on InputInitializers and OutputCommitters: I don't see 
any way to set properties on those. I'm using the user payload to 
transfer the conf values that are needed. Am I missing something here?
- What about the TezRuntimeConfiguration values, do I need to do anything 
special with those?
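
For reference, the payload transfer mentioned in the first question looks roughly like the sketch below. It only shows the idea of packing key/value pairs into a ByteBuffer and reading them back, using nothing but the JDK. PayloadConfCodec, encode and decode are made-up names for illustration, and the step that wraps the buffer into the actual Tez payload object is left out on purpose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Properties;

// Hypothetical helper, not part of Tez.
class PayloadConfCodec {

    /** Serializes the properties into a buffer that can travel as a payload. */
    static ByteBuffer encode(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null); // java.util.Properties text format
            return ByteBuffer.wrap(out.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the properties back without moving the caller's buffer position. */
    static Properties decode(ByteBuffer payload) {
        try {
            byte[] bytes = new byte[payload.remaining()];
            payload.duplicate().get(bytes);
            Properties props = new Properties();
            props.load(new ByteArrayInputStream(bytes));
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On the client side the needed values go through encode, and the initializer or committer calls decode on whatever bytes it receives.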


best
Johannes
 


> On 14 Sep 2015, at 20:42, Siddharth Seth <[email protected]> wrote:
> 
> For Edges, the approach that you took with 
> edgeBuilder.setAdditionalConfiguration will work to set relevant Tez 
> properties for an edge. You should be able to iterate through properties and 
> set the config on the edge - and the relevant ones will be set. (Compression 
> has a specific API which you could use, but using setAdditionalConfiguration 
> will also work).
> Typically, additional Hadoop properties are also required for Edges - things 
> like the list of compression codecs. edgeConfigs.setAdditionalConfiguration 
> does take care of allowing these properties through.
> 
> The TezClient needs to be provided a config - which is then made available to 
> the AM. There's not much filtering involved here, and you could set tez.* for 
> this configuration instance. An attempt will be made to pick up 
> YarnConfiguration to connect to the cluster.
> 
> The same applies for InputInitializers and OutputCommitters. Typically (and 
> unfortunately), you'll end up setting all configs.
> 
> dag.setConf, and vertex.setConf should not be used - I've opened a jira to 
> add docs for these.
> 
> How do you get the Hadoop configs in this case ? Is that part of the 
> Configuration like object ?
> 
> 
> 
> On Mon, Sep 14, 2015 at 9:47 AM, Johannes Zillmann <[email protected] 
> <mailto:[email protected]>> wrote:
> Ok, 
> 
> found it. The 
>       
> edgeBuilder.setAdditionalConfiguration(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS,
>  "true"); 
> does work for me!
> 
> So let me describe my use case a little bit...
> Basically I have one Configuration-like object on the client side. It is 
> assembled from multiple sources and is the only way a user can set custom Tez 
> properties (we do not use tez-site.xml at all). 
> Then I'm building my DAG with its vertices and edges programmatically. 
> Now, do you have any recommendation for how to route the right Tez 
> properties effectively to the corresponding Tez components? (By Tez 
> components I mean things like vertex properties, DAG properties, AM 
> properties, edge properties, etc.)
> 
> Should I simply set all tez.* properties on every component, or is there a 
> smarter way?
> And which components/properties might I be missing?
> 
> Any help appreciated!
> Johannes
> 
> 
>> On 14 Sep 2015, at 16:57, Johannes Zillmann <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hey guys,
>> 
>> Question: how do I enable tez.runtime.compress programmatically?
>> When I set this property in tez-site.xml it is picked up correctly.
>> But all the other options I tried:
>> - dag.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>> - mapVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>> - reduceVertex.setConf(TezRuntimeConfiguration.TEZ_RUNTIME_COMPRESS, "true");
>> 
>> do not have any effect! (Checked via the log output of the Shuffle class.)
>> 
>> Johannes
> 
> 
