Hey Sid,

On 05 Aug 2014, at 21:05, Siddharth Seth <[email protected]> wrote:
> The last configuration parameter to
> "OrderedPartitionedKVEdgeConfigurer.newBuilder(keyClassName, valueClassName,
> myPartitionerClassName, jobConfForShuffleSort);" is the configuration for the
> partitioner itself. That's only used in the Output, and hence is not
> available in the consuming Input.
>
> It looks like we're missing the option to set a Configuration for the
> comparator. There are a couple of other changes required in the EdgeConfigurers;
> I'll create a JIRA and post a patch later today.

Cool, thanks!

>
> One of the big reasons to separate out the Configurations is to limit the
> size of the payload generated. Using a generic conf (which usually ends up
> inheriting from JobConf etc.) ends up setting a large number of keys (1000+ in
> some cases), of which very few are actually used. setFromConfiguration(...)
> actually strips out unused keys. The partitionerConf parameter is meant to be
> a very specific Configuration only for the Partitioner (it should only contain
> the limited set of keys required to run the partitioner). The same applies to
> the Comparator conf, once it is added. Tez has no way of knowing what the valid
> set of keys for the partitioner, comparator and combiner is, since these
> are all user-specified classes.

++++1 Yeah, basically I like moving away from configuration! Just this time it hit me a bit ;)

>
> Until I can get a patch going for this, your usage model to get this working
> is likely the only one which will work.

Ok, will do!
Johannes

>
>
> On Tue, Aug 5, 2014 at 8:23 AM, Johannes Zillmann <[email protected]>
> wrote:
> Hey guys,
>
> I just upgraded my application to the most current master code of Tez and
> ran into a problem setting up my custom key comparator.
> It implements org.apache.hadoop.conf.Configurable and expects a custom
> property in the passed-in configuration.
>
> So initially I tried:
>   JobConf jobConfForShuffleSort = new JobConf();
>   jobConfForShuffleSort.set("myCustomProperty", "value");
>   Builder edgeConfBuilder =
>       OrderedPartitionedKVEdgeConfigurer.newBuilder(keyClassName, valueClassName,
>           myPartitionerClassName, jobConfForShuffleSort);
>
> But the property does not come through to the instance of
> 'myPartitionerClassName'.
> Basically I see the comparator instantiated 2 times:
>
> (1) Here the custom property is available:
> java.lang.Exception
>     at myPartitionerClassName.setConf(TezRecordComparator.java:42)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateOutputKeyComparator(ConfigUtils.java:125)
>     at org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.<init>(ExternalSorter.java:158)
>     at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.<init>(DefaultSorter.java:116)
>     at org.apache.tez.runtime.library.output.OnFileSortedOutput.start(OnFileSortedOutput.java:109)
>     at SimpleVertexProcessor.initializeInputOutputs(SimpleVertexProcessor.java:190)
>
> (2) Here it is not:
> java.lang.Exception
>     at myPartitionerClassName.setConf(TezRecordComparator.java:42)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.tez.runtime.library.common.ConfigUtils.getIntermediateInputKeyComparator(ConfigUtils.java:135)
>     at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.finalMerge(MergeManager.java:808)
>     at org.apache.tez.runtime.library.common.shuffle.impl.MergeManager.close(MergeManager.java:465)
>     at org.apache.tez.runtime.library.common.shuffle.impl.Shuffle$RunShuffleCallable.call(Shuffle.java:344)
>
>
> Found the following workaround:
>   Configuration payloadConf =
>       TezUtils.createConfFromUserPayload(edgeProperty.getEdgeDestination().getUserPayload());
>   payloadConf.set("myCustomProperty", "value");
>   edgeProperty.getEdgeDestination().setUserPayload(TezUtils.createUserPayloadFromConf(payloadConf));
>
> I think it boils down to the property being passed to the edge input but
> not to its destination!?
> However, is there some smarter way of making that property available to all
> instantiations of the comparator?
> I tried using
>   edgeConfBuilder.setAdditionalConfiguration(...)
>   edgeConfBuilder.configureOutput().setAdditionalConfiguration(…)
> but that seems to filter out custom properties.
>
> Also, do you plan to use a non-configuration-based payload mechanism for the
> edge stuff, like you did for the input, output and processor?
>
> Any enlightenment appreciated!
> Johannes
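The following self-contained Java sketch models the two effects discussed in this thread, for readers who want them made concrete. The first is a size-limiting filter that keeps only the keys the framework recognizes, so a custom key such as `myCustomProperty` gets dropped. The second is the read-modify-write workaround applied to the destination payload. This is an analogy only. `java.util.Properties` stands in for Hadoop's `Configuration`, and a `byte[]` stands in for a Tez user payload. The allowlist `KNOWN_KEYS`, the key `example.allowed.key` and all method names are hypothetical. They are not Tez APIs and do not reproduce Tez's actual filtering logic.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical model only: Properties stands in for Hadoop's Configuration and
// byte[] for a Tez user payload. None of these names are Tez APIs.
class EdgePayloadSketch {

    // Hypothetical allowlist: the only keys the "framework" keeps in a payload.
    static final Set<String> KNOWN_KEYS = Set.of("example.allowed.key");

    // Keeps only allowlisted keys, mimicking a size-limiting filter that drops
    // keys it does not recognize (such as a user comparator's custom property).
    static Properties filter(Map<String, String> generic) {
        Properties out = new Properties();
        for (Map.Entry<String, String> e : generic.entrySet()) {
            if (KNOWN_KEYS.contains(e.getKey())) {
                out.setProperty(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // Serializes properties to bytes (stand-in for building a user payload).
    static byte[] toPayload(Properties p) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            p.store(bos, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Deserializes bytes back into properties (stand-in for reading a user payload).
    static Properties fromPayload(byte[] payload) {
        Properties p = new Properties();
        try {
            p.load(new ByteArrayInputStream(payload));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // The workaround from the thread: read the destination payload, add the
    // custom key directly (bypassing the filter), and write it back.
    static byte[] withProperty(byte[] payload, String key, String value) {
        Properties p = fromPayload(payload);
        p.setProperty(key, value);
        return toPayload(p);
    }

    public static void main(String[] args) {
        Map<String, String> generic = Map.of(
                "example.allowed.key", "100",
                "myCustomProperty", "value");

        // Built through the filter, the destination payload loses the custom key.
        byte[] destination = toPayload(filter(generic));
        System.out.println(fromPayload(destination).getProperty("myCustomProperty")); // prints null

        // After the read-modify-write workaround, the Input-side comparator would see it.
        destination = withProperty(destination, "myCustomProperty", "value");
        System.out.println(fromPayload(destination).getProperty("myCustomProperty")); // prints value
    }
}
```

In this model, the filter cannot know which keys a user-supplied comparator or partitioner reads, so it strips the custom key. The key then has to be placed explicitly into the payload that the consuming side deserializes. This mirrors Sid's explanation of why a dedicated, minimal comparator Configuration is the cleaner fix.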
