Thanks Jianfeng. If I use OrderedPartitionedKVEdgeConfig with a broadcast edge property and parallelism 1, does Tez still do the shuffle and group before the broadcast?

On Fri, Mar 27, 2015 at 12:47 PM, Jianfeng (Jeff) Zhang <[email protected]> wrote:

> Hi Azuryy,
>
> I checked the source code: the last vertex, Writer, does an in-memory sort
> in its processor, so in this case broadcast is possible (it also requires
> parallelism to be 1).
> If the edge is scatter-gather and uses OrderedPartitionedKVEdgeConfig, then
> the in-memory sort is not necessary because the input to the processor has
> already been sorted in the shuffle stage.
>
> It looks like there’s another version of TopK that uses scatter-gather:
> https://github.com/sequenceiq/sequenceiq-samples/tree/master/tez-topk
> The source code and README are not consistent.
>
>     @Override
>     public void run() throws Exception {
>       Preconditions.checkArgument(getInputs().size() == 1);
>       Preconditions.checkArgument(getOutputs().size() == 1);
>       KeyValueWriter kvWriter =
>           (KeyValueWriter) getOutputs().get(OUTPUT).getWriter();
>       UnorderedKVReader kvReader =
>           (UnorderedKVReader) getInputs().get(SUM).getReader();
>       while (kvReader.next()) {
>         localTop.store(
>             Integer.valueOf(kvReader.getCurrentKey().toString()),
>             kvReader.getCurrentValue().toString()
>         );
>       }
>       Map<Integer, List<String>> result = localTop.getTopKSorted();
>       for (int top : result.keySet()) {
>         kvWriter.write(new Text(join(result.get(top), ",")),
>             new IntWritable(top));
>       }
>     }
>
> Best Regards,
> Jeff Zhang
>
> From: Azuryy Yu <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, March 27, 2015 at 9:21 AM
> To: "[email protected]" <[email protected]>
> Subject: Why broadcast Edge property?
>
> Hi,
>
> Please look through this simple code:
> https://github.com/sequenceiq/sequenceiq-samples/blob/master/tez-topk/src/main/java/com/sequenceiq/tez/topk/TopK.java
>
> Why do they create a broadcast edge property from SUM to WRITER? What about
> the default edge property (scatter-gather)?
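
For reference, here is a rough sketch of the kind of in-memory sort the Writer processor seems to rely on when its input arrives unsorted over a broadcast edge. The thread does not show the helper behind localTop, so the class name LocalTopK and its internals are my assumptions, written in plain Java with no Tez dependency. Only the store and getTopKSorted method names come from the quoted code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the sample's localTop helper (not the sample's actual code).
class LocalTopK {
    private final int k;
    // Counts kept in descending order; each count maps to the values seen with it.
    private final TreeMap<Integer, List<String>> top =
            new TreeMap<>(Collections.reverseOrder());

    LocalTopK(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
    }

    // Record one (count, value) pair; if more than k distinct counts are held,
    // evict the smallest one so memory stays bounded.
    void store(int count, String value) {
        top.computeIfAbsent(count, c -> new ArrayList<>()).add(value);
        if (top.size() > k) {
            top.pollLastEntry(); // last entry is the smallest count under reversed ordering
        }
    }

    // Read-only view of the k largest counts, largest first.
    Map<Integer, List<String>> getTopKSorted() {
        return Collections.unmodifiableMap(top);
    }
}
```

Because the map is ordered by the processor itself, the reader can be unordered, which fits Jeff's point that a sorted (scatter-gather) input would make this step redundant. Note that this sketch keeps the top k distinct counts and groups ties together, which may or may not match what the sample does.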
