Hi Azuryy, I checked the source code: the last vertex, Writer, does an in-memory sort in its processor. So in this case a broadcast edge is possible (it also requires the parallelism to be 1). If the edge is scatter-gather and uses OrderedPartitionedKVEdgeConfig, then the in-memory sort is not necessary, because the input to the processor has already been sorted in the shuffle stage.
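To illustrate the kind of in-memory sort described above, here is a minimal sketch in plain Java. The class name LocalTopK and its internals are my assumptions for illustration only; this is not the sample's actual helper, just something with the same store / getTopKSorted shape that the processor calls. It keeps the K largest distinct counts and groups the values that share a count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the sample's "localTop" helper (an assumption, not the real class).
final class LocalTopK {
    private final int k;
    // Reverse ordering keeps the largest count first, so iteration is already sorted descending.
    private final TreeMap<Integer, List<String>> byCount =
            new TreeMap<>(Collections.reverseOrder());

    LocalTopK(int k) {
        this.k = k;
    }

    // Record one (count, value) pair; values with the same count share one bucket.
    void store(int count, String value) {
        byCount.computeIfAbsent(count, c -> new ArrayList<>()).add(value);
        if (byCount.size() > k) {
            // In descending order the last entry holds the smallest count, so evict it.
            byCount.pollLastEntry();
        }
    }

    // Read-only view of the K largest counts, largest first.
    Map<Integer, List<String>> getTopKSorted() {
        return Collections.unmodifiableMap(byCount);
    }

    public static void main(String[] args) {
        LocalTopK top = new LocalTopK(2);
        top.store(3, "c");
        top.store(9, "x");
        top.store(5, "a");
        top.store(5, "b");
        System.out.println(top.getTopKSorted()); // prints {9=[x], 5=[a, b]}
    }
}
```

Because everything is held in one map in one task, this only gives a global top K when a single task sees all of the input, which is why the parallelism of 1 matters here.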
It looks like there is another version of TopK that uses scatter-gather: https://github.com/sequenceiq/sequenceiq-samples/tree/master/tez-topk
The source code and the README there are not consistent.

    @Override
    public void run() throws Exception {
        Preconditions.checkArgument(getInputs().size() == 1);
        Preconditions.checkArgument(getOutputs().size() == 1);
        KeyValueWriter kvWriter = (KeyValueWriter) getOutputs().get(OUTPUT).getWriter();
        // Unordered reader: the input from SUM arrives unsorted.
        UnorderedKVReader kvReader = (UnorderedKVReader) getInputs().get(SUM).getReader();
        while (kvReader.next()) {
            localTop.store(
                Integer.valueOf(kvReader.getCurrentKey().toString()),
                kvReader.getCurrentValue().toString()
            );
        }
        // The sort happens here, in memory, inside the processor.
        Map<Integer, List<String>> result = localTop.getTopKSorted();
        for (int top : result.keySet()) {
            kvWriter.write(new Text(join(result.get(top), ",")), new IntWritable(top));
        }
    }

Best Regards,
Jeff Zhang

From: Azuryy Yu <azury...@gmail.com>
Reply-To: user@tez.apache.org
Date: Friday, March 27, 2015 at 9:21 AM
To: user@tez.apache.org
Subject: Why broadcast Edge property?

Hi,

Please look through this simple code:
https://github.com/sequenceiq/sequenceiq-samples/blob/master/tez-topk/src/main/java/com/sequenceiq/tez/topk/TopK.java

Why do they create a broadcast edge property from SUM to WRITER? What about the default edge property (scatter-gather)?