Thanks Jianfeng.

If I use OrderedPartitionedKVEdgeConfig with a broadcast edge property and
parallelism 1, does Tez also do the shuffle and group before the broadcast?
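
To check my understanding of the in-memory sort in the Writer processor, here is a rough sketch of what I assume localTop does. LocalTopK is a hypothetical stand-in made up for illustration, not the sample's actual class; only the method names store and getTopKSorted come from the quoted code below.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the sample's localTop helper (an assumption,
// not the real implementation).
final class LocalTopK {
    private final int k;

    // Keys are kept in descending order, so the largest counts come first
    // and the smallest retained count is always the last entry.
    private final TreeMap<Integer, List<String>> byCount =
            new TreeMap<>(Collections.reverseOrder());

    LocalTopK(int k) {
        this.k = k;
    }

    // Record one (count, value) pair read from the unordered input.
    void store(int count, String value) {
        byCount.computeIfAbsent(count, c -> new ArrayList<>()).add(value);
        // Keep only the k largest distinct counts in memory.
        if (byCount.size() > k) {
            byCount.pollLastEntry();
        }
    }

    // Return the retained entries, largest count first.
    Map<Integer, List<String>> getTopKSorted() {
        return new LinkedHashMap<>(byCount);
    }
}
```

With k = 2, storing (3, "a"), (7, "b"), (5, "c") and (7, "d") leaves the keys 7 and 5, in that order. This only works as a global top-K if a single task sees all the records, which is why parallelism has to be 1.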



On Fri, Mar 27, 2015 at 12:47 PM, Jianfeng (Jeff) Zhang <
[email protected]> wrote:

>
>  Hi Azuryy,
>
>  I checked the source code: the last vertex, Writer, does an in-memory sort in
> its processor, so in this case a broadcast edge is possible (it also requires
> parallelism to be 1).
> If the edge is scatter-gather and uses OrderedPartitionedKVEdgeConfig, then
> the in-memory sort is not necessary because the input to the processor has
> already been sorted in the shuffle stage.
>
>  It looks like there is another version of TopK that uses scatter-gather:
> https://github.com/sequenceiq/sequenceiq-samples/tree/master/tez-topk
>  The source code and README are not consistent.
>
>
>         @Override
>         public void run() throws Exception {
>             Preconditions.checkArgument(getInputs().size() == 1);
>             Preconditions.checkArgument(getOutputs().size() == 1);
>             KeyValueWriter kvWriter = (KeyValueWriter) getOutputs().get(OUTPUT).getWriter();
>             UnorderedKVReader kvReader = (UnorderedKVReader) getInputs().get(SUM).getReader();
>             while (kvReader.next()) {
>                 localTop.store(
>                         Integer.valueOf(kvReader.getCurrentKey().toString()),
>                         kvReader.getCurrentValue().toString()
>                 );
>             }
>             Map<Integer, List<String>> result = localTop.getTopKSorted();
>             for (int top : result.keySet()) {
>                 kvWriter.write(new Text(join(result.get(top), ",")), new IntWritable(top));
>             }
>         }
>
>
>  Best Regards,
> Jeff Zhang
>
>
>   From: Azuryy Yu <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Friday, March 27, 2015 at 9:21 AM
> To: "[email protected]" <[email protected]>
> Subject: Why broadcast Edge property?
>
>    Hi,
>
>  Please look through this simple code:
>
> https://github.com/sequenceiq/sequenceiq-samples/blob/master/tez-topk/src/main/java/com/sequenceiq/tez/topk/TopK.java
>
>  Why do they create a broadcast edge property from SUM to WRITER? Why not
> use the default edge property (scatter-gather)?
>
>
>
>
