Re: Why broadcast Edge property?

Jianfeng (Jeff) Zhang Thu, 26 Mar 2015 21:48:00 -0700

Hi Azuryy,

I checked the source code: the last vertex, Writer, does an in-memory sort in 
its processor, so a broadcast edge is possible in this case (it also requires 
the parallelism to be 1).
If the edge is scatter-gather and uses OrderedPartitionedKVEdgeConfig, then the 
in-memory sort is not necessary, because the input to the processor has already 
been sorted in the shuffle stage.


It looks like there is another version of TopK that uses scatter-gather:
https://github.com/sequenceiq/sequenceiq-samples/tree/master/tez-topk
The source code and the README there are not consistent.



        @Override
        public void run() throws Exception {
            Preconditions.checkArgument(getInputs().size() == 1);
            Preconditions.checkArgument(getOutputs().size() == 1);
            KeyValueWriter kvWriter = (KeyValueWriter) getOutputs().get(OUTPUT).getWriter();
            UnorderedKVReader kvReader = (UnorderedKVReader) getInputs().get(SUM).getReader();
            while (kvReader.next()) {
                localTop.store(
                        Integer.valueOf(kvReader.getCurrentKey().toString()),
                        kvReader.getCurrentValue().toString()
                );
            }
            Map<Integer, List<String>> result = localTop.getTopKSorted();
            for (int top : result.keySet()) {
                kvWriter.write(new Text(join(result.get(top), ",")), new IntWritable(top));
            }
        }
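
The snippet above does not show what localTop is. As a rough illustration of the in-memory sort it implies, here is a minimal sketch in plain Java. The class name LocalTopK, its constructor argument k, and the eviction policy are all assumptions made for illustration; this is not the sample project's actual implementation, only one way store() and getTopKSorted() could behave.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for the localTop helper; not the sample's real class. */
class LocalTopK {
    private final int k;

    // Counts kept in descending order, each mapped to the words seen with that count.
    private final TreeMap<Integer, List<String>> byCount =
            new TreeMap<>(Collections.reverseOrder());

    LocalTopK(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
    }

    /** Records one (count, word) pair, as run() does for every input record. */
    void store(int count, String word) {
        byCount.computeIfAbsent(count, c -> new ArrayList<>()).add(word);
        // Assumption: keep at most k distinct counts, evicting the smallest one.
        if (byCount.size() > k) {
            byCount.pollLastEntry();
        }
    }

    /** Returns the retained counts, largest first, with their words. */
    Map<Integer, List<String>> getTopKSorted() {
        return new LinkedHashMap<>(byCount);
    }

    public static void main(String[] args) {
        LocalTopK top = new LocalTopK(2);
        top.store(3, "tez");
        top.store(7, "hadoop");
        top.store(5, "yarn");
        top.store(7, "hive");
        System.out.println(top.getTopKSorted()); // prints {7=[hadoop, hive], 5=[yarn]}
    }
}
```

Because a holder like this only sees the records delivered to its own task, the result is a global top K only when a single task receives every record, which matches the parallelism-of-1 requirement mentioned above.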


Best Regards,
Jeff Zhang


From: Azuryy Yu <azury...@gmail.com>
Reply-To: "user@tez.apache.org" <user@tez.apache.org>
Date: Friday, March 27, 2015 at 9:21 AM
To: "user@tez.apache.org" <user@tez.apache.org>
Subject: Why broadcast Edge property?

Hi,

Please look through this simple code:
https://github.com/sequenceiq/sequenceiq-samples/blob/master/tez-topk/src/main/java/com/sequenceiq/tez/topk/TopK.java

Why do they create a broadcast edge property from SUM to WRITER? What about 
the default edge property (scatter-gather)?
