Yup, exactly. A MapReduce is equivalent to a ParDo-GroupByKey-ParDo.

On Tue, Jun 7, 2016 at 9:55 AM, Jesse Anderson <[email protected]> wrote:
> Pawel,
>
> I think I understand it better. By running a ParDo after a GroupByKey, the
> DoFn does what the reducer would do in a MapReduce-style job.
>
> Thanks,
>
> Jesse
>
> On Tue, Jun 7, 2016 at 12:17 PM Pawel Szczur <[email protected]>
> wrote:
>
>> There's no shuffle sort (at least none documented), but there may be a
>> shuffle, e.g. when one uses GroupByKey. In that case, the result is:
>> PCollection<KV<?, Iterable<?>>>
>> If you apply a DoFn to it, it's similar to running a reducer.
>>
>> 2016-06-07 18:10 GMT+02:00 Jesse Anderson <[email protected]>:
>>
>>> I have a question about the ParDo JavaDocs. The JavaDocs say:
>>> The ParDo processing style is similar to what happens inside the
>>> "Mapper" or "Reducer" class of a MapReduce-style algorithm.
>>>
>>> I think a ParDo's DoFn is only what a Mapper class would do. A DoFn
>>> doesn't seem to run after a shuffle sort like a reducer does. Is my
>>> understanding correct?
>>>
>>> Thanks,
>>>
>>> Jesse
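
For anyone reading this thread later, the ParDo-GroupByKey-ParDo shape can be sketched in plain Java. This is only a simulation of the three stages with ordinary collections, not the SDK itself: the class and method names (MapReduceSketch, mapStage, groupByKey, reduceStage, wordCount) are made up for illustration, and word counting is an assumed example workload.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the pipeline shape; none of these names are SDK API.
class MapReduceSketch {

    // Stage 1, the first "ParDo" (mapper role): each line emits (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapStage(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return out;
    }

    // Stage 2, the "GroupByKey": gather every value that shares a key.
    // The result plays the role of KV<key, Iterable<value>> elements.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Stage 3, the second "ParDo" (reducer role): one call per key and its values.
    static Map<String, Integer> reduceStage(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            out.put(entry.getKey(), sum);
        }
        return out;
    }

    // The whole pipeline: map, then group, then reduce.
    static Map<String, Integer> wordCount(List<String> lines) {
        return reduceStage(groupByKey(mapStage(lines)));
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, c=1}
        System.out.println(wordCount(Arrays.asList("a b a", "b c")));
    }
}
```

The point of the sketch is that the per-element function in stage 3 has the same form as the one in stage 1. What makes it behave like a reducer is only that its input has already been grouped by key.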
