Somewhat related in case anyone hits this during Googling. If you want the results of your GroupByKey to be grouped while writing out like: 54.148.33.jdj Hits:44 At:2015-03-31T04:00:29.999Z 54.148.33.jdj Hits:44 At:2015-03-31T04:00:59.999Z 54.148.33.jdj Hits:2 At:2015-03-31T04:01:29.999Z 107.22.225.dea Hits:18 At:2015-03-31T04:00:29.999Z 107.22.225.dea Hits:18 At:2015-03-31T04:00:59.999Z 107.22.225.dea Hits:1 At:2015-03-31T04:01:29.999Z 190.29.67.djc Hits:1 At:2015-03-31T04:00:29.999Z 190.29.67.djc Hits:1 At:2015-03-31T04:00:59.999Z
Simply doing a "context.output" for each line will not write them out one after another. You'll to do a single "context.output" call with multiple "\n" in the string. On Tue, May 17, 2016 at 11:57 AM Jesse Anderson <[email protected]> wrote: > That's what I needed. Thanks Frances. > > On Tue, May 17, 2016 at 11:54 AM Frances Perry <[email protected]> wrote: > >> Re-window into the GlobalWindow (which covers all time). >> >> pc.apply(Window.<T>into(new GlobalWindows())) >> >> The GlobalWindow is the default. It works for bounded collections (aka. >> batch mode) because once all the data has been processed, the system fast >> forwards to the end of time. And if you are processing an unbounded >> PCollection with aggregations (aka. streaming mode), you have to either set >> a different windowing scheme or using triggers so that we aren't waiting >> til the end of time for results. >> >> Hope that helps! >> >> On Tue, May 17, 2016 at 11:44 AM, Jesse Anderson <[email protected]> >> wrote: >> >>> Is there a way to remove windowing from a PCollection? >>> >>> Let's say I had the following code: >>> PCollection<String> windowedWords = parsed >>> .apply(Window.<String>into( >>> FixedWindows.of(Duration.standardSeconds(30)))); >>> >>> PCollection<KV<String, Long>> eventCounts = >>> windowedWords.apply(Count.perElement()); >>> >>> // Don't window anymore for the GroupByKey >>> PCollection<KV<String, Iterable<Long>>> grouped = >>> eventCounts.apply(GroupByKey.<String, Long>create()); >>> >>> PCollection<String> formattedCounts = grouped.apply(ParDo.of(new >>> TimedFN())); >>> >>> I've added the window and performed the counts. In the GroupByKey, I no >>> longer want windowing to apply. I want to group by the key across windows >>> now. The Iterable that comes back from the GroupByKey only has the one Long >>> from its Window instead of all N Longs from all Windows. How do you do >>> remove the Window to GroupByKey for all times? >>> >>> Thanks, >>> >>> Jesse >>> >> >>
