Somewhat related in case anyone hits this during Googling. If you want the
results of your GroupByKey to be grouped while writing out like:
54.148.33.jdj Hits:44 At:2015-03-31T04:00:29.999Z
54.148.33.jdj Hits:44 At:2015-03-31T04:00:59.999Z
54.148.33.jdj Hits:2 At:2015-03-31T04:01:29.999Z
107.22.225.dea Hits:18 At:2015-03-31T04:00:29.999Z
107.22.225.dea Hits:18 At:2015-03-31T04:00:59.999Z
107.22.225.dea Hits:1 At:2015-03-31T04:01:29.999Z
190.29.67.djc Hits:1 At:2015-03-31T04:00:29.999Z
190.29.67.djc Hits:1 At:2015-03-31T04:00:59.999Z

Simply doing a "context.output" for each line will not write them out one
after another. You'll to do a single "context.output" call with multiple
"\n" in the string.

On Tue, May 17, 2016 at 11:57 AM Jesse Anderson <[email protected]>
wrote:

> That's what I needed. Thanks Frances.
>
> On Tue, May 17, 2016 at 11:54 AM Frances Perry <[email protected]> wrote:
>
>> Re-window into the GlobalWindow (which covers all time).
>>
>> pc.apply(Window.<T>into(new GlobalWindows()))
>>
>> The GlobalWindow is the default. It works for bounded collections (aka.
>> batch mode) because once all the data has been processed, the system fast
>> forwards to the end of time. And if you are processing an unbounded
>> PCollection with aggregations (aka. streaming mode), you have to either set
>> a different windowing scheme or using triggers so that we aren't waiting
>> til the end of time for results.
>>
>> Hope that helps!
>>
>> On Tue, May 17, 2016 at 11:44 AM, Jesse Anderson <[email protected]>
>> wrote:
>>
>>> Is there a way to remove windowing from a PCollection?
>>>
>>> Let's say I had the following code:
>>>       PCollection<String> windowedWords = parsed
>>>           .apply(Window.<String>into(
>>>             FixedWindows.of(Duration.standardSeconds(30))));
>>>
>>>       PCollection<KV<String, Long>> eventCounts =
>>> windowedWords.apply(Count.perElement());
>>>
>>>       // Don't window anymore for the GroupByKey
>>>       PCollection<KV<String, Iterable<Long>>> grouped =
>>> eventCounts.apply(GroupByKey.<String, Long>create());
>>>
>>>       PCollection<String> formattedCounts = grouped.apply(ParDo.of(new
>>> TimedFN()));
>>>
>>> I've added the window and performed the counts. In the GroupByKey, I no
>>> longer want windowing to apply. I want to group by the key across windows
>>> now. The Iterable that comes back from the GroupByKey only has the one Long
>>> from its Window instead of all N Longs from all Windows. How do you do
>>> remove the Window to GroupByKey for all times?
>>>
>>> Thanks,
>>>
>>> Jesse
>>>
>>
>>

Reply via email to