On Wed, Jul 26, 2017 at 9:43 PM, Kenneth Knowles wrote:
> This is a bit of an improvised change to the Beam model, if these are
> really treated *that* specially. (notably, they are a subset of the
> WindowFns that we ship with our SDKs, so it really is a careful selection)
>
> It does make sense
This is a bit of an improvised change to the Beam model, if these are
really treated *that* specially. (notably, they are a subset of the
WindowFns that we ship with our SDKs, so it really is a careful selection)
It does make sense to have some special WindowFns with distinguished
semantics, since
I think there may be a distinction between hard-coding support for the
"standard" WindowFns (e.g.
https://github.com/apache/beam/blob/master/sdks/common/runner-api/src/main/proto/standard_window_fns.proto)
and accepting WindowFns as a UDF. Different runners have offered different
levels of support
Hi Etienne,
Every WindowFn is a UDF, so there is really no such thing as "custom"
window merging. Is this the same as saying that a runner supports only
merging for Sessions? Or just supports WindowFn that merges based on
overlap?
Kenn
On Mon, Jul 24, 2017 at 10:15 AM, Etienne Chauchot wrote:
On Wed, Jul 26, 2017 at 8:58 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:
> Hmm, yes, I just noticed that PCollection has a setTypeDescriptor() method,
> and I wonder how much will break if all call sites of setCoder() call
> setTypeDescriptor() instead - i.e. how far are we fr
Hmm, yes, I just noticed that PCollection has a setTypeDescriptor() method,
and I wonder how much will break if all call sites of setCoder() call
setTypeDescriptor() instead - i.e. how far are we from the ideal state of
having a coder inferrable for every sufficiently concrete type descriptor.
+1 but maybe go even further
On Tue, Jul 25, 2017 at 8:25 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:
> Hello,
>
> I've worked on a few different things recently and ran repeatedly into the
> same issue: that we do not have clear guidance on who should set the Coder
> on a PCollec
Okay, first PR is in review https://github.com/apache/beam/pull/3649
On Wed, Jul 26, 2017 at 11:58 AM Robert Bradshaw wrote:
> +1, I'm a huge fan of moving this direction. Right now there's also
> the ugliness that setCoder() may be called any number of times before
> a PCollection is used (the
On Wed, Jul 26, 2017 at 7:45 AM, Lukasz Cwik wrote:
> Robert, in your case where output is being produced based upon a heartbeat,
> either the watermark on the output went to infinity and all that data being
> produced is droppable at which point the timer becomes droppable
But why are these time
+1, I'm a huge fan of moving this direction. Right now there's also
the ugliness that setCoder() may be called any number of times before
a PCollection is used (the last setter winning) but it is an error to
call it once it has been used (and here "used" is not clear--if a
PCollection is returned from
I second that it's the responsibility of the transform. For the case when a
PTransform doesn't have enough information (the PTransform developer should have
that knowledge), I would prefer a strict approach so users won't forget to call
withSomethingCoder(), like
- a Coder is required to construct the PTransform;
- or
Hm, can you elaborate? I'm not sure how this relates to my suggestion, the
gist of which is "PTransforms should set the coder on all of their
outputs, and the user should never have to call .setCoder() on a PCollection
obtained from a PTransform"
On Wed, Jul 26, 2017 at 7:38 AM Lukasz Cwik wrote:
>
Yes, there was! TextIO support is already merged into Beam (it missed the
2.1 cutoff, so it will be in Beam 2.2.0). AvroIO support is in
https://github.com/apache/beam/pull/3541. This is almost ready to merge -
still waiting for final review from kennknowles on the Beam translation
changes.
Nobody
Robert, in your case where output is being produced based upon a heartbeat,
either the watermark on the output went to infinity and all that data being
produced is droppable at which point the timer becomes droppable or the
output watermark is being held by the scheduling of the next timer and
henc
Hi all,
Was there any progress on this recently? I am particularly interested in
using value-dependent destinations in BigtableIO (writing to a specific
table depending on the value) and AvroIO (writing to specific GCS buckets
depending on the value).
Thanks,
Josh
On Fri, Jun 9, 2017 at 5:35 PM,
I'm split between our current one-pass model of pipeline construction and a
two-pass model where all information is gathered and then PTransform
expansions are performed.
On Tue, Jul 25, 2017 at 8:25 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:
> Hello,
>
> I've worked on a few di