Re: Java/Python/Proto mismatch: MergeStatus.ALREADY_MERGED vs InvalidWindows

2021-02-18 Thread Kenneth Knowles
So there's a bit of an open question about the Java SDK behavior and whether we should keep the unused ALREADY_MERGED in the model proto. Here is a proposal that maintains the intent of everything: - Remove MergeStatus.ALREADY_MERGED since there is no SDK that has ever had any semantics like tha

Re: Java/Python/Proto mismatch: MergeStatus.ALREADY_MERGED vs InvalidWindows

2021-02-18 Thread Robert Bradshaw
I think you're right about Python. I think it's fine for the SDK to prohibit (or require explicit user action) for ambiguous things like stacked sessions. This illegal state wouldn't generally need to be represented in proto (but maybe it'd be nice for quicker errors in cross language). On Thu, Fe

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-18 Thread Robert Bradshaw
On Wed, Feb 17, 2021 at 1:56 PM Kenneth Knowles wrote: > > On Wed, Feb 17, 2021 at 1:06 PM Robert Bradshaw > wrote: > >> I would prefer to leave downstream triggering up to the runner (or, >> better, leave upstream triggering up to the runner, a la sink triggers), >> but one problem is that with

Re: [VOTE] Release 2.28.0, release candidate #1

2021-02-18 Thread Brian Hulette
+1 (non-binding) Built and ran a streaming SQL in Python pipeline and a DataFrame API pipeline on Dataflow On Thu, Feb 18, 2021 at 1:00 PM Pablo Estrada wrote: > +1 (binding) > > Built and ran basic tests for existing Dataflow templates > > On Wed, Feb 17, 2021 at 8:06 PM Chamikara Jayalath >

Re: Java/Python/Proto mismatch: MergeStatus.ALREADY_MERGED vs InvalidWindows

2021-02-18 Thread Kenneth Knowles
Great. Should be easy to sort this out before Go has to make any decisions. I will take this opportunity to get on my soapbox and suggest instead of "custom WindowFn" we simply call them "WindowFn". The suffix "Fn" indicates that it is definable code, not just an enum that selects baked-in functio

Re: BEAM-6855

2021-02-18 Thread Brian Hulette
I added JvmInitializer [1] to do some one-time initialization per JVM before processing starts. It might be useful here... the intended use-case was to perform quick configuration functions, but I suppose you could use it to pull some data that you can reference later. [1] https://beam.apache.org/

Re: BEAM-6855

2021-02-18 Thread Pablo Estrada
+Brian Hulette I believe you worked on a way to load data on worker startup? On Thu, Feb 18, 2021 at 1:00 PM Daniel Collins wrote: > The getState function should be static, sorry. "synchronized static > @NotNull MyState getState()" > > On Thu, Feb 18, 2021 at 3:41 PM Daniel Collins > wrote: >

Re: BEAM-6855

2021-02-18 Thread Daniel Collins
The getState function should be static, sorry. "synchronized static @NotNull MyState getState()" On Thu, Feb 18, 2021 at 3:41 PM Daniel Collins wrote: > > On every dataflow start, I want to read from CloudSQL and build the cache > > If you do this outside of dataflow, you can use a static to do

Re: BEAM-6855

2021-02-18 Thread Daniel Collins
> On every dataflow start, I want to read from CloudSQL and build the cache If you do this outside of dataflow, you can use a static to do this on every worker start. Is that what you're looking for? For example: final class StateLoader { private StateLoader() {} @GuardedBy("this") private

Re: [VOTE] Release 2.28.0, release candidate #1

2021-02-18 Thread Pablo Estrada
+1 (binding) Built and ran basic tests for existing Dataflow templates On Wed, Feb 17, 2021 at 8:06 PM Chamikara Jayalath wrote: > Repositories were merged into [1]. So this RC validation thread can > continue. > > Thanks, > Cham > > [1] https://repository.apache.org/content/repositories/orgapa

Re: FileIO.Write fails silently

2021-02-18 Thread Reuven Lax
Do you have checkpointing enabled in your Flink cluster? On Thu, Feb 18, 2021 at 11:50 AM Tapan Upadhyay wrote: > Hi, > > I am currently working on a Beam pipeline (Flink runner) where we read > JSON events from Kafka and write the output in parquet format to S3. > > We write to S3 after every 1

BEAM-6855

2021-02-18 Thread Hemali Sutaria
Hi, I have one question. This is *kind of a blocker for our upcoming release*. It would be great if you could reply at your earliest convenience. My dataflow pipeline is stateful. I am using Beam SDK for stateful processing (StateId, ValueState). I have also implemented OnTimer for my stateful tr

FileIO.Write fails silently

2021-02-18 Thread Tapan Upadhyay
Hi, I am currently working on a Beam pipeline (Flink runner) where we read JSON events from Kafka and write the output in parquet format to S3. We write to S3 after every 10 min. We have observed that our pipeline sometimes stops writing to S3 after restarts (even for a non breaking minor code c

Re: Java/Python/Proto mismatch: MergeStatus.ALREADY_MERGED vs InvalidWindows

2021-02-18 Thread Robert Burke
A bit more information: graphx/translate.go has the handling of WindowingStrategy at pipeline encoding and we only use Non Merging. Presumably this is something that would need to be fixed when supporting Session windows in BEAM-4152 On Thu, Feb 18, 2021, 10:02 AM Robert Burke wrote: > Go has

Re: Java/Python/Proto mismatch: MergeStatus.ALREADY_MERGED vs InvalidWindows

2021-02-18 Thread Robert Burke
Go has very basic windowing support that is managed entirely by the runner. Session windowing isn't implemented yet, let alone custom windowfns which i asume is what would need to specify these things. Session windowing is tracked in BEAM-4152 and Custome windowFns are tracked in BEAM-11100. On W

Re: Do we need synchronized processing time? / What to do about "continuation triggers"?

2021-02-18 Thread Robert Burke
Go SDK note: User configurable available triggers are not yet implemented in the Go SDK. BEAM-3304 tracks this. The only use of triggers in the SDK IIRC is to implement Reshuffle. If/when we do change what an SDK is supposed to do please be sure to provide and update relevant SDK language agnosti