Re: Unportable Dataflow Pipeline Questions

2020-03-31 Thread Robert Burke
+1 to translation from beam pipeline Protos. The Go SDK does that currently in dataflowlib/translate.go to handle the current Dataflow situation, so it's certainly doable. On Tue, Mar 31, 2020, 5:48 PM Robert Bradshaw wrote: > On Tue, Mar 31, 2020 at 12:06 PM Sam Rohde wrote: > >> Hi All, >>

Re: Unportable Dataflow Pipeline Questions

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 31, 2020 at 12:06 PM Sam Rohde wrote: > Hi All, > > I am currently investigating making the Python DataflowRunner to use a > portable pipeline representation so that we can eventually get rid of the > Pipeline(runner) weirdness. > > In that case, I have a lot questions about the Pytho

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 31, 2020 at 4:13 PM Luke Cwik wrote: > > It is important that composites know how things are named so that any > embedded payloads in the composite PTransform can reference the outputs > appropriately. Very good point. This is part of the cleanup to treat inputs and outputs of PColl

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Luke Cwik
It is important that composites know how things are named so that any embedded payloads in the composite PTransform can reference the outputs appropriately. On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw wrote: > On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde wrote: > >>> > >>> * Don't allow arbitr

Re: Default WindowFn for Unbounded source

2020-03-31 Thread amit kumar
Thanks Ankur for your reply. By default the allowed lateness for a global window is zero but we can also set it to be non-zero which will be used in the downstream transforms where group by or window into with trigger is happening ? (using allowedTimeStampSkew for unbounded sources/ sources whic

Re: Default WindowFn for Unbounded source

2020-03-31 Thread amit kumar
Thanks Ankur for your reply. By default the allowed lateness for a global window is zero but we can also set it to be non-zero which will be used in the downstream transforms where group by or window into with trigger is happening ? (using allowedTimeStampSkew for unbounded sources/ sources whic

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde wrote: >>> >>> * Don't allow arbitrary nestings returned during expansion, force composite >>> transforms to always provide an unambiguous name (either a tuple with >>> PCollections with unique tags or a dictionary with untagged PCollections or >>> a si

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Robert Bradshaw
On Tue, Mar 24, 2020 at 1:07 PM Sam Rohde wrote: > > Hi All, > > Problem > I would like to discuss BEAM-9322 and the correct way to set the output tags > of a transform with nested PCollections, e.g. a dict of PCollections, a tuple > of dicts of PCollections. Before the fixing of BEAM-1833, the

Re: [BEAM-9322] Python SDK discussion on correct output tag names

2020-03-31 Thread Sam Rohde
> > * Don't allow arbitrary nestings returned during expansion, force >> composite transforms to always provide an unambiguous name (either a tuple >> with PCollections with unique tags or a dictionary with untagged >> PCollections or a singular PCollection (Java and Go SDKs do this)). >> > > I bel

Unportable Dataflow Pipeline Questions

2020-03-31 Thread Sam Rohde
Hi All, I am currently investigating making the Python DataflowRunner to use a portable pipeline representation so that we can eventually get rid of the Pipeline(runner) weirdness. In that case, I have a lot questions about the Python DataflowRunner: *PValueCache* - Why does this exist? *Da

Re: Default WindowFn for Unbounded source

2020-03-31 Thread Ankur Goenka
Hi Amit, As you don't have any GroupByKey or trigger in your pipeline, you don't need to do allowed lateness. For unbounded source, Global window will never fire a trigger or emit GroupByKey. In the code you linked, a trigger is used which uses allowedLateness. Thanks, Ankur On Tue, Mar 31, 2020

Re: Default WindowFn for Unbounded source

2020-03-31 Thread amit kumar
Thanks Jan! I have a question based on this on Global Window and allowed lateness, with default trigger for the following scenarios: Case 1- TextIO.Read. |. Bounded source |. Global Window |. -infinity watermark apply WithTimeStamps (Based on a timestamp attribute in file) |.

Re: Implementing type hints on multi-output PTransforms

2020-03-31 Thread Joshua B. Harrison
Ok - that makes sense. My specific workaround was to remove the with_output_types for now, so advising the user on this in the error message would be nice. I was just worried about silently passing. As for the formalization: 1. I am a little confused on how this is different than passing multi

Re: Implementing type hints on multi-output PTransforms

2020-03-31 Thread Udi Meiri
Hi Joshua, I've been working on type hints recently. Your issue is similar to this: https://issues.apache.org/jira/browse/BEAM-8782 (exactly the same if tags are passed to with_outputs() in the example). There's also this related bug about type inference: https://issues.apache.org/jira/browse/BEAM-

Re: Implementing type hints on multi-output PTransforms

2020-03-31 Thread Luke Cwik
I can see that argument but what does a user need to do in this case if we raise NotImplementedError? Would the need to disable type checking everywhere? Over the long term users will need to deal with improvements to type checking and will need to fix typing errors when they change Apache Beam ve

Re: Implementing type hints on multi-output PTransforms

2020-03-31 Thread Joshua B. Harrison
The current code errors out with a cryptic message around tag types in the multi-output. Adding a NotImplementedError was just an attempt to make the failure reason more clear. I would be worried about trivially passing because then the user might think they have type checking safety when they don

Re: Implementing type hints on multi-output PTransforms

2020-03-31 Thread Luke Cwik
Would the NotImplementedError cause users pipeline errors or is that a signal to the type checking mechanism to ignore it? If this would cause failures I would rather make the unsupported case return something that would be trivially true. On Mon, Mar 30, 2020 at 12:01 PM Joshua B. Harrison wrote

Re: Question about updating documents on MongoDB

2020-03-31 Thread Jean-Baptiste Onofre
Hi, Not sure I understand fully your use case, but you can add/update document with Write: pipeline .apply(...) .apply(MongoDbIO.write() .withUri("mongodb://localhost:27017") .withDatabase("my-database") .withCollection("my-collection") .withNumSplits(30)) It’s using

Question about updating documents on MongoDB

2020-03-31 Thread Emilien HENON
Hello, I would like to contact you concerning a question about the MongoDBIO connector. As part of our Dataflow / Apache Beam project, we need to create and update documents in our MongoDB database. However, the update operation seems impossible with the current connector (either we read or we cre

Re: A new reworked Elasticsearch 7+ IO module

2020-03-31 Thread Etienne Chauchot
Hi all, The survey regarding Elasticsearch support in Beam is now closed. Here are the results after 38 days: users using ESv2: 0 ESV5: 1 ESV6: 5 ESV7: 8 So, the new version of ElasticsearchIO after the refactoring discussed in this thread will no more support Elasticsearch v2. Regards

Re: Default WindowFn for Unbounded source

2020-03-31 Thread Jan Lukavský
Hi Amit, the window function applied by default is WindowingStrategy.globalDefault(), [1] - global window with zero allowed lateness. Cheers,  Jan [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/WindowingStrategy.java#L105 On 3/31/20

Re: [VOTE + INPUT] Beam Mascot Designs, 3rd iteration - Deadline Wednesday, April 1

2020-03-31 Thread Jan Lukavský
On 3/31/20 4:06 AM, Julian Bruno wrote: Hello Apache Beam Community, We need a third input from the community to finish the design. Please share your input no later than Wednesday, April 1st, at noon Pacific Time. Below you will find a link to the presentation of the work process and we are

Re: [VOTE + INPUT] Beam Mascot Designs, 3rd iteration - Deadline Wednesday, April 1

2020-03-31 Thread Maximilian Michels
Hi Julian! Perfect, thanks for incorporating all the suggestions. > 1. Do you prefer stripes or no stripes? No stripes. Cheers, Max On 31.03.20 08:11, Alex Van Boxel wrote: > Nooo stripes > >  _/ > _/ Alex Van Boxel > > > On Tue, Mar 31, 2020 at 6:06 AM Joshua B. Harrison > mailto:josh.harr

Default WindowFn for Unbounded source

2020-03-31 Thread amit kumar
Hi All, Is there a default WindowFn that gets applied to elements of an unbounded source. For example, if I have a Kinesis input source ,for which all elements are timestamped with ArrivalTime, what will be the default windowing applied to the output of read transform ? Is this runner dependent