+1 to translation from beam pipeline Protos.
The Go SDK does that currently in dataflowlib/translate.go to handle the
current Dataflow situation, so it's certainly doable.
On Tue, Mar 31, 2020, 5:48 PM Robert Bradshaw wrote:
> On Tue, Mar 31, 2020 at 12:06 PM Sam Rohde wrote:
>
>> Hi All,
>>
On Tue, Mar 31, 2020 at 12:06 PM Sam Rohde wrote:
> Hi All,
>
> I am currently investigating making the Python DataflowRunner to use a
> portable pipeline representation so that we can eventually get rid of the
> Pipeline(runner) weirdness.
>
> In that case, I have a lot questions about the Pytho
On Tue, Mar 31, 2020 at 4:13 PM Luke Cwik wrote:
>
> It is important that composites know how things are named so that any
> embedded payloads in the composite PTransform can reference the outputs
> appropriately.
Very good point. This is part of the cleanup to treat inputs and
outputs of PColl
It is important that composites know how things are named so that any
embedded payloads in the composite PTransform can reference the outputs
appropriately.
On Tue, Mar 31, 2020 at 2:51 PM Robert Bradshaw wrote:
> On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde wrote:
> >>>
> >>> * Don't allow arbitr
Thanks Ankur for your reply.
By default the allowed lateness for a global window is zero but we can also
set it to be non-zero which will be used in the downstream transforms
where group by or window into with trigger is happening ?
(using allowedTimeStampSkew for unbounded sources/ sources whic
Thanks Ankur for your reply.
By default the allowed lateness for a global window is zero but we can also
set it to be non-zero which will be used in the downstream transforms
where group by or window into with trigger is happening ?
(using allowedTimeStampSkew for unbounded sources/ sources whic
On Tue, Mar 31, 2020 at 1:13 PM Sam Rohde wrote:
>>>
>>> * Don't allow arbitrary nestings returned during expansion, force composite
>>> transforms to always provide an unambiguous name (either a tuple with
>>> PCollections with unique tags or a dictionary with untagged PCollections or
>>> a si
On Tue, Mar 24, 2020 at 1:07 PM Sam Rohde wrote:
>
> Hi All,
>
> Problem
> I would like to discuss BEAM-9322 and the correct way to set the output tags
> of a transform with nested PCollections, e.g. a dict of PCollections, a tuple
> of dicts of PCollections. Before the fixing of BEAM-1833, the
>
> * Don't allow arbitrary nestings returned during expansion, force
>> composite transforms to always provide an unambiguous name (either a tuple
>> with PCollections with unique tags or a dictionary with untagged
>> PCollections or a singular PCollection (Java and Go SDKs do this)).
>>
>
> I bel
Hi All,
I am currently investigating making the Python DataflowRunner to use a
portable pipeline representation so that we can eventually get rid of the
Pipeline(runner) weirdness.
In that case, I have a lot questions about the Python DataflowRunner:
*PValueCache*
- Why does this exist?
*Da
Hi Amit,
As you don't have any GroupByKey or trigger in your pipeline, you don't
need to do allowed lateness.
For unbounded source, Global window will never fire a trigger or emit
GroupByKey.
In the code you linked, a trigger is used which uses allowedLateness.
Thanks,
Ankur
On Tue, Mar 31, 2020
Thanks Jan!
I have a question based on this on Global Window and allowed lateness, with
default trigger for the following
scenarios:
Case 1-
TextIO.Read.
|. Bounded source
|. Global Window
|. -infinity watermark
apply
WithTimeStamps (Based on a timestamp attribute in file)
|.
Ok - that makes sense. My specific workaround was to remove the
with_output_types for now, so advising the user on this in the error
message would be nice. I was just worried about silently passing.
As for the formalization:
1. I am a little confused on how this is different than passing multi
Hi Joshua,
I've been working on type hints recently.
Your issue is similar to this:
https://issues.apache.org/jira/browse/BEAM-8782 (exactly the same if tags
are passed to with_outputs() in the example).
There's also this related bug about type inference:
https://issues.apache.org/jira/browse/BEAM-
I can see that argument but what does a user need to do in this case if we
raise NotImplementedError? Would the need to disable type checking
everywhere?
Over the long term users will need to deal with improvements to type
checking and will need to fix typing errors when they change Apache Beam
ve
The current code errors out with a cryptic message around tag types in the
multi-output. Adding a NotImplementedError was just an attempt to make the
failure reason more clear.
I would be worried about trivially passing because then the user might
think they have type checking safety when they don
Would the NotImplementedError cause users pipeline errors or is that a
signal to the type checking mechanism to ignore it?
If this would cause failures I would rather make the unsupported case
return something that would be trivially true.
On Mon, Mar 30, 2020 at 12:01 PM Joshua B. Harrison
wrote
Hi,
Not sure I understand fully your use case, but you can add/update document with
Write:
pipeline
.apply(...)
.apply(MongoDbIO.write()
.withUri("mongodb://localhost:27017")
.withDatabase("my-database")
.withCollection("my-collection")
.withNumSplits(30))
It’s using
Hello,
I would like to contact you concerning a question about the MongoDBIO
connector. As part of our Dataflow / Apache Beam project, we need to create
and update documents in our MongoDB database. However, the update operation
seems impossible with the current connector (either we read or we cre
Hi all,
The survey regarding Elasticsearch support in Beam is now closed.
Here are the results after 38 days:
users using
ESv2: 0
ESV5: 1
ESV6: 5
ESV7: 8
So, the new version of ElasticsearchIO after the refactoring discussed
in this thread will no more support Elasticsearch v2.
Regards
Hi Amit,
the window function applied by default is
WindowingStrategy.globalDefault(), [1] - global window with zero allowed
lateness.
Cheers,
Jan
[1]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/WindowingStrategy.java#L105
On 3/31/20
On 3/31/20 4:06 AM, Julian Bruno wrote:
Hello Apache Beam Community,
We need a third input from the community to finish the design. Please
share your input no later than Wednesday, April 1st, at noon Pacific
Time. Below you will find a link to the presentation of the work
process and we are
Hi Julian!
Perfect, thanks for incorporating all the suggestions.
> 1. Do you prefer stripes or no stripes?
No stripes.
Cheers,
Max
On 31.03.20 08:11, Alex Van Boxel wrote:
> Nooo stripes
>
> _/
> _/ Alex Van Boxel
>
>
> On Tue, Mar 31, 2020 at 6:06 AM Joshua B. Harrison
> mailto:josh.harr
Hi All,
Is there a default WindowFn that gets applied to elements of an unbounded
source.
For example, if I have a Kinesis input source ,for which all elements are
timestamped with ArrivalTime, what will be the default windowing applied to
the output of read transform ?
Is this runner dependent
24 matches
Mail list logo