Beam implements Windowing itself (via state and timers) rather than
deferring to Flink's implementation.
On Wed, Jun 12, 2024 at 11:55 AM Ruben Vargas wrote:
>
> Hello guys
>
> May be a silly question,
>
> But in the Flink runner, the window implementation uses the Flink
> windowing? Does that me
uming resources in some way? I'm assuming may be is not
> significant.
That is correct, but the resources consumed by an idle operator should
be negligible.
> Thanks.
>
> On Fri, Jun 7, 2024 at 3:56 PM, Robert Bradshaw via user
> wrote:
>>
>> Y
You can always limit the parallelism by assigning a single key to
every element and then doing a grouping or reshuffle[1] on that key
before processing the elements. Even if the operator parallelism for
that step is technically, say, eight, your effective parallelism will
be exactly one.
[1]
http
On Fri, Apr 12, 2024 at 1:39 PM Ruben Vargas
wrote:
> On Fri, Apr 12, 2024 at 2:17 PM Jaehyeon Kim wrote:
> >
> > Here is an example from a book that I'm reading now and it may be
> applicable.
> >
> > JAVA - (id.hashCode() & Integer.MAX_VALUE) % 100
> > PYTHON - ord(id[0]) % 100
>
or abs(hash(
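The bucketing idea quoted above can be sketched in plain Python. One caveat worth flagging: Python's built-in hash() on strings is salted per process unless PYTHONHASHSEED is fixed, so the same id may land in different buckets on different workers. The sketch below uses zlib.crc32 instead. The name bucket_for and the bucket counts are illustrative assumptions, not part of Beam.

```python
import zlib


def bucket_for(record_id: str, num_buckets: int = 100) -> int:
    """Map an id to a bucket in [0, num_buckets), stable across processes."""
    # crc32 gives the same value in every process, unlike hash() on str.
    return zlib.crc32(record_id.encode("utf-8")) % num_buckets


# Hypothetical ids; each element is paired with its bucket key.
ids = ["user-1", "user-2", "user-3", "user-1"]
keyed = [(bucket_for(i, num_buckets=4), i) for i in ids]
```

The resulting (bucket, id) pairs are what one would feed into a grouping or reshuffle step; using a single bucket reduces the effective parallelism to one.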
Are you draining[1] your pipeline or simply canceling it and starting a new
one? Draining should close open windows and attempt to flush all in-flight
data before shutting down. For PubSub you may also need to read from
subscriptions rather than topics to ensure messages are processed by either
one
it begins
>> to create an error. My speculation is the containers don't recognise each
>> other and get killed by the Flink task manager. I see containers keep being
>> created and killed.
>>
>> Does every multi-language pipeline run in a separate container?
>>
The Python Local Runner has limited support for streaming pipelines.
For the time being I would recommend using Dataflow or Flink (the latter
can be run locally) to try out streaming pipelines.
On Fri, Mar 8, 2024 at 2:11 PM Puertos tavares, Jose J (Canada) via
user wrote:
>
> Hello Hu:
>
>
>
> No
t the flink runner. A flink cluster
> is started locally.
>
> On Thu, 7 Mar 2024 at 12:13, Robert Bradshaw via user
> wrote:
>>
>> Streaming portable pipelines are not yet supported on the Python local
>> runner.
>>
>> On Wed, Mar 6, 2024 at 5:03 PM Jaeh
Streaming portable pipelines are not yet supported on the Python local runner.
On Wed, Mar 6, 2024 at 5:03 PM Jaehyeon Kim wrote:
>
> Hello,
>
> I use the python SDK and my pipeline reads messages from Kafka and transforms
> via SQL. I see two containers are created but it seems that they don't
There is no longer a huge amount of active development going on here,
but implementing a missing function seems like an easy contribution
(lots of examples to follow). Otherwise, definitely worth filing a
feature request as a useful signal for prioritization.
On Mon, Mar 4, 2024 at 4:33 PM Jaehyeo
There is no difference; FlatMapElements is implemented in terms of a
DoFn that invokes context.output multiple times. And, yes, Dataflow
will fuse consecutive operations automatically. So if you have
something like
... -> DoFnA -> DoFnB -> GBK -> DoFnC -> ...
Dataflow will fuse DoFnA and DoFnB to
On Wed, Jan 24, 2024 at 10:48 AM Mark Striebeck
wrote:
>
> If I point Beam to the local jar, will Beam start and also stop the expansion
> service?
Yes it will.
> Thanks
> Mark
>
> On Wed, 24 Jan 2024 at 08:30, Robert Bradshaw via user
> wrote:
>>
>>
You can also manually designate a replacement jar to be used rather
than fetching the jar from maven, either as a pipeline option or (as
of the next release) as an environment variable. The format is a json
mapping from gradle targets (which is how we identify these jars) to
local files (or urls).
This is probably because you're trying to index into the result of the
GroupByKey in your AnalyzeSession as if it were a list. All that is
promised is that it is an iterable. If it is large enough to merit
splitting over multiple fetches, it won't be a list. (If you need to
index, explicitly conver
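A minimal sketch of the point above, in plain Python: grouped values are only promised to be iterable, so materialize them before indexing. The function analyze_session and its return shape are hypothetical stand-ins for the poster's AnalyzeSession logic, and a generator stands in for the lazily fetched group.

```python
from typing import Iterable, Tuple


def analyze_session(key: str, events: Iterable[int]) -> Tuple[str, int, int, int]:
    """Return (key, count, first event, last event) for one grouped key."""
    # Materialize once; indexing a lazy iterable directly would raise TypeError.
    materialized = list(events)
    return key, len(materialized), materialized[0], materialized[-1]


def lazy_events():
    # Stand-in for a grouped result that is iterable but not a list.
    yield from (3, 1, 4)


result = analyze_session("session-a", lazy_events())
```

Converting to a list loads the whole group into memory, so this is only appropriate when each key's values are known to fit.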
Reshuffle is perfectly fine to use if the goal is just to redistribute
work. It's only deprecated as a "checkpointing" mechanism.
On Fri, Jan 19, 2024 at 9:44 AM Danny McCormick via user
wrote:
>
> For runners that support Reshuffle, it should be safe to use. It's been
> "deprecated" for 7 years,
Nothing problematic is standing out for me in those logs. A job
service and artifact staging service are spun up to allow the job (and
its artifacts) to be submitted, then they are shut down. What are the
actual errors that you are seeing?
On Wed, Jan 3, 2024 at 7:39 AM Lydian wrote:
>
>
> Hi,
>
>
And should it be a list of strings, rather than a string?
On Tue, Dec 19, 2023 at 10:10 AM Anand Inguva via user
wrote:
> Can you try passing `extra_packages` instead of `extra_package` when
> passing pipeline options as a dict?
>
> On Tue, Dec 19, 2023 at 12:26 PM Sumit Desai via user <
> user@
Currently error handling is implemented on sinks on an ad-hoc basis
(if at all) but John (cc'd) is looking at improving things here.
On Mon, Dec 4, 2023 at 10:25 AM Juan Romero wrote:
>
> Hi guys. I want to ask you about how to deal with the scenario when the
> target sink (eg: jdbc, kafka, bigq
/org/apache/beam/sdk/Pipeline.java#L630
>
> On Fri, Oct 13, 2023 at 1:32 PM Joey Tran wrote:
>>
>>
>>
>> On Fri, Oct 13, 2023 at 1:18 PM Robert Bradshaw wrote:
>>>
>>> On Fri, Oct 13, 2023 at 10:08 AM Joey Tran
>>> wrote:
>>>>
On Thu, Oct 19, 2023 at 2:00 PM Joey Tran wrote:
>
> For the python SDK, is there somewhere where we document more "advanced"
> composite transform operations?
I'm not sure, but
https://beam.apache.org/documentation/programming-guide/ is the
canonical place information like this should probably b
something we try to avoid though.
Another option is to make the suffix a uuid rather than a single counter.
(This would still have issues with the first application possibly getting
mixed up with a "different" first application unless it was always
appended.)
> On Fri, Oct 13, 202
nflight messages). At
least with the old, intersecting names we can detect this problem
rather than silently give corrupt data.
On Fri, Oct 13, 2023 at 7:15 AM Joey Tran wrote:
> For posterity: https://github.com/apache/beam/pull/28984
>
> On Tue, Oct 10, 2023 at 7:29 PM Robert Bradshaw
>
va? I imagine that if it's a toggle it'd be
>> a very sticky toggle since it'd be easy for PTransforms to accidentally
>> rely on it.
>>
>> On Thu, Oct 5, 2023 at 12:33 PM Robert Bradshaw
>> wrote:
>>
>>> Huh. This used to be a hard err
] https://play.beam.apache.org/?sdk=python&shared=hIrm7jvCamW
>
> On Wed, Oct 4, 2023 at 12:16 PM Robert Bradshaw via user <
> user@beam.apache.org> wrote:
>
>> BeamJava and BeamPython have the exact same behavior: transform names
>> within must be distinct [1]. This
BeamJava and BeamPython have the exact same behavior: transform names
within must be distinct [1]. This is because we do not necessarily know at
pipeline construction time if the pipeline will be streaming or batch, or
if it will be updated in the future, so the decision was made to impose
this res
Yes, for sure. This is one of the areas Beam excels vs. more simple tools
like SQL. You can write arbitrary code to iterate over arbitrary structures
in the typical Java/Python/Go/Typescript/Scala/[pick your language] way. In
the Beam nomenclature, UDFs correspond to DoFns and UDAFs correspond to
C
>
>
> Is that assumption correct?
>
>
>
> On Fri, Sep 15, 2023 at 10:59, Robert Bradshaw via
> user wrote:
>
>> Beam will block on side inputs until at least one value is available (or
>> the watermark has advanced such that we can be sur
Beam will block on side inputs until at least one value is available (or
the watermark has advanced such that we can be sure one will never become
available, which doesn't really apply to the global window case).
After that, workers generally cache the side input value (for performance
reasons) but
rDo, extract it's
>> dofn, wrap it, and return a new ParDo
>>
>> On Fri, Sep 15, 2023, 11:53 AM Robert Bradshaw via user <
>> user@beam.apache.org> wrote:
>>
>>> +1 to looking at composite transforms. You could even have a composite
>>> trans
+1 to looking at composite transforms. You could even have a composite
transform that takes another transform as one of its construction arguments
and whose expand method does pre- and post-processing to the inputs/outputs
before/after applying the transform in question. (You could even implement
t
(As an aside, I think all of these options would make for a great blog post
if anyone is interested in authoring one of those...)
On Fri, Sep 1, 2023 at 9:26 AM Robert Bradshaw wrote:
> You can also use Python's RenderRunner, e.g.
>
> python -m apache_beam.examples.wordcount --
You can also use Python's RenderRunner, e.g.
python -m apache_beam.examples.wordcount --output out.txt \
--runner=apache_beam.runners.render.RenderRunner \
--render_output=pipeline.svg
This also has an interactive mode, triggered by passing --port=N (where 0
can be used to pick an unuse
On Thu, Aug 24, 2023 at 12:58 PM Chamikara Jayalath
wrote:
>
>
> On Thu, Aug 24, 2023 at 12:27 PM Robert Bradshaw
> wrote:
>
>> I would like to figure out a way to get the stream-y interface to work,
>> as I think it's more natural overall.
>>
>>
change
> to make and maybe even less typing for the user. I was originally thinking
> side inputs and metrics would happen outside the loop, but I think you want
> a class and not a closure at that point for sanity.
>
> On Thu, Aug 24, 2023 at 12:02 PM Robert Bradshaw
> wrote:
>
but not
>> distributed in the same way as, say, Beam SQL... but it would allow for SQL
>> statements on individual files with projection pushdown supported for
>> things like Parquet which could have some cool and performant data lake
>> applications. I'll probably do a couple of the si
Neat.
Nothing like writing an SDK to actually understand how the FnAPI works :).
I like the use of groupBy. I have to admit I'm a bit mystified by the
syntax for parDo (I don't know swift at all which is probably tripping me
up). The addition of external (cross-language) transforms could let you
interest.
On Fri, Jul 21, 2023 at 7:25 AM Joey Tran wrote:
>
> Could you let me know when you update it? I would be interested in rereading
> after the rewrite.
>
> Thanks!
> Joey
>
> On Fri, Jul 14, 2023 at 4:38 PM Robert Bradshaw wrote:
>>
>> I'm taki
Your SDF looks fine. I wonder if there is an issue with how Flink is
implementing SDFs (e.g. not garbage collecting previous remainders).
On Tue, Jul 18, 2023 at 5:43 PM Nimalan Mahendran
wrote:
>
> Hello,
>
> I am running a pipeline built in the Python SDK that reads from a Redis
> stream via a
he-runner-api-protos> docs
> page which implied to me that they'd be safe to use. I'll check out the
> bundle_processor. Thanks!
>
> On Mon, Jul 10, 2023 at 1:07 PM Robert Bradshaw
> wrote:
>
>> On Sun, Jul 9, 2023 at 9:22 AM Joey Tran
>> wrote:
>
Contributions welcome! I don't think we're at the point we can stop
supporting Pandas 1.x though, so we'd have to do it in such a way as to
support both.
On Wed, Jul 12, 2023 at 4:53 PM XQ Hu via user wrote:
> https://github.com/apache/beam/issues/27221#issuecomment-1603626880
>
> This tracks th
Also portability gives you more flexibility when it
>> comes to choosing an SDK to define the pipeline and will allow you to
>> execute transforms in any SDK via cross-language.
>>
>> Thanks,
>> Cham
>>
>> On Fri, Jun 23, 2023 at 1:57 PM Robert Bradshaw via
ical "SDK
Worker" (which in turn invokes the actual DoFns). This latter may be
inlined (e.g. if it's 100% Python on both sides). See, for example,
https://github.com/apache/beam/blob/v2.48.0/sdks/python/apache_beam/runners/portability/fn_api_runner/worker_handlers.py#L350
> On Fr
ur compute infrastructure. (If
you're not doing streaming, this is much more straightforward than all the
bundler scheduler stuff that currently exists in that code).
>
>
>
>
>
> On Fri, Jun 23, 2023 at 12:17 PM Alexey Romanenko <
> aromanenko@gmail.com> wrote:
>
>
On Fri, Jun 23, 2023, 7:37 AM Alexey Romanenko
wrote:
> If Beam Runner Authoring Guide is rather high-level for you, then, at
> first, I’d suggest to answer two questions for yourself:
> - Am I going to implement a portable runner or native one?
>
The answer to this should be portable, as non-por
> [1] https://gist.github.com/egalpin/162a04b896dc7be1d0899acf17e676b3
>
> On Thu, May 25, 2023 at 2:25 PM Robert Bradshaw via user <
> user@beam.apache.org> wrote:
>
>> The GbkBeforeStatefulParDo is an implementation detail used to send all
>> elements with the same key to the same
The GbkBeforeStatefulParDo is an implementation detail used to send all
elements with the same key to the same worker (so that they can share
state, which is itself partitioned by worker). This does cause a global
barrier in batch pipelines.
On Thu, May 25, 2023 at 2:15 PM Evan Galpin wrote:
> H
Generally these types of vulnerabilities are only exploitable when
processing untrusted data and/or exposing a public service to the
internet. This is not the typical use of Beam (especially the latter),
but that's not to say Beam can't be used in this way. That being said,
it's preferable to simpl
On Fri, Apr 21, 2023 at 3:37 AM Pavel Solomin wrote:
>
> Thank you for the information.
>
> I'm assuming you had a unique ID in records, and you observed some IDs
> missing in Beam output comparing with Spark, and not just some duplicates
> produced by Spark.
>
> If so, I would suggest to create
You are correct in that the data may arrive in an unordered way.
However, once a window finishes, you are guaranteed to have seen all
the data up to that point (modulo late data) and can then confidently
compute your ordered cumulative sum.
You could do something like this:
def cumulative_sums(ke
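The snippet above is cut off in the archive. As a rough sketch of the idea only (sort one key's values by timestamp once the window has closed, then emit a running total), something like the following could work. The name cumulative_sums_sketch and the (timestamp, value) input shape are assumptions, not the original code.

```python
from typing import Hashable, Iterable, Iterator, Tuple


def cumulative_sums_sketch(
    key: Hashable, timestamped_values: Iterable[Tuple[int, int]]
) -> Iterator[Tuple[Hashable, int, int]]:
    """Yield (key, timestamp, running_sum) in timestamp order for one key."""
    running = 0
    # Arrival order is arbitrary, so order by timestamp before accumulating.
    for timestamp, value in sorted(timestamped_values, key=lambda tv: tv[0]):
        running += value
        yield key, timestamp, running


# Values arrive out of order; the output is ordered by timestamp.
out = list(cumulative_sums_sketch("k", [(3, 10), (1, 5), (2, 7)]))
```

In a pipeline, a function of this shape would be applied to the per-key output of a grouping step within each window.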
Docker is not necessary to expand the transform (indeed, by default it
should just pull the Jar and invoke that directly to start the expansion
service), but it is used as the environment in which to execute the
expanded transform.
It would be in theory possible to run the worker without docker a
>>>> On Mon, Apr 17, 2023 at 8:08 AM Reuven Lax wrote:
>>>>>
>>>>>> Are you running on the Dataflow runner? If so, Dataflow - unlike
>>>>>> Spark and Flink - dynamically modifies the parallelism as the operator
>>>>>> runs, so there
What are you trying to achieve by setting the parallelism?
On Sat, Apr 15, 2023 at 5:13 PM Jeff Zhang wrote:
> Thanks Reuven, what I mean is to set the parallelism in operator level.
> And the input size of the operator is unknown at compiling stage if it is
> not a source
> operator,
>
> Here'
That is correct.
On Tue, Apr 11, 2023 at 5:44 AM Hans Hartmann wrote:
>
> Hello,
>
> i'm wondering if Apache Beam is using the message guarantees of the
> execution engines, that the pipeline is running on.
>
> So if i use the SparkRunner the consistency guarantees are exactly-once?
>
> Have a g
On Mon, Mar 13, 2023 at 11:33 AM Godefroy Clair
wrote:
> Hi,
> I am wondering about the way `Flatten()` and `FlatMap()` are implemented
> in Apache Beam Python.
> In most functional languages, FlatMap() is the same as composing
> `Flatten()` and `Map()` as indicated by the name, so Flatten() and
Whenever state is used, the runner will arrange such that the same
keys will all go to the same worker, which often involves injecting a
shuffle-like operation if the keys are spread out among many workers
in the input. (An alternative implementation could involve storing the
state in a distributed
Seems reasonable to me.
On Tue, Feb 7, 2023 at 4:19 PM Luke Cwik via user wrote:
>
> As per [1], the JDK8 and JDK11 containers that Apache Beam uses have stopped
> being built and supported since July 2022. I have filed [2] to track the
> resolution of this issue.
>
> Based upon [1], almost eve
You should be able to omit the environment_type and environment_config
variables and they will be populated automatically. For running
locally, the flink_master parameter is not needed either (one will be
started up automatically).
On Fri, Feb 3, 2023 at 12:51 PM Talat Uyarer via user
wrote:
>
>
I'm also not sure it's part of the contract that the containerization
technology we use will always have these capabilities.
On Mon, Jan 30, 2023 at 10:53 AM Chad Dombrova wrote:
>
> Hi Valentyn,
>
>>
>> Beam SDK docker containers on Dataflow VMs are currently launched in
>> privileged mode.
>
>
Different idea: is it possible to serve this data via another protocol
(e.g. sftp) rather than requiring a mount?
On Mon, Jan 30, 2023 at 9:26 AM Chad Dombrova wrote:
>
> Hi Robert,
> I know very little about the FileSystem classes, but I don’t think it’s
> possible for a process running in dock
If it's your input/output data, presumably you could implement a
https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/FileSystem.html
for nfs. (I don't know what all that would entail...)
On Mon, Jan 30, 2023 at 9:04 AM Chad Dombrova wrote:
>
> Hi Israel,
> Thanks for responding.
bc
>
> It seemed to run fine both on DirectRunner and PortableRunner (embed mode),
> but Dataflow v2 runner raised an error at runtime seemingly associated with
> the Shuffle service? I have job IDs and trace links if those are helpful as
> well.
>
> Thanks,
> Eva
Awesome, thanks!
On Mon, Jun 14, 2021 at 5:36 PM Evan Galpin wrote:
>
> I’ll try to create something as small as possible from the pipeline I
> mentioned 👍 I should have time this week to do so.
>
> Thanks,
> Evan
>
> On Mon, Jun 14, 2021 at 18:09 Robert Bradshaw wro
ink
>
> It doesn't seem to matter if there are 0 messages in a subscription or 50k
> messages at startup. The rate of new messages however is very low. Not sure
> if those are helpful details, let me know if there's anything else specific
> which would help.
>
> On M
+1, we'd really like to get to the bottom of this, so clear
instructions on a pipeline/conditions that can reproduce it would be
great.
On Mon, Jun 14, 2021 at 7:34 AM Jan Lukavský wrote:
>
> Hi Eddy,
>
> you are probably hitting a not-yet discovered bug in SDF implementation in
> FlinkRunner th
Glad you were able to figure it out.
Maybe it's moot with runner v2 becoming the default, but we really
should give a clearer error in this case.
On Wed, Jun 2, 2021 at 8:16 PM Chamikara Jayalath wrote:
>
> Great :)
>
> On Wed, Jun 2, 2021 at 8:15 PM Alex Koay wrote:
>>
>> Finally figured out t
On Wed, Jun 2, 2021 at 11:18 AM Vincent Marquez
wrote:
>
> On Wed, Jun 2, 2021 at 11:11 AM Robert Bradshaw wrote:
>>
>> If you want to control the total number of elements being processed
>> across all workers at a time, you can do this by assigning random keys
>
pipelines (google
> does not allow more than 25 dataflow pipelines per region) with 10 elements
> each, I am launching the next 20 pipelines.
>
> This is of course missing the benefit of serverless.
>
> Any idea, how to work around this?
>
> Best,
> Eila
>
>
>
Note that workers generally process one element per thread at a time. The
number of threads defaults to the number of cores of the VM that you're
using.
On Mon, May 17, 2021 at 10:18 AM Brian Hulette wrote:
> What type of files are you reading? If they can be split and read by
> multiple workers
y1" that are sorted by "key2". The
>>> downstreaming process, for example, will make a rolling window with size N
>>> that reads N records together at one time. But note, the rolling window
>>> will not cross different "key1".
>>>
>>
Sorry, no Java versions of this stuff (though it may be possible to
use cross-language to invoke your Java pipeline from Python and get
the benefits that way).
On Fri, Apr 30, 2021 at 11:30 AM Tao Li wrote:
>
> Thanks @Ning Kang.
>
> @Robert Bradshaw I assume you are referring
You can also use interactive Beam's collect, to get the PCollection as
a Dataframe, and then print it or do whatever else with it as you
like.
On Fri, Apr 30, 2021 at 10:24 AM Ning Kang wrote:
>
> Hi Tao,
>
> The `show()` API works with any IPython notebook runtimes, including Colab,
> Jupyter L
u, Robert and Brian.
>>>>>
>>>>> I'd like to try this out. I am trying to distribute my dataset to
>>>>> nodes, sort each partition by some key and then store each partition to
>>>>> its
>>>>> own file.
>>>
Thanks for trying this out.
Better support for groupby (e.g. https://github.com/apache/beam/pull/13843
, https://github.com/apache/beam/pull/13637) will be available in the next
Beam release (2.29, in progress, but you could try out head if you want).
Note, however, that Beam PCollections are by d
On Wed, Mar 24, 2021 at 4:24 AM Đức Trần Tiến
wrote:
>
> And the last question: Could I write that pipeline in Java and invoke that
> pipeline from Go? :D
>
That is exactly the story we're trying to pursue for getting the large set
of Java connectors available to Go:
https://cloud.google.com/bl
>> Am I reading this wrong?
>>
>> Kenn
>>
>> On Wed, Mar 24, 2021 at 4:35 PM Alex Amato wrote:
>>
>>> How about a PCollection containing every element which was successfully
>>> written?
>>> Basically the same things which were passed into it
nd just add new
> functionality. Though, we need to follow the same pattern for user API and
> maybe even naming for this feature across different IOs (like we have for
> "readAll()” methods).
> >
> > I agree that we have to avoid returning PDone for such cases.
> >
Returning PDone is an anti-pattern that should be avoided, but changing it
now would be backwards incompatible. PRs to add non-PDone returning
variants (probably as another option to the builders) that compose well
with Wait, etc. would be welcome.
On Wed, Mar 24, 2021 at 11:14 AM Alexey Romanenko
Do we now support 1.8 through 1.12?
Unless there are specific objections, makes sense to me.
On Fri, Mar 12, 2021 at 8:29 AM Alexey Romanenko
wrote:
> +1 too but are there any potential objections for this?
>
> On 12 Mar 2021, at 11:21, David Morávek wrote:
>
> +1
>
> D.
>
> On Thu, Mar 11, 20
Fortunately making deleting files idempotent is much easier than writing
them :). But one needs to handle the case of concurrent execution as well
as sequential re-execution due to possible zombie workers.
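A minimal sketch of an idempotent delete in plain Python, assuming a local filesystem: a retry or a concurrent (zombie) worker that finds the file already gone treats that as success rather than an error. The helper name delete_if_present is hypothetical.

```python
import tempfile
from pathlib import Path


def delete_if_present(path: Path) -> bool:
    """Delete path. Return True if this call removed it, False if already gone."""
    try:
        path.unlink()
        return True
    except FileNotFoundError:
        # Another worker or an earlier attempt already deleted it; not an error.
        return False


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "part-00000"
    target.write_text("data")
    first = delete_if_present(target)   # removes the file
    second = delete_if_present(target)  # re-execution is a harmless no-op
    still_there = target.exists()
```

Catching the exception, rather than checking for existence first, avoids a race between the check and the delete when two workers run at the same time.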
On Wed, Jan 27, 2021 at 5:04 PM Reuven Lax wrote:
> Keep in mind that DoFns might be reex
t of all the different array
> elements?
>
> On Thu, Jan 14, 2021 at 11:25 AM Robert Bradshaw
> wrote:
>
>> I think it makes sense to allow specifying more than one, if desired.
>> This is equivalent to just stacking multiple Unnests. (Possibly one could
>> even hav
ly could be a top-level transform. Should it automatically
>>>>> unnest all arrays, or just the fields specified?
>>>>>
>>>>> We do have to define the semantics for nested arrays as well.
>>>>>
>>>>> On Wed, Jan 13, 2021 at 1
Ah, thanks for the clarification. UNNEST does sound like what you want
here, and would likely make sense as a top-level relational transform as
well as being supported by SQL.
On Wed, Jan 13, 2021 at 10:53 AM Tao Li wrote:
> @Kyle Weaver sure thing! So the input/output
> definition for the Flat
I agree. Borrowing the mutation detection from the direct runner as an
intermediate point sounds like a good idea.
On Mon, Dec 21, 2020 at 8:57 AM Kenneth Knowles wrote:
> I really think we should make a plan to make this the default. If you test
> with the DirectRunner it will do mutation check
Yes, it should be for batch (just like for Python). There is ongoing
work to make it work for Streaming as well.
On Sat, Nov 21, 2020 at 2:57 PM Meriem Sara wrote:
>
> Hello everyone. I am trying to use apache beam with Golang to execute a data
> processing workflow using apache Spark. However,
Support for Flink 1.11 is
https://issues.apache.org/jira/browse/BEAM-10612 . It has been
implemented and will be included in the next release (Beam 2.25). In
the meantime, you could try building yourself from head.
On Fri, Oct 16, 2020 at 4:39 AM Kishor Joshi wrote:
>
> Hi Team,
>
> Since the bea
:177)
> at
> org.apache.beam.runners.fnexecution.control.FnApiControlClient$ResponseStreamObserver.onNext(FnApiControlClient.java:157)
>
> On Fri, Oct 2, 2020 at 5:14 PM Robert Bradshaw wrote:
>>
>> Could you clarify a bit exactly what you're trying to
> should
> deprecate a v1 IO ONLY when we have full feature parity in the v2 version.
> I think we don't have a replacement for AWSv1 S3 IO so that one should not
> be
> deprecated.
>
> On Tue, Sep 15, 2020 at 6:07 PM Robert Bradshaw
> wrote:
> >
> > The
Try priv...@beam.apache.org.
On Tue, Aug 25, 2020 at 6:18 AM D, Anup (Nokia - IN/Bangalore)
wrote:
>
> Hi,
>
>
>
> We would like to know if there is a way to reach out to members of the pmc
> group.
>
> We tried sending email to p...@beam.apache.org but it got bounced.
>
>
>
> Thanks
>
> Anup
As for the question of writing tests in the face of non-determinism,
you should look into TestStream. MyStatefulDoFn still needs to be
updated to not assume an ordering. (This can be done by setting timers
that provide guarantees that (modulo late data) one has seen all data
up to a certain timesta
Is this 2kps coming out of Filter1 + 2kps coming out of Filter2 (which
would be 4kps total), or only 2kps coming out of KafkaIO and
MessageExtractor?
Though it /shouldn't/ matter, due to sibling fusion, there's a chance
things are getting fused poorly and you could write Filter1 and
Filter2 instea
I checked Java, it looks like the way things are structured we do not
have that bug there.
On Mon, Aug 17, 2020 at 3:31 PM Robert Bradshaw wrote:
>
> +1
>
> Thanks, Eugene, for finding and fixing this!
>
> FWIW, most use of Python from the Python Portable Runner used the
>
+1
Thanks, Eugene, for finding and fixing this!
FWIW, most use of Python from the Python Portable Runner used the
embedded environment (this is the default direct runner), so
dependencies are already present.
On Mon, Aug 17, 2020 at 3:19 PM Daniel Oliveira wrote:
>
> Normally I'd say not to che
Thanks for the update!
On Wed, Aug 5, 2020 at 11:46 AM Neville Li wrote:
>
> Hi all,
>
> We just released Scio 0.9.3. This bumps Beam SDK to 2.23.0 and includes a lot
> of improvements & bug fixes.
>
> Cheers,
> Neville
>
> https://github.com/spotify/scio/releases/tag/v0.9.3
>
> "Petrificus Tota
On Sat, Jul 18, 2020 at 12:08 PM Chamikara Jayalath
wrote:
>
>
> On Fri, Jul 17, 2020 at 10:04 PM ayush sharma <1705ay...@gmail.com> wrote:
>
>> Thank you guys for the reply. I am really stuck and could not proceed
>> further.
>> Yes, the previous trial published message had null key.
>> But when
ditionally, what is the best way to test writing to BigQuery?
> I have seen this file
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/bigtableio_it_test.py
> but it appears it writes to real big query?
>
> kind regards
> Marco
>
>
>
test_pipeline.get_full_options_as_args(**extra_opts))
>
> print(result)
>
> Basically, i would expect a PCollection as result of the pipeline, and i
> would be testing the content of the PCollection
>
> Running this results in this message
>
> IT is skipped b
You can use apache_beam.testing.util.assert_that to write tests of
Beam pipelines. This is what Beam uses for its tests, e.g.
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util_test.py#L80
On Mon, Jul 13, 2020 at 2:36 PM Sofia’s World wrote:
>
> Hi all
> i was wo
Does it work when you write to a distributed filesystem? (One issue
with Docker is that the manager and each of its workers have their
own local filesystem.)
On Tue, Jul 7, 2020 at 2:17 PM Avijit Saha wrote:
>
> While trying to run the Beam WordCount example on Flink runner using Job
> Manage
On Tue, Jun 30, 2020 at 3:32 PM Julien Phalip wrote:
> Thanks Luke!
>
> One part I'm still a bit unclear about is how exactly the PreCombine stage
> works. In particular, I'm wondering how it can perform the combination
> before the GBK. Is it because it can already compute the combination on
> a
To clarify, PaneInfo is supported on the FnAPI local runner, but not on the
bundle based one. Unfortunately, Streaming is not supported on the FnAPI
one (yet), but work there is ongoing.
On Tue, May 26, 2020 at 11:46 AM Pablo Estrada wrote:
> Hi Jayadeep,
> Unfortunately, it seems that PaneInfo