The reasons we use Java serialization are not fundamental, probably only
historical. Thinking about it, yes, there is a lucky coincidence that we
currently have to change the serialization because of Flink 1.17
support. Flink 1.17 actually removes the legacy Java serialization from
Flink and
Yeah, I think we should restore the necessary classes but also fix the
FlinkRunner. Java serialization is inherently self-update-incompatible.
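To make the incompatibility concrete, here is a toy illustration (the class is
invented for the example, not taken from Beam): with the default
serialVersionUID, merely adding a field to a serialized class changes the
computed UID, so bytes written before an upgrade no longer deserialize
afterwards.

import java.io.*;

// Toy class standing in for anything Java-serialized into a savepoint.
class TimerState implements Serializable {
  long firingTimestamp;
  // Adding a field in a later version (e.g. `long holdTimestamp;`) changes
  // the default serialVersionUID, so reading old bytes throws
  // java.io.InvalidClassException.
}

public class JavaSerializationPitfall {
  public static void main(String[] args) throws Exception {
    // "Take a savepoint" with the current class definition.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
      out.writeObject(new TimerState());
    }
    // "Restore" succeeds only while the class definition is compatible;
    // after an upgrade that touches fields, this is where restoring fails.
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
      System.out.println(in.readObject());
    }
  }
}

Pinning serialVersionUID avoids the UID mismatch but not the harder failures:
the stream records fully-qualified class names, so renaming or repackaging a
class breaks restoration with ClassNotFoundException regardless.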
On Wed, Feb 21, 2024 at 1:35 PM Reuven Lax via dev wrote:
> Is there a fundamental reason we serialize Java classes into Flink
> savepoints?
>
> On Wed,
Anyway, to clarify my concern above about cumulative errors: in the past
I've seen this cause errors that were close to an order of magnitude, e.g.
the user specifies a rate limit of 100K QPS, and the throttler ends up
throttling to 15K QPS. The throttlers ended up being close to useless, as
it
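A hedged sketch of one mechanism that produces exactly this shape of error
(all numbers below are assumed for illustration): if the budget is split
evenly across shards but the runner only schedules a fraction of them at any
moment, the realized rate is the per-shard share times the scheduled count,
and the idle shards' unused budget is the cumulative error.

public class ShardedThrottleUndershoot {
  public static void main(String[] args) {
    double targetQps = 100_000;               // user-requested limit
    int shards = 1_000;                       // static split of the budget
    double perShardQps = targetQps / shards;  // 100 QPS per shard
    int scheduledShards = 150;                // shards actually running
    System.out.println(perShardQps * scheduledShards); // 15000.0 realized QPS
  }
}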
I'm wondering if we could build a global rate limiter in the transform,
using side inputs to communicate tokens to all the throttling shards.
However, this might get more complicated, and we risk causing performance
problems if this creates graph cycles.
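Very roughly, such a limiter might be wired up as below. This is only a
sketch under assumptions: ThrottleFn and TOKENS_PER_SEC are invented names,
the one-second budget window is arbitrary, and a real design would need a
default for empty windows plus some way to redistribute unspent tokens.

import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

public class GlobalThrottleSketch {
  static final long TOKENS_PER_SEC = 100_000L; // assumed global budget

  // Hypothetical throttling shard: every shard sees the same per-window
  // budget via the side input and paces itself against it (the local
  // token-bucket bookkeeping is elided).
  static class ThrottleFn extends DoFn<String, String> {
    private final PCollectionView<Long> budget;
    ThrottleFn(PCollectionView<Long> budget) { this.budget = budget; }

    @ProcessElement
    public void process(ProcessContext c) {
      long tokens = c.sideInput(budget); // shared across all shards
      // ...spend against the shared budget here...
      c.output(c.element());
    }
  }

  static PCollection<String> throttle(PCollection<String> input) {
    // A slowly ticking source publishes the budget once per window.
    PCollectionView<Long> budget =
        input.getPipeline()
            .apply(GenerateSequence.from(0).withRate(1, Duration.standardSeconds(1)))
            .apply(Window.<Long>into(FixedWindows.of(Duration.standardSeconds(1))))
            .apply(MapElements.into(TypeDescriptors.longs())
                .via((Long tick) -> TOKENS_PER_SEC))
            .apply(View.<Long>asSingleton());

    return input
        .apply(Window.<String>into(FixedWindows.of(Duration.standardSeconds(1))))
        .apply(ParDo.of(new ThrottleFn(budget)).withSideInputs(budget));
  }
}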
On Wed, Feb 21, 2024 at 10:51 AM Robert
I agree that a global rate limiter would be ideal, but either we make all
runners implement one as part of Beam (and the requisite SDK-side hooks) or
we're forcing users to deploy their own solution, which they can already do.
A good-enough in-current-model solution is probably fine for many
Is there a fundamental reason we serialize Java classes into Flink
savepoints?
On Wed, Feb 21, 2024 at 9:51 AM Robert Bradshaw via dev wrote:
> We could consider merging the gradle targets without renaming the
> classpaths as an intermediate step.
>
> Optimistically, perhaps there's a small
Yes, that's true. The technique I proposed will work for simple pipelines
in streaming (e.g. basic ETL), where the throttling threads are probably
all scheduled. For more complicated pipelines (or batch pipelines), we
might find that it overthrottles. Maybe a hybrid solution that uses state
would
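One hedged guess at the shape of that state-based hybrid (every name and knob
below is an assumption, not a reviewed design): buffer arriving elements in
BagState and let a processing-time timer release a bounded batch per tick.

import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

// Hypothetical pacing fn: state + timers instead of blocked threads.
// Stateful DoFns require keyed input, hence the KV element type.
class PacedReleaseFn extends DoFn<KV<Integer, String>, String> {
  private static final int BATCH_PER_TICK = 100;  // assumed per-key rate knob
  private static final Duration TICK = Duration.standardSeconds(1);

  @StateId("buffer")
  private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag();

  @TimerId("tick")
  private final TimerSpec tickSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

  @ProcessElement
  public void process(
      @Element KV<Integer, String> element,
      @StateId("buffer") BagState<String> buffer,
      @TimerId("tick") Timer tick) {
    buffer.add(element.getValue());   // park the element instead of blocking
    tick.offset(TICK).setRelative();  // (re)arm the release timer
  }

  @OnTimer("tick")
  public void onTick(
      OutputReceiver<String> out,
      @StateId("buffer") BagState<String> buffer,
      @TimerId("tick") Timer tick) {
    int released = 0;
    List<String> overflow = new ArrayList<>();
    for (String v : buffer.read()) {
      if (released < BATCH_PER_TICK) {
        out.output(v);
        released++;
      } else {
        overflow.add(v);              // keep the rest for the next tick
      }
    }
    buffer.clear();
    overflow.forEach(buffer::add);
    if (!overflow.isEmpty()) {
      tick.offset(TICK).setRelative(); // still draining; tick again
    }
  }
}

Note the caveat raised elsewhere in the thread applies here too: a batch
runner may fire processing-time timers only at the end, so this paces
streaming runs but degrades to a plain pass-through in batch.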
We could consider merging the gradle targets without renaming the
classpaths as an intermediate step.
Optimistically, perhaps there's a small number of classes that we need
to preserve (e.g. SerializablePipelineOptions looks like it was
something specifically intended to be serialized; maybe that
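For context on why a class like that can be safe to preserve: a wrapper can
pin its own wire format instead of relying on Java field layout. A minimal
sketch of the pattern, with invented JSON helpers (this is not the actual
Beam implementation, just the general shape):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

// Sketch: hold the options transiently and persist only a stable JSON
// string, so the Java-serialized form is independent of the options
// class's field layout.
class SerializableOptionsSketch implements Serializable {
  private transient MyOptions options;  // rebuilt on deserialization
  private final String json;            // the only field actually written

  SerializableOptionsSketch(MyOptions options) {
    this.options = options;
    this.json = toJson(options);        // hypothetical helper
  }

  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    in.defaultReadObject();             // restores just the JSON string
    this.options = fromJson(json);      // hypothetical helper
  }

  MyOptions get() { return options; }

  private static String toJson(MyOptions o) { return "{}"; /* e.g. Jackson */ }
  private static MyOptions fromJson(String s) { return new MyOptions(); }
}

class MyOptions {}  // stand-in for a real options type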
I like the idea of pushing back to the source much better than
unboundedly buffering things in state. I was trying to think of how to
just slow things down, and one problem is that while we can easily
control the number of keys, it's much harder to control (or even
detect) the number of parallel
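The key-count half is indeed easy to pin down; a small sketch of bounding the
key space (NUM_SHARDS and the String element type are assumptions):

import java.util.concurrent.ThreadLocalRandom;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class FixedShards {
  static final int NUM_SHARDS = 100; // assumed cap on distinct keys

  // Random keys drawn from a fixed range bound how many throttling shards
  // exist; crucially, they say nothing about how many of those shards the
  // runner chooses to execute in parallel at any instant.
  static PCollection<KV<Integer, String>> shard(PCollection<String> input) {
    return input.apply(
        WithKeys.of((String e) -> ThreadLocalRandom.current().nextInt(NUM_SHARDS))
            .withKeyType(TypeDescriptors.integers()));
  }
}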
Agreed, that event-time throttling doesn't make sense here. In theory
processing-time timers have no SLA, i.e. their firing might be delayed,
so batch runners aren't violating the model by firing them all at the end;
however, it does make processing-time timers less useful in batch, as we see
On 2/21/24 17:52, Robert Bradshaw via dev wrote:
On Wed, Feb 21, 2024 at 12:48 AM Jan Lukavský wrote:
Hi,
I have left a note regarding the proposed splitting of batch and
streaming expansion of this transform. In general, a need for such a split
triggers doubts in me. This signals that either
Hi,
while implementing FlinkRunner for Flink 1.17 I tried to verify that a
running Pipeline is able to successfully upgrade from Flink 1.16 to
Flink 1.17. There is some change regarding serialization needed for
Flink 1.17, so this was a concern. Unfortunately, we recently merged
This is your daily summary of Beam's current high-priority issues that may
need attention.
See https://beam.apache.org/contribute/issue-priorities for the meaning and
expectations around issue priorities.
Unassigned P1 Issues:
https://github.com/apache/beam/issues/30353 [Failing Test]:
Hi,
I have left a note regarding the proposed splitting of batch and
streaming expansion of this transform. In general, a need for such a split
triggers doubts in me. This signals that either
a) the transform does something it should not, or
b) the Beam model is not complete in terms of being