Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-09 Thread Chamikara Jayalath via dev
I would say: sink: type: WriteToParquet config: path: /beam/filesytem/dest prefix: suffix: Underlying SDK will add the middle part of the file names to make sure that files generated by various bundles/windows/shards do not conflict. This will satisfy
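
The naming scheme described above can be sketched roughly as follows. This is a minimal illustration, not Beam's implementation: the function `shard_file_name`, the zero-padded shard-of-total middle part, and the example prefix and suffix values are all assumptions made here. The real middle part is chosen by the underlying SDK and may also encode window and pane information.

```python
import posixpath


def shard_file_name(path, prefix, suffix, shard, num_shards):
    """Build one output file name as <path>/<prefix><middle><suffix>.

    Hypothetical helper: the middle part here is only a shard counter,
    standing in for whatever the SDK generates to keep names unique.
    """
    if not 0 <= shard < num_shards:
        raise ValueError("shard must be in the range [0, num_shards)")
    # Assumed format for the SDK-generated middle part.
    middle = f"{shard:05d}-of-{num_shards:05d}"
    return posixpath.join(path, f"{prefix}{middle}{suffix}")


def all_shard_names(path, prefix, suffix, num_shards):
    """Return the names for every shard; each one is distinct."""
    return [
        shard_file_name(path, prefix, suffix, shard, num_shards)
        for shard in range(num_shards)
    ]


if __name__ == "__main__":
    # The path is the one quoted in the message; prefix and suffix are made up.
    for name in all_shard_names("/beam/filesytem/dest", "out-", ".parquet", 3):
        print(name)
```

Because the user supplies only the prefix and suffix, no two shards can collide as long as the generated middle part is unique per bundle, window, or shard.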

Re: CoderProviderRegistrar class not found

2023-10-09 Thread L. C.
That was the only exception that I saw from running on the command line. The error is pretty easy to reproduce. All I did was generate the app from the Maven template, then run it on a baseline Dataproc 2.1 image: generate app: $ mvn archetype:generate -DarchetypeGroupId=org.apache.beam

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-09 Thread Robert Bradshaw via dev
On Mon, Oct 9, 2023 at 1:49 PM Reuven Lax wrote: > Just FYI - the reason why names (including prefixes) in > DynamicDestinations were parameterized via a lambda instead of just having > the user add it via MapElements is performance. We discussed something > along the lines of what you are

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-09 Thread Robert Bradshaw via dev
On Mon, Oct 9, 2023 at 1:11 PM Robert Burke wrote: > I'll note that the file "Writes" in the Go SDK are currently an unscalable > antipattern, because of this exact question. > > Aside from carefully examining other SDKs it's not clear how one authors > a reliable, automatically shardable,

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-09 Thread Reuven Lax via dev
Just FYI - the reason why names (including prefixes) in DynamicDestinations were parameterized via a lambda instead of just having the user add it via MapElements is performance. We discussed something along the lines of what you are suggesting (essentially having the user create a KV where the

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-09 Thread Kenneth Knowles
OK I can cherrypick it so they have an upgrade fix. But also we should instruct users to pin their fastavro version to a good version. That is probably safer and easier than upgrading Beam. Our containers that we build have the version pinned, right? So will this also cause all the prior
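
For users who want to verify their environment before pinning, a check along these lines may help. It is a minimal sketch, assuming that 1.8.4 (the only release named in this thread) is the one to avoid; the helper names and the `KNOWN_BAD_FASTAVRO` set are hypothetical and not part of Beam or fastavro. The equivalent pin in a requirements file would be a standard exclusion specifier such as `fastavro!=1.8.4`.

```python
import re
from importlib import metadata

# Assumption: only the release named in this thread is treated as bad.
KNOWN_BAD_FASTAVRO = {(1, 8, 4)}


def parse_release(version_text):
    """Return the leading (major, minor, patch) tuple, or None if absent."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_known_bad(version_text):
    """True when the version's release tuple is in the known-bad set."""
    return parse_release(version_text) in KNOWN_BAD_FASTAVRO


def installed_fastavro_version():
    """Return the installed fastavro version string, or None if missing."""
    try:
        return metadata.version("fastavro")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_fastavro_version()
    if version is None:
        print("fastavro is not installed")
    elif is_known_bad(version):
        print(f"fastavro {version} is affected; pin to another release")
    else:
        print(f"fastavro {version} is not in the known-bad set")
```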

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-09 Thread Yi Hu via dev
Yes, and moreover, this specific issue will break the user the same way for *all* Beam versions (2.50.0, 2.49.0, etc.) after Oct 3. That said, the issue is not limited to Beam 2.50.0. On Mon, Oct 9, 2023 at 4:08 PM Kenneth Knowles wrote: > If we had closed the release today, this would

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-09 Thread Robert Burke
I'll note that the file "Writes" in the Go SDK are currently an unscalable antipattern, because of this exact question. Aside from carefully examining other SDKs it's not clear how one authors a reliable, automatically shardable, window and pane aware in an arbitrary SDK, simply by referring to

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-09 Thread Kenneth Knowles
If we had closed the release today, this would still have broken all our users, correct? Kenn On Mon, Oct 9, 2023 at 3:37 PM Anand Inguva via dev wrote: > There was a regression[1] in the latest fastavro release, 1.8.4. The fix was merged > at https://github.com/apache/beam/pull/28896. The RC1 includes

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-09 Thread Anand Inguva via dev
There was a regression[1] in the latest fastavro release, 1.8.4. The fix was merged at https://github.com/apache/beam/pull/28896. The RC1 includes that version in the range for fastavro[2]. I think we need to CP https://github.com/apache/beam/pull/28896 to solve the fastavro regression. [1]

[YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-09 Thread Robert Bradshaw via dev
Currently the various file writing configurations take a single parameter, path, which indicates where the (sharded) output should be placed. In other words, one can write something like pipeline: ... sink: type: WriteToParquet config: path: /beam/filesytem/dest and

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-09 Thread Kenneth Knowles
Ran a couple of Java pipelines "as a newb user" to make sure our instructions weren't out of date. There are some errors in the instructions but they don't have to do with this release. Re-ran mass_comment.py on https://github.com/apache/beam/pull/28663. There are enough red signals there that

Re: CoderProviderRegistrar class not found

2023-10-09 Thread Chamikara Jayalath via dev
On Thu, Oct 5, 2023 at 2:05 PM L. C. wrote: > I'm getting a class-not-found error while running the word count example on > Dataproc 2.1 with Beam 2.50.0. The class exists under the jar. Does > anyone know how to resolve this? > > This is a list of dependency versions: > 2.50.0 > >

Beam High Priority Issue Report (43)

2023-10-09 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities. Unassigned P1 Issues: https://github.com/apache/beam/issues/28811 [Failing Test]: