Re: [ANNOUNCE] New Committer: Svetak Sundhar

2024-02-14 Thread John Casey via dev
Congrats Svetak! On Wed, Feb 14, 2024 at 9:00 AM Ahmed Abualsaud wrote: > Congrats Svetak! > > On 2024/02/14 02:05:02 Priyans Desai via dev wrote: > > Congratulations Svetak!! > > > > On Tue, Feb 13, 2024 at 8:09 PM Chamikara Jayalath via dev < > > dev@beam.apache.org> wrote: > > > > > Congrats

ByteBuddy DoFnInvokers Write Up

2024-01-10 Thread John Casey via dev
The team at Google recently held an internal hackathon, and my hack involved modifying how our ByteBuddy DoFnInvokers work. My hack didn't end up going anywhere, but I learned a lot about how our code generation works. It turns out we have no documentation or design docs about our code generation,

Re: Issue #21005

2023-12-15 Thread John Casey via dev
Hi Asmita, Those both make sense to me, feel free to go ahead. I'll be happy to review your PR when it's ready. On Thu, Dec 14, 2023 at 11:44 AM Asmita Mutgekar wrote: > Hi Team, > > I have picked Issue: Add documentation and improved errors for QueryFn in > MongoDbIO #21005 > Did some initial

Re: Upgrading Avro dependencies

2023-11-15 Thread John Casey via dev
.tool.Main” and generate Avro classes per every >>>> tested Avro version [3]. >>>> >>>> We still keep an old Avro version 1.8.2. as a default dependency >>>> version but it will be overwritten if users have a more recent one as a >>>>

Re: Upgrading Avro dependencies

2023-11-15 Thread John Casey via dev
ecent one as a project >>> dependency in their classpath. >>> >>> I think we need to completely remove Avro Gradle plugin (use “JavaExec” >>> task to generate Avro classes with a provided Avro version instead) and >>> update the default Avro versio

Re: Upgrading Avro dependencies

2023-11-14 Thread John Casey via dev
ion to the more recent one since now it’s not >> part of Java “core”. >> >> Any thoughts? >> >> — >> Alexey >> >> >> [1] https://github.com/apache/beam/issues/24292 >> [2] https://github.com/apache/beam/tree/master/sdks/java/ex

Re: Upgrading Avro dependencies

2023-11-10 Thread John Casey via dev
://github.com/apache/beam/tree/master/sdks/java/extensions/avro > [3] > https://github.com/apache/beam/blob/c713425e1ac2cdc3ec2ec264c9bf61f7356856bd/sdks/java/extensions/avro/build.gradle#L135 > > > > On 10 Nov 2023, at 18:05, John Casey via dev wrote: > > Hi All, > > There was a CVE detected

Upgrading Avro dependencies

2023-11-10 Thread John Casey via dev
Hi All, There was a CVE detected in Avro 1.8.2 (CVE-2023-39410), so I'm trying to upgrade to Avro 1.11.3. Unfortunately, it seems that our auto-generated Avro test classes aren't being generated properly with this new version. I've updated our Avro generation plugin as well, but for whatever

Adding Dead Letter Queues to Beam IOs

2023-11-08 Thread John Casey via dev
Hi All, I've written up a design for adding DLQs to existing Beam IOs. It's been through a round of reviews with some Dataflow folks at Google, but I'd appreciate any comments the rest of Beam have around how to refine the design. TL;DR: Make it easy for a user to configure IOs to route bad data

Re: [External Sender] Re: [Question] Error handling for IO Write Functions

2023-11-08 Thread John Casey via dev
to(writePath) > .withSuffix(".parquet")); > > log.info("Finished writing Parquet file to path {}", writePath); > } catch (Exception e) { > log.error("Error in Parquet Write Action. {}", e.getMessage()); >

Re: [Question] Error handling for IO Write Functions

2023-11-08 Thread John Casey via dev
There are 2 execution times when using Beam. The first execution is local, when a pipeline is constructed, and the second is remote on the runner, processing data. Based on what you said, it sounds like you are wrapping pipeline construction in a try-catch, and constructing FileIO isn't failing.
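
The two execution times described above can be sketched in plain Java. This is only an illustration, not Beam code: ToyPipeline, apply, run, and demo are hypothetical names invented for the sketch, and the "write failed" step is an assumed stand-in for a write that fails on the runner. It shows why a try-catch wrapped around construction does not see a failure that only happens when the recorded steps execute.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseDemo {
    /** Hypothetical stand-in for a pipeline: construction only records steps. */
    static final class ToyPipeline {
        private final List<Runnable> steps = new ArrayList<>();

        // Construction time: nothing runs here, the step is only recorded.
        ToyPipeline apply(Runnable step) {
            steps.add(step);
            return this;
        }

        // Execution time: the recorded steps actually run.
        void run() {
            for (Runnable step : steps) {
                step.run();
            }
        }
    }

    /** Returns a short trace showing in which phase the failure surfaced. */
    static String demo() {
        ToyPipeline pipeline = new ToyPipeline();
        String trace;
        try {
            // The failing step is recorded, not executed, so nothing is thrown here.
            pipeline.apply(() -> { throw new IllegalStateException("write failed"); });
            trace = "construction ok";
        } catch (RuntimeException e) {
            trace = "construction failed";
        }
        try {
            // The failure only surfaces once the steps are executed.
            pipeline.run();
            trace += ", run ok";
        } catch (IllegalStateException e) {
            trace += ", run failed: " + e.getMessage();
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this toy model the first try-catch never fires, which mirrors the situation in the question: the handler around construction completes normally, and the error appears later, during execution.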

Re: Reshuffle PTransform Design Doc

2023-10-05 Thread John Casey via dev
Given that this is a hint, I'm not sure redistribute should be a PTransform as opposed to some other way to hint to a runner. I'm not sure of what the syntax of that would be, but a semantic no-op transform that the runner may or may not do anything with is odd. On Thu, Oct 5, 2023 at 11:30 AM

Re: [ANNOUNCE] New PMC Member: Robert Burke

2023-10-05 Thread John Casey via dev
Congrats! On Thu, Oct 5, 2023 at 4:07 AM Ismaël Mejía wrote: > Congratulations Robert, well deserved ! long live go ! > > On Wed, Oct 4, 2023 at 11:58 PM Chamikara Jayalath > wrote: > >> Congrats Rebo! >> >> On Wed, Oct 4, 2023 at 1:42 AM Jan Lukavský wrote: >> >>> Congrats Robert! >>> On

Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-06 Thread John Casey via dev
Agreed on documentation and on keeping it in a separate repo. We have a few pretty significant Beam extensions (Scio and Dataflow Templates also come to mind) that Beam should highlight, but are separate repos for their own governance, contributions, and release reasons. The difference with

Re: [ANNOUNCE] New committer: Ahmed Abualsaud

2023-08-25 Thread John Casey via dev
Congrats Ahmed! On Fri, Aug 25, 2023 at 10:43 AM Bjorn Pedersen via dev wrote: > Congrats Ahmed! Well deserved! > > On Fri, Aug 25, 2023 at 10:36 AM Yi Hu via dev > wrote: > >> Congrats Ahmed! >> >> On Fri, Aug 25, 2023 at 10:11 AM Ritesh Ghorse via dev < >> dev@beam.apache.org> wrote: >> >>>

Re: Beam IO Connector

2023-08-14 Thread John Casey via dev
I believe Devon Peticolas wrote a similar tool to create an IO that wrote to configurable sinks that might fit your use case On Sat, Aug 12, 2023 at 12:18 PM Bruno Volpato via dev wrote: > Hi Jeremy, > > Apparently you are trying to use Beam's DirectRunner >

How to Write a Beam IO

2023-06-13 Thread John Casey via dev
By request, I'm resharing my slides and doc on how to write a Beam IO: https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit?usp=sharing https://docs.google.com/presentation/d/14PjBNFoCOFOROiQCdR3hkbg1fkDGSz-0Aer4L88P8Uk/edit?usp=sharing Please feel free to

Dataloss Bug in BigQuery IO Storage Write when used in Batch

2023-05-03 Thread John Casey via dev
Hi All, Per https://github.com/apache/beam/issues/26521 and https://github.com/apache/beam/issues/26520, there is an issue in Beam versions 2.33 - 2.47 where data can be lost when using the Storage Write API in Batch. This issue is much more likely to occur in versions 2.44-2.47. The bugs

Re: [Proposal] Automate Release Signing

2023-05-03 Thread John Casey via dev
+1 to this as well. On Wed, May 3, 2023 at 3:10 PM Robert Burke wrote: > +1 to simplifying release processes, since it leads to a more consistent > experience. > > If we continue to reduce release overhead we'll be able to react with more > agility when CVEs come a knocking. > > On Wed, May 3,

Re: [ANNOUNCE] New committer: Damon Douglas

2023-04-25 Thread John Casey via dev
Congrats Damon! On Tue, Apr 25, 2023 at 9:36 AM Yi Hu via dev wrote: > Congrats Damon! > > On Tue, Apr 25, 2023 at 8:55 AM Ritesh Ghorse via dev > wrote: > >> Congratulations Damon! >> >> On Tue, Apr 25, 2023 at 12:03 AM Byron Ellis via dev >> wrote: >> >>> Congrats Damon! >>> >>> On Mon, Apr

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-16 Thread John Casey via dev
Thanks Luke On Thu, Feb 16, 2023 at 12:06 PM Luke Cwik wrote: > All the PMC finalization tasks have been completed. > > On Thu, Feb 16, 2023 at 8:56 AM Luke Cwik wrote: > >> I'll help out. >> >> On Thu, Feb 16, 2023 at 7:08 AM John Casey via dev >> wr

Re: [ANNOUNCE] New PMC Member: Jan Lukavský

2023-02-16 Thread John Casey via dev
Thanks Jan! On Thu, Feb 16, 2023 at 11:11 AM Danny McCormick via dev < dev@beam.apache.org> wrote: > Congratulations! > > On Thu, Feb 16, 2023 at 11:09 AM Reza Rokni via dev > wrote: > >> Congratulations! >> >> On Thu, Feb 16, 2023 at 7:47 AM Robert Burke wrote: >> >>> Congratulations! >>>

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-16 Thread John Casey via dev
>>> >> >>> >> >>> >> Thanks! >>> >> >>> >> On Mon, Feb 13, 2023 at 1:13 AM Chamikara Jayalath via dev < >>> dev@beam.apache.org> wrote: >>> >>> >>> >>> +1 (binding) >>> >

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-15 Thread John Casey via dev
< >> dev@beam.apache.org> wrote: >> >>> >> >>> +1 (binding) >> >>> >> >>> Tried several Java and Python multi-language pipelines. >> >>> >> >>> Thanks, >> >>> Cham >> >>>

Re: [VOTE] Release 2.45.0, Release Candidate #1

2023-02-10 Thread John Casey via dev
Addendum to above email. Java artifacts were built with Gradle 7.5.1 and OpenJDK 1.8.0_362 On Fri, Feb 10, 2023 at 11:14 AM John Casey wrote: > Hi everyone, > Please review and vote on the release candidate #3 for the version 2.45.0, > as follows: > [ ] +1, Approve the release > [ ] -1, Do not

[VOTE] Release 2.45.0, Release Candidate #1

2023-02-10 Thread John Casey via dev
Hi everyone, Please review and vote on the release candidate #3 for the version 2.45.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) Reviewers are encouraged to test their own use cases with the release candidate, and vote +1 if no

Beam Release 2.45

2023-01-10 Thread John Casey via dev
Hi All, I propose we cut 2.45 on January 18, and I nominate myself as the release manager. This is a week delayed from the Jan 11 schedule, but this would give the 2.44 release time to finish its processes, allowing 2.45 to pick up any fixes in the 2.44 release. Thanks, John

How to write an IO guide draft

2023-01-09 Thread John Casey via dev
Hi All, I spent the last few weeks of December drafting a "How to write an IO guide": https://docs.google.com/document/d/1-WxZTNu9RrLhh5O7Dl5PbnKqz3e5gm1x3gDBBhszVF8/edit# and an associated code sample: https://github.com/apache/beam/pull/24799 My goal is to make it easier for a new IO

Re: SchemaTransformProvider | Java class naming convention

2022-11-15 Thread John Casey via dev
One distinction here is the difference between the URN for a provider / transform, and the class name in Java. We should have a standard for both, but they are distinct On Tue, Nov 15, 2022 at 3:39 PM Chamikara Jayalath via dev < dev@beam.apache.org> wrote: > > > On Tue, Nov 15, 2022 at 11:50

Re: [ANNOUNCE] New committer: Yi Hu

2022-11-09 Thread John Casey via dev
Congrats! This is well deserved, Yi. On Wed, Nov 9, 2022 at 2:58 PM Austin Bennett wrote: > Congrats, and Thanks, Yi! > > On Wed, Nov 9, 2022 at 11:24 AM Valentyn Tymofieiev via dev < > dev@beam.apache.org> wrote: > >> I am with the Beam PMC on this, congratulations and very well deserved, >> Yi!

Re: [ANNOUNCE] New committer: Ritesh Ghorse

2022-11-04 Thread John Casey via dev
Congrats! On Fri, Nov 4, 2022 at 10:36 AM Ahmed Abualsaud via dev wrote: > Congrats Ritesh! > > On Fri, Nov 4, 2022 at 10:29 AM Andy Ye via dev > wrote: > >> Congrats Ritesh! >> >> On Fri, Nov 4, 2022 at 9:26 AM Kerry Donny-Clark via dev < >> dev@beam.apache.org> wrote: >> >>> Congratulations

Re: Rest connector

2022-09-12 Thread John Casey via dev
We don't have a generic REST API IO at the moment. There has been some discussion, but as far as I know there haven't been any IOs developed. A factor here is that most HTTP-based services aren't strictly REST, which makes the development of an easy-to-use, but sufficiently generic, IO

Re: SingleStore IO design doc

2022-09-07 Thread John Casey via dev
Hi Adalbert, This looks good. I've added a few comments to consider. John On Wed, Sep 7, 2022 at 5:33 AM Adalbert Makarovych < amakarovych...@singlestore.com> wrote: > Hi, > > I'm working on the SingleStore IO connector. > Here is a GitHub task for

Re: SingleStore IO

2022-08-25 Thread John Casey via dev
Hi Adalbert, The nature of scheduling work with splittable DoFns is such that trying to start all splits at the same time isn't really supported. In addition, the general assumption of splitting work in Beam is that a split can be retried in isolation from other splits, which doesn't look

Re: KafkaIO.java.configuredKafkaCommit() inconsistency

2022-08-09 Thread John Casey via dev
Thanks for the quick turnaround on this. On Mon, Aug 8, 2022 at 9:34 PM Balázs Németh wrote: > thanks, see https://github.com/apache/beam/issues/22631 + > https://github.com/apache/beam/pull/22633 > > John Casey via dev wrote (on Mon, Aug 8, 2022, > 21:30): > >>

Re: KafkaIO.java.configuredKafkaCommit() inconsistency

2022-08-08 Thread John Casey via dev
Which, looking at your message again, would imply that the configuredKafkaCommit() method shouldn't inspect isolation.level. On Mon, Aug 8, 2022 at 3:27 PM John Casey wrote: > .withReadCommitted() doesn't commit messages when read, it instead > specifies that the kafka consumer should only read

Re: KafkaIO.java.configuredKafkaCommit() inconsistency

2022-08-08 Thread John Casey via dev
.withReadCommitted() doesn't commit messages when they are read; instead, it specifies that the Kafka consumer should only read messages that have themselves been committed to Kafka. It is intended for exactly-once applications. On Mon, Aug 8, 2022 at 3:16 PM Balázs Németh wrote: > I have been reading

Re: [ANNOUNCE] New committer: John Casey

2022-08-01 Thread John Casey via dev
Thanks! I'm looking forward to continuing to improve all of our connectors. On Mon, Aug 1, 2022 at 1:52 PM Yichi Zhang via dev wrote: > Congratulations John! > > On Sat, Jul 30, 2022 at 4:23 AM Robert Burke wrote: > >> Woohoo! Congrats John and welcome to committership! >> >> On Fri, Jul 29,

Re: Checkpoints timing out upgrading from Beam version 2.29 with Flink 1.12 to Beam 2.38 and Flink 1.14

2022-07-27 Thread John Casey via dev
Would it be possible to recreate the experiments to try to isolate variables? Right now the 3 cases change both the Beam and Flink versions. On Tue, Jul 26, 2022 at 11:35 PM Kenneth Knowles wrote: > Bumping this and adding +John Casey who knows > about KafkaIO and unbounded sources, though

Disabling Kafka IO SDF implementation

2022-07-15 Thread John Casey via dev
Hi All, There is an issue right now where Kafka IO's SDF implementation isn't resuming properly when the pipeline restarts (https://github.com/apache/beam/issues/21730). In addition, there was an issue where Kafka SDF wasn't committing properly when 'commit in finalize' was specified, and I