Re: A Declarative API for Apache Beam

2022-12-14 Thread Chamikara Jayalath via dev
+1 for these proposals and agree that these will simplify and demystify Beam for many new users. I think when combined with the x-lang/Schema-Aware transform binding, these might end up being adequate solutions for many production use-cases as well (unless users need to define custom composites,

Re: A Declarative API for Apache Beam

2022-12-14 Thread Sachin Agarwal via dev
To build on Kenn's point, if we leverage existing stuff like dbt we get access to a ready-made community which can help drive both adoption and incremental innovation by bringing more folks to Beam. On Wed, Dec 14, 2022 at 2:57 PM Kenneth Knowles wrote: > 1. I love the idea. Back in the early

Re: A Declarative API for Apache Beam

2022-12-14 Thread Robert Burke
I like the idea of a common spec for something like this so we can actually cross validate all the SDK behaviours. It would make testing significantly easier. On Wed, Dec 14, 2022, 2:57 PM Kenneth Knowles wrote: > 1. I love the idea. Back in the early days people talked about an "XML > SDK" or

Re: A Declarative API for Apache Beam

2022-12-14 Thread Kenneth Knowles
1. I love the idea. Back in the early days people talked about an "XML SDK" or "JSON SDK" or "YAML SDK" and it didn't really make sense at the time. Portability and specifically cross-language schema transforms give the right infrastructure, so this is the perfect time: unique names (URNs) for

Re: A Declarative API for Apache Beam

2022-12-14 Thread Byron Ellis via dev
And I guess also a PR for completeness to make it easier to find going forward instead of my random repo: https://github.com/apache/beam/pull/24670 On Wed, Dec 14, 2022 at 2:37 PM Byron Ellis wrote: > Since Robert opened that can of worms (and we happened to talk about it > yesterday)... :-) >

Re: A Declarative API for Apache Beam

2022-12-14 Thread Byron Ellis via dev
Since Robert opened that can of worms (and we happened to talk about it yesterday)... :-) I figured I'd also share my start on a "port" of dbt to the Beam SDK. This would be complementary as it doesn't really provide a way of specifying a pipeline, more orchestrating and packaging a complex

Re: [PROPOSAL] Preparing for Apache Beam 2.44.0 Release

2022-12-14 Thread Kenneth Knowles
I've edited the subject for this update. There are no more open bugs targeting the release milestone. I will prepare RC1 shortly. Kenn On Thu, Dec 1, 2022 at 12:55 PM Kenneth Knowles wrote: > Just an update that the branch is cut. > > There are 8 issues targeted to the release milestone: >

Re: A Declarative API for Apache Beam

2022-12-14 Thread Damon Douglas via dev
Hello Robert, I'm replying to say that I've been waiting for something like this ever since I started learning Beam and I'm grateful you are pushing this forward. Best, Damon On Wed, Dec 14, 2022 at 2:05 PM Robert Bradshaw wrote: > While Beam provides powerful APIs for authoring

A Declarative API for Apache Beam

2022-12-14 Thread Robert Bradshaw via dev
While Beam provides powerful APIs for authoring sophisticated data processing pipelines, it often still has too high a barrier for getting started and authoring simple pipelines. Even setting up the environment, installing the dependencies, and setting up the project can be an overwhelming amount

DRAFT - Apache Beam Board Report - December 2022

2022-12-14 Thread Kenneth Knowles
Hi all, The next Beam board report is due this Friday, December 16. Please help me to draft it at https://s.apache.org/beam-draft-report-2022-12. I've opened edit access to anyone with the link to minimize friction of drafting. Ideas: - highlights from CHANGES.md - interesting technical

Re: [Proposal] | Move FileIO and TextIO from :sdks:java:core to :sdks:java:io:file

2022-12-14 Thread Ahmet Altay via dev
I agree with Sachin. Keeping components that users will have to bring together anyway leads to a better user experience. A counterexample, in my opinion, is the GCP libraries. It was a frequent struggle for users to find a working set of libraries until there was a BOM. And even after the BOM it

Re: [Proposal] | Move FileIO and TextIO from :sdks:java:core to :sdks:java:io:file

2022-12-14 Thread Sachin Agarwal via dev
I strongly believe that we should continue to have Beam optimize for the user - and while having separate components would allow those of us who are contributors and committers to move faster, the downsides of not having everything "in one box" for a new user where the components are all relatively

[Question] Github Actions Migration - Error with tests that use cython in self hosted runners.

2022-12-14 Thread Andoni Guzman Becerra
Hi All! We are working on the effort to migrate tests from Jenkins to GitHub Actions in self-hosted runners, but we are facing an issue related to tests that use cython and tox. This only happens in our self-hosted runners, not in another runner like "ubuntu-latest" from GitHub. Our self-hosted

Re: [Proposal] | Move FileIO and TextIO from :sdks:java:core to :sdks:java:io:file

2022-12-14 Thread Byron Ellis via dev
Take it with a grain of salt since I'm not even a committer, but is perhaps the reorganization of Beam into smaller components the real work of a 3.0 effort? Splitting of Beam into smaller more independently managed components would be a pretty huge breaking change from a dependency management

Re: [Proposal] | Move FileIO and TextIO from :sdks:java:core to :sdks:java:io:file

2022-12-14 Thread Alexey Romanenko
On 12 Dec 2022, at 22:23, Robert Bradshaw via dev wrote: > > Saving up all the breaking changes until a major release definitely > has its downsides (look at Python 3). The migration path is often as > important (if not more so) than the final destination. Actually, it proves that the major

SingleStoreIO SchemaTransform

2022-12-14 Thread Adalbert Makarovych
Hi, can someone review this PR? https://github.com/apache/beam/pull/24290 Thanks for your attention. -- Adalbert Makarovych Software Engineer at SingleStore

Beam High Priority Issue Report (30)

2022-12-14 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities. Unassigned P1 Issues: https://github.com/apache/beam/issues/24655 [Bug]: Pipeline

Re: @RequiresStableInput and Pipeline fusion

2022-12-14 Thread Jan Lukavský
Filed https://github.com/apache/beam/issues/24655. Jan On 12/14/22 00:52, Luke Cwik via dev wrote: This is definitely not working for portable pipelines since the GreedyPipelineFuser doesn't create a fusion boundary which as you pointed out causes a single stage that has a non-deterministic