Re: [DISCUSS] Idempotent initialization of file systems

2023-05-16 Thread Moritz Mack
Thanks for your thoughts, Robert. On 15.05.23, 19:23, "Robert Bradshaw via dev" wrote: On Mon, May 15, 2023 at 8:38 AM Moritz Mack wrote: > > Hi all, > > I was just looking into an old issue again, SerializablePipelineOptions > calling FileSystems.setDefaultPipelineO

[DISCUSS] Idempotent initialization of file systems

2023-05-15 Thread Moritz Mack
Hi all, I was just looking into an old issue again, SerializablePipelineOptions calling FileSystems.setDefaultPipelineOptions on deserialization [1]. This applies to various runners including Flink and Spark, but not Dataflow as far as I know. Problem: Current initialization of FileSystems thr
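
The snippet above is cut off before it describes the proposed fix, so what follows is only a minimal sketch of what "idempotent initialization" can mean in general. `IdempotentRegistry`, `setDefaultOptions` and `reloadCount` are hypothetical stand-ins and not Beam's `FileSystems` API; the sketch assumes that a repeated call with equal options should skip the expensive reload.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a global registry that is re-initialized on every
// deserialization of the options; none of these names are Beam APIs.
final class IdempotentRegistry {
  // Options used by the most recent initialization (a String for brevity).
  private static final AtomicReference<String> CURRENT = new AtomicReference<>();
  // Counts how often the (assumed expensive) reload actually ran.
  private static final AtomicInteger RELOADS = new AtomicInteger();

  private IdempotentRegistry() {}

  // Returns true if a reload happened, false if the call was a no-op.
  static boolean setDefaultOptions(String options) {
    Objects.requireNonNull(options, "options");
    String previous = CURRENT.getAndSet(options);
    if (options.equals(previous)) {
      return false; // same options as before: skip the reload
    }
    RELOADS.incrementAndGet(); // stands in for loading all registrars
    return true;
  }

  static int reloadCount() {
    return RELOADS.get();
  }

  public static void main(String[] args) {
    // Simulate many deserializations that all carry the same options.
    for (int i = 0; i < 1000; i++) {
      setDefaultOptions("--tempLocation=/tmp/demo");
    }
    System.out.println("reloads: " + reloadCount());
  }
}
```

Comparing against the previously applied options keeps the call cheap when nothing changed, while a call with different options still triggers a reload.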

Re: Jenkins Flakes

2023-04-11 Thread Moritz Mack
Tue, Apr 11, 2023, 8:14 AM Moritz Mack wrote: Thanks so much The coverage issue is only with the Java builds specifically. Go and Python have their coverage numbers codecov uploads done in GitHub Actions instead. On Tue, Apr 11, 2023, 8:14 AM Moritz Mack <mm...@talend.com> wrote

Re: Jenkins Flakes

2023-04-11 Thread Moritz Mack
Thanks so much for looking into this! I’m absolutely +1 for removing Jenkins related friction and the proposed changes sound legitimate. Also, considering the number of flaky tests in general [1], code coverage might not be the pressing issue. Should it be disabled everywhere in favor of more r

Re: [ANNOUNCE] New PMC Member: Jan Lukavský

2023-02-17 Thread Moritz Mack
Congrats, Jan! On 16.02.23, 23:28, "Luke Cwik via dev" wrote: Congrats, well deserved. On Thu, Feb 16, 2023 at 10:32 AM Anand Inguva via dev wrote: Congratulations!! On Thu, Feb 16, 2023 at 12:42 PM Chamikara Jayalath via dev wrote: Congrats Jan!

[Announcement] Planned removal of Spark 2 runner support in 2.46.0

2023-01-23 Thread Moritz Mack
Dear All, The runner for Spark 2 was deprecated quite a while back in August 2022 with the release of Beam 2.41.0 [1]. We’re planning to move ahead with this and finally remove support for Spark 2 (beam-runners-spark) to only maintain support for Spark 3 (beam-runners-spark-3) going forward. N

Re: SqlTransform translation deficiencies

2022-12-15 Thread Moritz Mack
Hi all, Just bumping this up again. Would anyone familiar with Beam SQL be able to have a look at this potential bug in Beam SQL? That help would be much appreciated! Thanks so much, Moritz On 23.11.22, 09:58, "Moritz Mack" wrote: Hi all, Not sure who’s best to ping. I spent

Re: [Proposal] | Move FileIO and TextIO from :sdks:java:core to :sdks:java:io:file

2022-12-12 Thread Moritz Mack
Hi Damon, I fear the current release / versioning strategy of Beam doesn’t lend itself well to such breaking changes. Alexey and I have spent quite some time discussing how to proceed with the problematic Avro dependency in core (and respectively AvroIO, of course). Such changes essentially al

Re: [Proposal] Adopt a Beam I/O Standard

2022-12-12 Thread Moritz Mack
Thanks so much! Great to see this being picked up again with some good progress. / Moritz On 11.12.22, 15:17, "Herman Mak via dev" wrote: Hello Everyone, *TLDR* Should we adopt a set of standards that Connector I/Os should adhere to? Attached is a first version of a Beam I/O Standards guideli

Re: Achievement unlocked: fully triaged

2022-12-09 Thread Moritz Mack
Great, I really like the new simplified flow! Thanks for that! On 08.12.22, 19:48, "Kenneth Knowles" wrote: Merged it. Please be on the lookout for bugs I have introduced, since they could result in issues slipping through the cracks. On Wed, Dec 7, 2022 at 3:31 PM Kenneth Knowles wrote: OK

SqlTransform translation deficiencies

2022-11-23 Thread Moritz Mack
Hi all, Not sure who’s best to ping. I spent some time looking into the SqlTransform translation of one of the TPC-DS queries yesterday and noticed it’s generating an overly complex transform hierarchy. I’ve summarized my findings in [1]. It would be great to get some more experienced eyes on i

Re: bhulette stepping back (for now)

2022-11-11 Thread Moritz Mack
Also, thanks so much for all the great and thorough reviews! That was always much appreciated! All the best, Brian On 11.11.22, 23:23, "Ahmet Altay via dev" wrote: Thank you for everything Brian! On Fri, Nov 11, 2022 at 11:27 AM Austin Bennett wrote: Thanks for everything you've done, @Bhul

Re: [DISCUSS] Avro dependency update, design doc

2022-11-11 Thread Moritz Mack
Thanks a lot for the feedback so far! I can only second Alexey. It was painful to come to realize that the only feasible option seems to be copying a lot of code during the transition phase. For that reason, it will be critical to be disciplined about the removal of the to-be deprecated code in

Re: [ANNOUNCE] New committer: Ritesh Ghorse

2022-11-07 Thread Moritz Mack
Congrats, Ritesh 😊 On 05.11.22, 03:08, "Ahmet Altay via dev" wrote: Congratulations Ritesh! On Fri, Nov 4, 2022 at 12:18 PM Ritesh Ghorse via dev wrote: Thanks everyone! I'm glad to be a part of this community and I look forward to making more contributions in whatever ways I Congratulation

Re: [Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-15 Thread Moritz Mack
ote: >> >> Good idea. I'm curious about our current benchmarks. Some of them run on >> clusters, but I think some of them are running locally and just being noisy. >> Perhaps this could improve that. (or if they are running on local >> Spark/Flink then maybe the

[Infrastructure] Periodically run Java microbenchmarks on Jenkins

2022-09-13 Thread Moritz Mack
Hi team, I’m looking for some help to set up infrastructure to periodically run Java microbenchmarks (JMH). Results of these runs will be added to our community metrics (InfluxDB) to help us track performance, see [1]. To prevent noisy runs this would require a dedicated Jenkins machine that run

Re: Cannot find beam in project list on jira when I create issue

2022-09-07 Thread Moritz Mack
Sorry for the confusion. Beam migrated to using GitHub issues just recently and the Confluence docs haven’t been updated yet. Please create a new issue under https://github.com/apache/beam/issues and then reference it in your commit message using the issue ID, e.g. git commit -am "Description of

Re: Re:Re: Unable to load class 'org.apache.beam.gradle.BeamModulePlugin'

2022-08-09 Thread Moritz Mack
Hi, Please use a git clone of the apache/beam repository [1] as mentioned in the instructions [2]: > git clone g...@github.com:apache/beam.git It looks like the source code archive you’ve downloaded doesn’t contain some necessary build sources such as this plugin. Regards, Moritz [1] https://

Re: Output after Pipeline replaceAll

2022-07-29 Thread Moritz Mack
simpler to just have a flag on your translator that translates Create.Values into something that looks unbounded at the Spark layer. Kenn On Thu, Jul 28, 2022 at 2:01 AM Moritz Mack <mm...@talend.com> wrote: Hi all, Wondering if somebody could help and shed some light on

Output after Pipeline replaceAll

2022-07-28 Thread Moritz Mack
Hi all, Wondering if somebody could help and shed some light on the behavior of Pipeline.replaceAll, particularly the outputs to expect after the replacement. I’m currently looking into supporting VR tests for SparkRunner in streaming mode [1]. Unfortunately, I didn’t succeed in replacing (wrappin

Re: [ANNOUNCE] New committer: Steven Niemitz

2022-07-20 Thread Moritz Mack
Congrats, Steven! On 21.07.22, 05:25, "Evan Galpin" wrote: Congrats! Well deserved! On Wed, Jul 20, 2022 at 15:17 Chamikara Jayalath via dev wrote: Congrats, Steve! On Wed, Jul 20, 2022, 9:16 AM Austin Bennett wrote: Great!

Re: [RFC] Gather JMH performance metrics in Beam community-metrics

2022-07-12 Thread Moritz Mack
nal performance benchmarks! But what does JMH stand for? On Tue, Jul 12, 2022, 7:54 AM Moritz Mack wrote: Hi all, This is a very short proposal to start running JMH benchmarks periodically and

[RFC] Gather JMH performance metrics in Beam community-metrics

2022-07-12 Thread Moritz Mack
Hi all, This is a very short proposal to start running JMH benchmarks periodically and store benchmark results so we can start monitoring performance trends on the community metrics dashboards over time. Comments most welcome! https://s.apache.org/nvi9g Best regards, Moritz

Re: [PROPOSAL] Stop Spark2 support in Spark Runner

2022-06-29 Thread Moritz Mack
2.16.0: 35, 2.17.0: 18, 2.18.0: 67, 2.19.0: 29, 2.20.0: 22, 2.21.0: 23, 2.22.0: 47, 2.23.0: 19, 2.24.0: 63, 2.25.0: 17, 2.26.0: 23, 2.27.0: 18, 2.28.0: 81, 2.29.0: 268, 2.30.0: 26, 2.31.0: 24, 2.32.0: 69, 2.33.0: 403, 2.34.0: 352, 2.35.0: 1543, 2.36.0: 50, 2.37.0: 19, 2.38.0: 420, 2.39.0: 86, All: 5224. On Wed, Jun 29, 2022 at 1:24 AM Moritz Mack

Re: [PROPOSAL] Stop Spark2 support in Spark Runner

2022-06-29 Thread Moritz Mack
Who could help pull the latest Maven download stats for beam-runners-spark and beam-runners-spark-3 for the last few Beam releases? Thanks so much! / Moritz On 01.04.22, 16:54, "Moritz Mack" wrote: I just started looking into the Spark runner code a bit to hopefully help sup

Re: Problems with Javadoc for 2.39.0

2022-06-01 Thread Moritz Mack
Hi Yichi, Sorry for breaking that :/ I had a quick look and would suggest generating Javadocs only for the latest Spark runner version. Please see here: https://github.com/apache/beam/pull/17793 The process of aggregating Javadocs is a bit sketchy in cases where source sets are shared. Partic

Re: RDD (Spark dataframe) into a PCollection?

2022-05-24 Thread Moritz Mack
Hi Yushu, Have a look at org.apache.beam.runners.spark.translation.EvaluationContext in the Spark runner. It maintains that mapping between PCollections and RDDs (wrapped in the BoundedDataset helper). As Reuven just pointed out, values are timestamped (and windowed) in Beam, therefore BoundedD

SerializablePipelineOptions / FileSystems.setDefaultPipelineOptions

2022-05-17 Thread Moritz Mack
Does anybody here have some insights on this? Really wondering about the numbers, initializing all filesystems ~80k times for a pipeline run doesn’t seem right. On 13.05.22, 09:10, "Moritz Mack" wrote: Hi Jack, Silencing info logs for that class during IT tests would be a quick fix

Re: [DISCUSS] Next steps for update of Avro dependency in Beam

2022-05-17 Thread Moritz Mack
Sorry, please ignore the previous empty reply … On 17.05.22, 09:31, "Moritz Mack" wrote: On 16.05.22, 18:09, "Robert Bradshaw" wrote: On Mon, May 16, 2022 at 8:53 AM Alexey Romanenko wrote: > >> On 13 May 2022, at 18:38, Robert Bradshaw wrote: >>

Re: [DISCUSS] Next steps for update of Avro dependency in Beam

2022-05-17 Thread Moritz Mack
On 17.05.22, 09:31, "Moritz Mack" wrote: On 16.05.22, 18:09, "Robert Bradshaw" wrote: On Mon, May 16, 2022 at 8:53 AM Alexey Romanenko wrote: > >> On 13 May 2022, at 18:38, Robert Bradshaw

Re: [DISCUSS] Next steps for update of Avro dependency in Beam

2022-05-17 Thread Moritz Mack
On 16.05.22, 18:09, "Robert Bradshaw" wrote: On Mon, May 16, 2022 at 8:53 AM Alexey Romanenko wrote: > >> On 13 May 2022, at 18:38, Robert Bradshaw wrote: >> >> We should probably remove the experimental annotations from > SchemaCoder at this point. > > Is there anything that stops us from th

Re: [DISCUSS] Next steps for update of Avro dependency in Beam

2022-05-13 Thread Moritz Mack
Thanks so much for all these pointers, Alexey. Having that context really helps! Skimming through the past conversations, this one key consideration hasn’t changed and still seems critical: AvroCoder is the de facto standard for encoding complex user types (with SchemaCoder still being experimen

Re: S3ClientBuilder Logging

2022-05-13 Thread Moritz Mack
Hi Jack, Silencing info logs for that class during IT tests would be a quick fix, but also removing logging there entirely shouldn’t hurt. If the S3 filesystem is used it’ll fail on first usage and the issue should be fairly obvious… Though wondering, this is logged once when file systems are i

Re: [DISCUSS] Deprecation of AWS SDK v1 IO connectors

2022-05-03 Thread Moritz Mack
should be the next major release but I’m not sure it’s even on the distant horizon for now since this is a topic that we didn’t discuss for a long time (maybe it’s a good time to come back to this). — Alexey On 18 Mar 2022, at 12:19, Moritz Mack <mm...@talend.com> wrote: Dear all,

Re: [DISCUSS] Deprecation of AWS SDK v1 IO connectors

2022-04-05 Thread Moritz Mack
From: Moritz Mack Sent: 21 March 2022 12:58 To: dev@beam.apache.org Subject: Re: [DISCUSS] Deprecation of AWS SDK v1 IO connectors Thank you both! Absolutely agree on reaching out to users! The release of 2.38 seems to be a very good time to do so t

Re: [PROPOSAL] Stop Spark2 support in Spark Runner

2022-04-01 Thread Moritz Mack
I just started looking into the Spark runner code a bit to hopefully help support it. Besides having to maintain (test!) twice the number of artifacts, there’s also a significant negative impact on developer ergonomics / productivity supporting multiple major versions (separate modules to dea

Re: [CODE QUESTION] Row getValues() vs getValue(int)

2022-03-29 Thread Moritz Mack
goes along with the attachValues method, which is similarly tricky to use. It's there to enable 0-copy code, but not necessarily intended for general consumption. On Tue, Mar 29, 2022 at 9:42 AM Moritz Mack <mm...@talend.com> wrote: Dear team, Is anybody around who could help me wi

[CODE QUESTION] Row getValues() vs getValue(int)

2022-03-29 Thread Moritz Mack
Dear team, Is anybody around who could help me with a question on Schemas / Rows? That would be much appreciated! I’m particularly looking at RowWithGetters currently and I’m stuck understanding the semantics of Row.getValues() [1]. public List getValues() { return getters.stream().map(g ->

Re: [DISCUSS] Deprecation of AWS SDK v1 IO connectors

2022-03-21 Thread Moritz Mack
ut I’m not sure it’s even on the distant horizon for now since this is a topic that we didn’t discuss for a long time (maybe it’s a good time to come back to this). — Alexey On 18 Mar 2022, at 12:19, Moritz Mack <mm...@talend.com> wrote: Dear all, I’d like to bring up an old discussion

[DISCUSS] Deprecation of AWS SDK v1 IO connectors

2022-03-18 Thread Moritz Mack
Dear all, I’d like to bring up an old discussion again [1]. Currently we have two different versions of AWS IO connectors in Beam for the Java SDK: * amazon-web-services [2] and kinesis [3] for the AWS Java SDK v1 * amazon-web-services2 (including kinesis) [4] for the AWS Java SDK v2 M

Re: NATS IO Connector

2022-03-14 Thread Moritz Mack
A NATS connector would be great, Suresh. Really enjoyed how easy to operate and reliable it is! Curious, are you using NATS with JetStream enabled (the replacement of the legacy NATS streaming layer) or core NATS (at-most-once delivery)? Regards, Moritz From: Alexey Romanenko Date: Thursday, 1

Re: [ANNOUNCE] New committer: Moritz Mack

2022-03-14 Thread Moritz Mack
Thanks so much everyone 😊 From: Pablo Estrada Date: Friday, 11. March 2022 at 17:43 To: dev Subject: Re: [ANNOUNCE] New committer: Moritz Mack Congrats Moritz! Well deserved indeed:) On Fri, Mar 11, 2022, 6:30 AM Evan Galpin wrote: Congrats Moritz! On Fri, Mar 11, 2022 at 3:05 AM Etienne

Re: Beam jira permissions

2022-03-01 Thread Moritz Mack
Thanks so much Stephen and welcome to Beam! I’m more than happy to review your PR, just ping me once opened (R: @mosche). I’ve done a bit of work recently to get the AWS v2 module in a better / ready shape, your help there is much appreciated. /Moritz From: Ahmet Altay Date: Tuesday, 1. March 20

Re: KafkaIO.write and Avro

2022-02-08 Thread Moritz Mack
Just having a quick look, it looks like the respective interface in KafkaIO should rather look like this to support KafkaAvroSerializer, which is a Serializer: public Write withValueSerializer(Class> valueSerializer) Thoughts? Cheers, Moritz From: Moritz Mack Date: Tuesday, 8. Febru

Re: KafkaIO.write and Avro

2022-02-08 Thread Moritz Mack
Hi Matt, Unfortunately, the types don’t play well when using KafkaAvroSerializer. It currently requires a cast :/ The following will work: write.withValueSerializer((Class) KafkaAvroSerializer.class) This seems to be the cause of repeated confusion, so probably worth improving the user experien
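
As a rough illustration of why a raw `Class` cast is needed in a situation like the one described above, here is a minimal sketch with stand-in types. `Serializer`, `ObjectSerializer` and `Write` below are hypothetical simplifications and not the Kafka or Beam classes; the sketch assumes the setter is typed on the value type `V` while the serializer is declared for `Object`.

```java
import java.util.Objects;

final class SerializerCastSketch {
  // Stand-in for a serializer interface (simplified, hypothetical).
  interface Serializer<T> {
    String serialize(T value);
  }

  // Stand-in for a serializer declared on Object rather than on the value type.
  static final class ObjectSerializer implements Serializer<Object> {
    @Override
    public String serialize(Object value) {
      return String.valueOf(value);
    }
  }

  // Stand-in for a write transform whose setter is typed on V.
  static final class Write<V> {
    private Class<? extends Serializer<V>> valueSerializer;

    Write<V> withValueSerializer(Class<? extends Serializer<V>> valueSerializer) {
      this.valueSerializer = Objects.requireNonNull(valueSerializer);
      return this;
    }

    String serializerName() {
      return valueSerializer.getSimpleName();
    }
  }

  @SuppressWarnings({"unchecked", "rawtypes"})
  static Write<Integer> configure() {
    Write<Integer> write = new Write<>();
    // write.withValueSerializer(ObjectSerializer.class) does not compile:
    // ObjectSerializer is a Serializer<Object>, not a Serializer<Integer>.
    // The raw cast turns the compile error into an unchecked warning.
    return write.withValueSerializer((Class) ObjectSerializer.class);
  }

  public static void main(String[] args) {
    System.out.println(configure().serializerName());
  }
}
```

The cast only silences the compiler; it works at runtime here because a serializer for `Object` can handle any value type.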

Re: Beam Contributer

2021-11-02 Thread Moritz Mack
Hi, Welcome 😊 I don’t have permissions to manage Jira, but you should have received an invite for Slack. Best, Moritz From: Mostafa Aghajani Reply to: "dev@beam.apache.org" Date: Tuesday, 2. November 2021 at 17:27 To: "dev@beam.apache.org" Subject: Beam Contributer

Intro

2021-10-22 Thread Moritz Mack
Hi all, I’m very much looking forward to contributing to Beam and just want to briefly introduce myself. My name is Moritz (mosche) and I’m working together with Alexey and Etienne. Having worked mostly with Spark in the past, I’m excited to dive deeper into Beam 😊 Looking forward to wo