Re: Question: Java Apache Beam, mock external Clients initialized in Setup

2024-05-30 Thread Steven van Rossum via dev
A recurring pattern for I/O connectors, although not uniformly named, is to provide a method on the PTransform named something along the lines of withClient, withClientFactory, withClientBuilderFactory, withClientProvider, etc., which stores the configuration the client is to be instantiated with
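Below is a minimal plain-Java sketch of that pattern. It is deliberately written without Beam dependencies. ExampleClient, ExampleClientFactory, ExampleLookupTransform and LookupFn are hypothetical names, and setup()/teardown() only mimic the @Setup/@Teardown lifecycle of a Beam DoFn. The transform stores a serializable factory rather than a live client, so a test can swap in a fake through withClientFactory.

```java
import java.io.Serializable;
import java.util.Objects;

/** Hypothetical stand-in for an external service client (not a real library type). */
interface ExampleClient extends AutoCloseable {
    String lookup(String key);

    @Override
    void close();
}

/** Serializable factory: only this configuration is shipped with the transform, never the client. */
@FunctionalInterface
interface ExampleClientFactory extends Serializable {
    ExampleClient create();
}

/** Immutable transform configuration with a withClientFactory-style method (hypothetical). */
final class ExampleLookupTransform implements Serializable {
    private final ExampleClientFactory clientFactory;

    private ExampleLookupTransform(ExampleClientFactory clientFactory) {
        this.clientFactory = clientFactory;
    }

    static ExampleLookupTransform create() {
        return new ExampleLookupTransform(null);
    }

    /** Returns a copy that builds clients with the given factory; tests can pass a fake here. */
    ExampleLookupTransform withClientFactory(ExampleClientFactory factory) {
        return new ExampleLookupTransform(Objects.requireNonNull(factory, "factory"));
    }

    LookupFn newFn() {
        return new LookupFn(Objects.requireNonNull(clientFactory, "call withClientFactory first"));
    }

    /** Mimics a DoFn lifecycle: the client is created in setup() and released in teardown(). */
    static final class LookupFn implements Serializable {
        private final ExampleClientFactory factory;
        // transient: the live client is never serialized, only rebuilt from the factory.
        private transient ExampleClient client;

        LookupFn(ExampleClientFactory factory) {
            this.factory = factory;
        }

        void setup() {
            client = factory.create();
        }

        String process(String element) {
            if (client == null) {
                throw new IllegalStateException("setup() has not been called");
            }
            return client.lookup(element);
        }

        void teardown() {
            if (client != null) {
                client.close();
                client = null;
            }
        }
    }
}
```

Because the factory is the only thing the transform holds, a unit test never needs a real endpoint: it passes a lambda that returns a fake ExampleClient, while production code passes one that builds the real client from its stored settings.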

Re: Proposal to reduce the steps to make a Java transform portable

2024-03-07 Thread Steven van Rossum via dev
> > Is it actually necessary for a PTransform that is configured via the > Schema mechanism to also be one that uses RowCoder? Those strike me as two > separate concerns and unnecessarily limiting > +1 to this, I have a wip branch somewhere with a coder for

Re: Row compatible generated coders for custom classes

2023-12-03 Thread Steven van Rossum via dev
> Out of curiosity, did you add a warmup time before benchmarking? Schema and row coder does codegen, so the first usage is very slow, but subsequent usages should be much faster. I recommend running any test for a warmup period before starting to measure.
Yep, I poked at this using JMH
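To illustrate the warmup advice, here is a hand-rolled sketch that runs a warmup phase whose timings are discarded and only then times the steady-state calls. WarmupTimer and meanNanos are hypothetical names. This is not a substitute for JMH, which expresses the same idea through its @Warmup and @Measurement annotations and also handles forking and dead-code elimination more thoroughly.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical helper: discards warmup timings, then measures steady-state calls. */
final class WarmupTimer {
    /** Written once per measurement so the consumed results remain observable. */
    static volatile int blackhole;

    private WarmupTimer() {}

    /** Returns the mean nanoseconds per call over the measured iterations only. */
    static double meanNanos(Supplier<?> task, int warmupIterations, int measuredIterations) {
        if (warmupIterations < 0 || measuredIterations <= 0) {
            throw new IllegalArgumentException("need warmupIterations >= 0 and measuredIterations > 0");
        }
        int sink = 0;
        // Warmup phase: class loading, JIT compilation and any lazy codegen happen here, untimed.
        for (int i = 0; i < warmupIterations; i++) {
            sink += Objects.hashCode(task.get());
        }
        // Measurement phase: only these calls contribute to the reported mean.
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            sink += Objects.hashCode(task.get());
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink; // consume results so the calls are not trivially dead code
        return (double) elapsed / measuredIterations;
    }
}
```

Setting warmupIterations to 0 measures the cold path, including any one-off code generation, which is exactly the cost the quoted message warns will skew a benchmark.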

Re: Row compatible generated coders for custom classes

2023-12-01 Thread Steven van Rossum via dev
> Meaning BSON I presume? What do you mean by "tuple representation"? (One downside of JSON is that the field names are redundantly stored in each record, so even if you save on CPU it may hurt on the network due to the greater data sizes).
Yes, I meant BSON. Tuple or array representation
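To make the field-name overhead concrete, the sketch below encodes one record in object form, where every record repeats its field names, and in tuple/array form, where values are positional and the names would live once in a shared schema. It uses plain JSON text rather than BSON to stay self-contained. RecordEncodingSizes is a hypothetical helper, and it assumes values arrive as already-encoded JSON literals and field names need no escaping.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper comparing two textual encodings of the same record. */
final class RecordEncodingSizes {
    private RecordEncodingSizes() {}

    /**
     * Object form: every record repeats every field name, e.g. {"id":1,"name":"a"}.
     * Values are assumed to be already-encoded JSON literals; field names are not escaped.
     */
    static String asObject(List<String> fieldNames, List<String> values) {
        if (fieldNames.size() != values.size()) {
            throw new IllegalArgumentException("one value per field name expected");
        }
        StringJoiner out = new StringJoiner(",", "{", "}");
        for (int i = 0; i < fieldNames.size(); i++) {
            out.add("\"" + fieldNames.get(i) + "\":" + values.get(i));
        }
        return out.toString();
    }

    /** Tuple/array form: values by position only, e.g. [1,"a"]; names live once in a schema. */
    static String asArray(List<String> values) {
        StringJoiner out = new StringJoiner(",", "[", "]");
        values.forEach(out::add);
        return out.toString();
    }
}
```

For the record id=1, name="a", asObject yields {"id":1,"name":"a"} and asArray yields [1,"a"]. The per-record difference is roughly the quoted names plus separators, so the saving grows with the record count, in the same spirit as a schema-aware coder keeping field names out of each encoded element.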

Row compatible generated coders for custom classes

2023-12-01 Thread Steven van Rossum via dev
Hi all, I was benchmarking the fastjson2 serialization library a few weeks back for a Java pipeline I was working on and was asked by a colleague to benchmark binary JSON serialization against Rows for fun. We didn't do any extensive analysis across different shapes and sizes, but the finding on

Rust SDK design docs/notes

2023-06-20 Thread Steven van Rossum via dev
Hi all, Work continues on a Rust SDK at https://github.com/laysakura/beam and design docs/notes are being collected at https://github.com/laysakura/beam/wiki/Design-docs if anyone wants to leave a comment or get engaged in design. It's a bit bare bones right now, but we've got a bunch more topics

Re: Hacking a Rust SDK

2023-01-07 Thread Steven van Rossum via dev
I just pulled a copy of your repo to integrate some of the work I did to flesh out a Rust worker harness, I'll have a PR ready soon-ish. Sorry I didn't spot your work before, otherwise I'd have gotten in touch and done that sooner.
On Sat, Jan 7, 2023, 15:58 Nivaldo Tokuda wrote:
> To anyone

Re: A Declarative API for Apache Beam

2022-12-15 Thread Steven van Rossum via dev
This is great! I developed a similar template a year or two ago as a reference for a customer to speed up their development process and unsurprisingly it did speed up their development. Here's an example of the config layout I came up with at the time: options: runner: DirectRunner pipeline: #