A recurring pattern for I/O connectors, although not uniformly named, is to
provide a method on the PTransform named something along the lines of
withClient, withClientFactory, withClientBuilderFactory,
withClientProvider, etc., which stores the configuration the client is to be
instantiated with.
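To make the pattern concrete, here is a minimal sketch that runs without Beam on the classpath. Every name in it (Client, ClientFactory, DefaultClientFactory, Write, setupClient) is hypothetical and only stands in for a real connector's types. The idea is that the transform stores a serializable factory holding plain configuration, and each worker builds the live client from it during setup. A live client is never serialized with the pipeline.

```java
import java.io.Serializable;
import java.util.Objects;

// Hypothetical connector illustrating the "withClientFactory" style of configuration.
public class ClientFactoryPatternSketch {

    /** Hypothetical client; a real connector would wrap a network connection here. */
    static final class Client {
        final String endpoint;
        final int timeoutMillis;

        Client(String endpoint, int timeoutMillis) {
            this.endpoint = endpoint;
            this.timeoutMillis = timeoutMillis;
        }
    }

    /** Serializable factory: only configuration travels with the transform, never a live client. */
    interface ClientFactory extends Serializable {
        Client create();
    }

    /** Default factory that stores plain configuration values. */
    static final class DefaultClientFactory implements ClientFactory {
        private final String endpoint;
        private final int timeoutMillis;

        DefaultClientFactory(String endpoint, int timeoutMillis) {
            this.endpoint = endpoint;
            this.timeoutMillis = timeoutMillis;
        }

        @Override
        public Client create() {
            return new Client(endpoint, timeoutMillis);
        }
    }

    /** Stand-in for an immutable transform such as a hypothetical FooIO.Write. */
    static final class Write implements Serializable {
        private final ClientFactory clientFactory;

        private Write(ClientFactory clientFactory) {
            this.clientFactory = clientFactory;
        }

        static Write create() {
            return new Write(null);
        }

        /** Returns a new instance holding the factory, mirroring an immutable builder style. */
        Write withClientFactory(ClientFactory factory) {
            return new Write(Objects.requireNonNull(factory, "factory"));
        }

        /** What a worker would call during setup: build the client from the stored configuration. */
        Client setupClient() {
            if (clientFactory == null) {
                throw new IllegalStateException("withClientFactory() was not called");
            }
            return clientFactory.create();
        }
    }

    public static void main(String[] args) {
        Write write = Write.create()
            .withClientFactory(new DefaultClientFactory("https://example.invalid", 5_000));
        Client client = write.setupClient();
        System.out.println(client.endpoint + " " + client.timeoutMillis);
    }
}
```

Because ClientFactory extends Serializable, a lambda such as `() -> new Client(...)` would also work as a factory. This flexibility is part of why the factory or provider variants appear alongside plain withClient methods.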
>
> Is it actually necessary for a PTransform that is configured via the
> Schema mechanism to also be one that uses RowCoder? Those strike me as two
> separate concerns, and tying them together seems unnecessarily limiting.
>
+1 to this, I have a wip branch somewhere with a coder for
> Out of curiosity, did you add a warmup period before benchmarking? Schemas and
> RowCoder do codegen, so the first usage is very slow, but subsequent
> usages should be much faster. I recommend running any test for a warmup
> period before starting to measure.
Yep, I poked at this using JMH
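The warmup advice boils down to running the code untimed first, so that one-off costs such as class loading, generated code, and JIT compilation fall outside the measured window. JMH does this properly through its warmup iterations and remains the better tool. The sketch below is only a hand-rolled illustration of the idea. The class and method names are hypothetical, and the toy task stands in for a real encode call.

```java
import java.util.function.Supplier;

// Hand-rolled warmup-then-measure loop; a rough illustration of what JMH warmup iterations do.
public class WarmupSketch {

    /** Runs the task warmupIters times untimed, then returns mean nanoseconds per op over measureIters. */
    static double measureNanosPerOp(Supplier<?> task, int warmupIters, int measureIters) {
        if (measureIters <= 0) {
            throw new IllegalArgumentException("measureIters must be positive");
        }
        Object sink = null;
        for (int i = 0; i < warmupIters; i++) {
            sink = task.get(); // one-off costs (class loading, codegen, JIT) happen off the clock
        }
        long start = System.nanoTime();
        for (int i = 0; i < measureIters; i++) {
            sink = task.get();
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the calls are less likely to be optimized away entirely.
        if (sink == null) {
            throw new IllegalStateException("task returned null");
        }
        return (double) elapsed / measureIters;
    }

    public static void main(String[] args) {
        // Toy stand-in for "encode one record"; swap in the real serializer under test.
        Supplier<String> encode = () -> String.join(",", "id", "name", Integer.toString(42));
        double nanos = measureNanosPerOp(encode, 10_000, 100_000);
        System.out.printf("~%.1f ns/op after warmup%n", nanos);
    }
}
```

A loop like this is still exposed to dead-code elimination and GC noise, which JMH's Blackhole and forked runs are designed to handle, so treat its numbers as rough.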
> Meaning BSON I presume? What do you mean by "tuple representation"?
> (One downside of JSON is that the field names are redundantly stored
> in each record, so even if you save on CPU it may hurt on the network
> due to the larger data sizes.)
Yes, I meant BSON. Tuple or array representation
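The size concern raised above can be shown with plain JSON text. In the sketch below, the object form repeats every field name in every record. The array ("tuple") form drops the names and relies on an out-of-band schema for field order. This is hand-built JSON, not BSON, and the Person record is hypothetical. BSON documents likewise carry element names in each document, so the same redundancy applies there.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Compares encoded sizes of the same records as JSON objects vs. JSON arrays (tuple layout).
// Strings are not escaped: names are assumed to contain no quotes or backslashes.
public class JsonLayoutSizeSketch {

    record Person(long id, String name, int age) {}

    /** Object form, e.g. {"id":1,"name":"Ada","age":36}: field names repeated in every record. */
    static String asObject(Person p) {
        return "{\"id\":" + p.id() + ",\"name\":\"" + p.name() + "\",\"age\":" + p.age() + "}";
    }

    /** Array form, e.g. [1,"Ada",36]: field order is implied by a separate schema. */
    static String asArray(Person p) {
        return "[" + p.id() + ",\"" + p.name() + "\"," + p.age() + "]";
    }

    /** Total UTF-8 size of the encoded records. */
    static int totalBytes(List<String> encoded) {
        int total = 0;
        for (String s : encoded) {
            total += s.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Person> people = List.of(new Person(1, "Ada", 36), new Person(2, "Grace", 45));
        List<String> objects = new ArrayList<>();
        List<String> arrays = new ArrayList<>();
        for (Person p : people) {
            objects.add(asObject(p));
            arrays.add(asArray(p));
        }
        System.out.println("object form: " + totalBytes(objects) + " bytes");
        System.out.println("array form:  " + totalBytes(arrays) + " bytes");
    }
}
```

For the single record `Person(1, "Ada", 36)` the object form is 30 bytes and the array form is 12. The gap grows with longer field names and narrower values.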
Hi all,
I was benchmarking the fastjson2 serialization library a few weeks back for
a Java pipeline I was working on and was asked by a colleague to benchmark
binary JSON serialization against Rows for fun. We didn't do any extensive
analysis across different shapes and sizes, but the finding on
Hi all,
Work continues on a Rust SDK at https://github.com/laysakura/beam and
design docs/notes are being collected at
https://github.com/laysakura/beam/wiki/Design-docs if anyone wants to leave
a comment or get engaged in design.
It's a bit bare bones right now, but we've got a bunch more topics
I just pulled a copy of your repo to integrate some of the work I did to
flesh out a Rust worker harness; I'll have a PR ready soon-ish.
Sorry I didn't spot your work before; otherwise I'd have gotten in touch
and done that sooner.
On Sat, Jan 7, 2023, 15:58 Nivaldo Tokuda wrote:
> To anyone
This is great! I developed a similar template a year or two ago as a
reference for a customer to speed up their development process, and,
unsurprisingly, it did.
Here's an example of the config layout I came up with at the time:
options:
  runner: DirectRunner
pipeline:
  #
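The message doesn't say how the template consumed the `options` section. One plausible approach is to turn each key into a `--key=value` argument, the form that Beam's `PipelineOptionsFactory.fromArgs(...)` parses. The sketch below makes that assumption and also assumes the YAML has already been parsed into a Map (parsing is omitted). The helper name is hypothetical, and Beam itself is left off the classpath so the block runs standalone.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flattens a parsed "options:" section into Beam-style command-line args.
public class OptionsToArgsSketch {

    /** Converts e.g. {runner=DirectRunner} into ["--runner=DirectRunner"], preserving map order. */
    static String[] toPipelineArgs(Map<String, ?> options) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, ?> e : options.entrySet()) {
            args.add("--" + e.getKey() + "=" + e.getValue());
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for the parsed YAML "options:" block; tempLocation is an illustrative extra key.
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("runner", "DirectRunner");
        options.put("tempLocation", "/tmp/beam");
        for (String a : toPipelineArgs(options)) {
            System.out.println(a);
        }
        // With Beam available, these could be passed to PipelineOptionsFactory.fromArgs(...).create().
    }
}
```

Keeping runner settings as plain key/value pairs like this lets the same `pipeline:` section run locally on DirectRunner or on a production runner by editing only `options:`.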