Thanks all for the comments. Based on the discussion so far, looks like we
have to flesh out the cross-language transforms feature quite a bit before
we can utilize some of the existing Java IO in other SDKs. This might
involve redesigning some of the existing Java IOs to allow expressing
second or
The numbers on that PR are not really what end-to-end means to me - it
normally means you have a fully represented productionized use case and the
metric you are looking at is the actual impact on the full system (like
latency from a tap on mobile to a dashboard being updated, or monthly
compute co
On Mon, Apr 30, 2018 at 9:54 AM Kenneth Knowles wrote:
> I agree with Cham's motivations as far as "we need it now" and getting
> Python SDF up and running and exercised on a real connector.
>
> But I do find the current API of BigQueryIO to be a poor example. That
> particular functionality on B
I think we've discussed this before... It is true that all of our
second-order APIs can be re-expressed as first-order APIs, but that would
come at a very serious performance cost - e.g. significant increase in
amount of data shuffled / materialized. The second-order APIs (most
importantly, Dynamic
I believe that most (all?) of these cases of executing a lambda could be
avoided if we passed along structured records like:
{
  table_name:
  row: { ... }
}
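To make the idea concrete, here is a minimal sketch in plain Python (no Beam APIs involved). It assumes the record shape above; the helper names `tag_with_destination` and `group_by_destination`, the sample rows, and the `events_` table prefix are all hypothetical and only for illustration. The routing function runs once per row on the user's side, and the sink then routes purely on the `table_name` field without executing any user code.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical record shape: the destination travels with the row as data.
Record = Dict[str, Any]


def tag_with_destination(
    rows: Iterable[Dict[str, Any]],
    choose_table: Callable[[Dict[str, Any]], str],
) -> List[Record]:
    """Runs in the user's SDK: evaluate the routing function once per row."""
    return [{"table_name": choose_table(row), "row": row} for row in rows]


def group_by_destination(records: Iterable[Record]) -> Dict[str, List[Dict[str, Any]]]:
    """Stand-in for the sink: routes on the table_name field only."""
    batches: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        batches[record["table_name"]].append(record["row"])
    return dict(batches)


rows = [
    {"user": "a", "country": "US"},
    {"user": "b", "country": "DE"},
    {"user": "c", "country": "US"},
]

# The lambda is applied before the records reach the sink.
records = tag_with_destination(rows, lambda r: "events_" + r["country"].lower())
batches = group_by_destination(records)
```

In this sketch the sink never calls back into the user's language, which is the property that would matter for a cross-language connector. Whether this avoids the shuffle cost mentioned earlier in the thread is not something the sketch demonstrates.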
On Mon, Apr 30, 2018 at 10:24 AM Chamikara Jayalath
wrote:
>
>
> On Mon, Apr 30, 2018 at 9:54 AM Kenneth Knowles wrote:
>
>> I agree wit
Although I suspect/hope that sharing IO connectors across SDKs will
adequately cover the lion's share of implementations (especially the long
tail), I also think it's a case-by-case decision to make. Native IO might
be preferable for some uses and each SDK will want IO implementations where
they sh
I agree with Cham's motivations as far as "we need it now" and getting
Python SDF up and running and exercised on a real connector.
But I do find the current API of BigQueryIO to be a poor example. That
particular functionality on BigQueryIO seems extraneous and goes against
our own style guide [1
On Mon, Apr 30, 2018 at 8:05 AM Chamikara Jayalath
wrote:
> Hi Aljoscha,
>
> I tried to cover this in the doc. Once we have full support for
> cross-language IO, we can decide this on a case-by-case basis. But I don't
> think we should cease defining new sources/sinks for Beam Python SDK till
> w
Another point: cross-language IOs might add a performance penalty in many
cases. For an example of this, look at BigQueryIO. The user can register a
SerializableFunction that is evaluated on every record, and determines
which destination to write the record to. Now a Python user would want to
regist
Hi Aljoscha,
I tried to cover this in the doc. Once we have full support for
cross-language IO, we can decide this on a case-by-case basis. But I don't
think we should cease defining new sources/sinks for Beam Python SDK till
we get to that point. I think there are good reasons for adding Kafka
su
Is this what we want to do in the long run, i.e. implement copies of connectors
for different SDKs? I thought the plan was to enable using connectors written
in different languages, i.e. use the Java Kafka I/O from Python. This way we
wouldn't duplicate bugs for three different languages (Java, P
Thanks Cham, this is great! I left just a couple of comments on the doc.
On Fri, Apr 27, 2018 at 10:06 PM Chamikara Jayalath
wrote:
> Hi All,
>
> I'm looking into adding a Kafka connector to Beam Python SDK. I think this
> will benefit many Python SDK users and will serve as a good example for
Hi All,
I'm looking into adding a Kafka connector to Beam Python SDK. I think this
will benefit many Python SDK users and will serve as a good example for the
recently added Splittable DoFn API (Fn API support which will allow all
runners to use Python Splittable DoFn is in active development). I cr