Re: Executing the runner validation tests for the Twister2 runner

2020-01-06 Thread Pulasthi Supun Wickramasinghe
Hi Kenn, On Mon, Jan 6, 2020 at 9:09 PM Kenneth Knowles wrote: > > > On Mon, Jan 6, 2020 at 8:30 AM Pulasthi Supun Wickramasinghe < > pulasthi...@gmail.com> wrote: > >> Hi Kenn, >> >> I was able to solve the problem mentioned above, I am currently running >> the "ValidatesRunner" tests, I

Re: Jenkins jobs not running for my PR 10438

2020-01-06 Thread Kai Jiang
According to this comment, it might be a Jenkins bug. Meanwhile, I opened an infra ticket at

Re: Executing the runner validation tests for the Twister2 runner

2020-01-06 Thread Kenneth Knowles
On Mon, Jan 6, 2020 at 8:30 AM Pulasthi Supun Wickramasinghe < pulasthi...@gmail.com> wrote: > Hi Kenn, > > I was able to solve the problem mentioned above, I am currently running > the "ValidatesRunner" tests, I have around 4-5 tests that are failing that > I should be able to fix in a couple of

Re: Python IO Connector

2020-01-06 Thread Luke Cwik
Eugene, the JdbcIO output should be updated to support Beam's schema format which would allow for "rows" to cross the language boundaries. If the connector is easy to write and maintain then it makes sense for native. Maybe the Python version will have an easier time to support splitting and

Re: Dropping late data in DirectRunner

2020-01-06 Thread Kenneth Knowles
This thread has a lot in it, so I am just top-posting. - Stateful DoFn is a windowed operation; state is per-window. When the window expires, any further inputs are dropped. - "Late" is not synonymous with out-of-order. It doesn't really have an independent meaning. - For a GBK/Combine
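The distinction drawn above between "late" and "out-of-order" can be illustrated with a minimal sketch in plain Python; it does not use Beam's API. The helper names (`window_for`, `is_expired`, `split_by_expiry`), the fixed 10-second windows, and the expiry rule itself (the watermark passing the window end plus the allowed lateness) are assumptions made for illustration, and real runners differ in detail.

```python
def window_for(timestamp, size=10):
    """Return the hypothetical fixed window [start, end) containing timestamp."""
    start = timestamp - timestamp % size
    return start, start + size


def is_expired(window_end, watermark, allowed_lateness):
    # Assumed rule: a window expires once the watermark has passed
    # the end of the window plus the allowed lateness.
    return watermark > window_end + allowed_lateness


def split_by_expiry(elements, watermark, allowed_lateness):
    """Split (timestamp, value) pairs into kept and dropped values."""
    kept, dropped = [], []
    for timestamp, value in elements:
        _, end = window_for(timestamp)
        if is_expired(end, watermark, allowed_lateness):
            dropped.append(value)  # window state is gone, input is dropped
        else:
            kept.append(value)
    return kept, dropped


# Arrival order differs from timestamp order. Only "a" is dropped, because
# its window [0, 10) has expired; "b" is merely out of order.
elements = [(25, "c"), (3, "a"), (21, "b")]
kept, dropped = split_by_expiry(elements, watermark=16, allowed_lateness=5)
```

In this sketch the drop decision depends only on the element's window and the current watermark, not on the order in which elements arrive.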

Re: RabbitMQ and CheckpointMark feasibility

2020-01-06 Thread Daniel Robert
Alright, a bit late but this took me a while. Thanks for all the input so far. I have rewritten much of the RabbitMq IO connector and have it ready to go in a draft pr: https://github.com/apache/beam/pull/10509 This should incorporate a lot of what's been discussed here, in terms of

Re: Python IO Connector

2020-01-06 Thread Robert Bradshaw
On Mon, Jan 6, 2020 at 1:39 PM Chamikara Jayalath wrote: > Regarding cross-language transforms, we need to add better documentation, > but for now you'll have to go with existing examples and tests. For example, > > >

Re: Python IO Connector

2020-01-06 Thread Chamikara Jayalath
Regarding cross-language transforms, we need to add better documentation, but for now you'll have to go with existing examples and tests. For example, https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/gcp/pubsub.py

[RESULT][VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin
Hi all, I'm happy to announce that we have approved this release. There are 5 approving votes, 4 of which are binding (in order): * Ahmet (al...@google.com); * Luke (lc...@google.com); * Reuven (re...@google.com); * Robert (rober...@google.com); There are no disapproving votes. Thanks

Re: PTransform serialization question

2020-01-06 Thread Alexey Romanenko
Thank you for clarification, Luke. > On 6 Jan 2020, at 20:03, Luke Cwik wrote: > > Anything that is reachable by the DoFn/CombineFn/*Fn needs to be > serializable. [1] is saying that it is common to have an anonymous inner > class for a DoFn which because of its serialization capture will get

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin
Hi all, I'm happy to announce that we have approved this release. There are 5 approving votes, 4 of which are binding (in order): * Ahmet (al...@google.com); * Luke (lc...@google.com); * Reuven (re...@google.com); * Robert (rober...@google.com); There are no disapproving votes. Thanks

Re: Request for review of PR [Beam-8564]

2020-01-06 Thread Luke Cwik
Have you had a chance to update the PR? On Mon, Dec 30, 2019 at 5:00 AM Amogh Tiwari wrote: > Hi Luke, > > We have gone through shevek/lzo-java, but we chose to go with > airlift/aircompressor for the following reasons: > > 1) shevek/lzo-java is internally using .jni, .c and .h files, hence the

Re: PTransform serialization question

2020-01-06 Thread Luke Cwik
Anything that is reachable by the DoFn/CombineFn/*Fn needs to be serializable. [1] is saying that it is common to have an anonymous inner class for a DoFn which because of its serialization capture will get the encompassing class which is typically a PTransform. If you are careful about
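The reachability rule above can be sketched in Python with `pickle`, as a rough analogue of the Java inner-class capture being discussed. This is an illustration under stated assumptions, not Beam code: `enclosing_transform`, `capturing_fn`, `standalone_fn`, and `can_serialize` are hypothetical names, and `types.SimpleNamespace` objects stand in for a PTransform and its DoFn.

```python
import pickle
import threading
from types import SimpleNamespace

# Hypothetical stand-in for a transform that owns a non-serializable field.
enclosing_transform = SimpleNamespace(
    name="WriteRows",
    lock=threading.Lock(),  # locks cannot be pickled
)

# Like an anonymous inner class, this fn keeps a reference to its
# enclosing object, so everything the transform holds becomes reachable.
capturing_fn = SimpleNamespace(prefix="row-", outer=enclosing_transform)

# This fn copies only the values it needs and holds no outer reference.
standalone_fn = SimpleNamespace(prefix="row-")


def can_serialize(obj):
    """Return True if pickle can serialize obj and everything it reaches."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


capturing_ok = can_serialize(capturing_fn)
standalone_ok = can_serialize(standalone_fn)
```

The capturing object fails to serialize only because of what it can reach through `outer`; its own fields are fine. Copying the needed values into the function object, as `standalone_fn` does, avoids pulling in the enclosing object.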

Re: Python IO Connector

2020-01-06 Thread Luke Cwik
+Chamikara Jayalath +Heejong Lee On Mon, Jan 6, 2020 at 10:20 AM wrote: > How do I go about doing that? From the docs, it appears cross language > transforms are > currently undocumented. > https://beam.apache.org/roadmap/connectors-multi-sdk/ > On Jan 6, 2020, at 12:55 PM, Luke Cwik wrote:

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Robert Bradshaw
Thanks. That's the right one. The signatures (and everything else) all look good now. Changing my vote to a +1. On Mon, Jan 6, 2020 at 9:13 AM Mikhail Gryzykhin wrote: > KEYS files should be fixed now. > > On Mon, Jan 6, 2020 at 8:29 AM Robert Bradshaw > wrote: > >> Yes, please update KEYS to

Re: [Proposal] Slowly Changing Dimensions support in Beam

2020-01-06 Thread Mikhail Gryzykhin
I've narrowed down the topic. This does not include any of Dataflow part and is general for all runners. Please visit . Changes: * Changed title * Narrowed topic to slowly changing dimensions support only. This

Re: Python IO Connector

2020-01-06 Thread pbd281
How do I go about doing that? From the docs, it appears cross language transforms are currently undocumented. https://beam.apache.org/roadmap/connectors-multi-sdk/ > On Jan 6, 2020, at 12:55 PM, Luke Cwik wrote: > > What about using a cross language transform between Python and the already >

PTransform serialization question

2020-01-06 Thread Alexey Romanenko
Hello all, I find myself a bit confused about the serialization requirements for Beam transforms and I want to clarify something. Here [1] it's clearly mentioned that “DoFn, PTransform, CombineFn and other instances will be serialized”. Since most of Beam IO Read/Write transforms is

Re: Python IO Connector

2020-01-06 Thread Luke Cwik
What about using a cross language transform between Python and the already existing Java JdbcIO transform? On Sun, Jan 5, 2020 at 5:18 AM Peter Dannemann wrote: > I’d like to develop the Python SDK’s SQL IO connector. I was thinking it > would be easiest to use sqlalchemy to achieve maximum

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin
KEYS files should be fixed now. On Mon, Jan 6, 2020 at 8:29 AM Robert Bradshaw wrote: > Yes, please update KEYS to have the correct key. (If you've never used the > other one you could just remove it.) > > On Mon, Jan 6, 2020, 6:46 AM Mikhail Gryzykhin wrote: > >> I see. Seems that the wrong

Re: Proposed Jira and PR to change error messaging for Python SDK filesystem module

2020-01-06 Thread Luke Cwik
Thanks for the contribution. On Mon, Jan 6, 2020 at 6:29 AM David Sabater Dinter wrote: > Hi everyone! > Happy New Year, personally coming with some resolutions like trying to > contribute more often to Apache projects I love. :) > Just wanted to mention that I created a new Jira >

Re: Flaky Java warning/error inventory (cannot find symbol)

2020-01-06 Thread Tomo Suzuki
Hi Alex, (I also feel frustrated when Java precommit checks sometimes fail due to connection errors. I appreciate that the Beam project makes it easy to run them via "Run Java Precommit".) I dug into the builds but found no clear answer to your question. The "cannot find symbol" error [1] comes from

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin
I see. Seems that the wrong key is imported into KEYS file. And header is incorrect. --Mikhail On Mon, Jan 6, 2020 at 6:16 AM Mikhail Gryzykhin wrote: > Hi Robert, > > I redownloaded binaries from > https://dist.apache.org/repos/dist/dev/beam/2.17.0/ and ran > > gpg --verify

Proposed Jira and PR to change error messaging for Python SDK filesystem module

2020-01-06 Thread David Sabater Dinter
Hi everyone! Happy New Year, personally coming with some resolutions like trying to contribute more often to Apache projects I love. :) Just wanted to mention that I created a new Jira and PR to improve

Re: [VOTE] Release 2.17.0, release candidate #2

2020-01-06 Thread Mikhail Gryzykhin
Hi Robert, I redownloaded binaries from https://dist.apache.org/repos/dist/dev/beam/2.17.0/ and ran gpg --verify apache-beam-2.17.0-source-release.zip.asc gpg: assuming signed data in 'apache-beam-2.17.0-source-release.zip' gpg: Signature made Mon 16 Dec 2019 09:17:23 PM PST gpg:

Beam Dependency Check Report (2020-01-06)

2020-01-06 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK:
Dependency Name | Current Version | Latest Version | Release Date Of the Current Used Version | Release Date Of The Latest Release | JIRA Issue
cachetools | 3.1.1 | 4.0.0 | 2019-12-23

Re: Dropping late data in DirectRunner

2020-01-06 Thread Jan Lukavský
> Generally the watermark update can overtake elements, because runners can explicitly ignore late data in the watermark calculation (for good reason - those elements are already late, so no need to hold up the watermark advancing any more). This seems not to affect the decision of _not