Re: [DISCUSSION] Docker based development environment issue

2021-06-11 Thread Brian Hulette
FYI, Fernando recently contributed a setup script for configuring a local development environment [1] that is continuously verified on macOS and Ubuntu [2].
[1] https://github.com/apache/beam/pull/14584
[2] https://github.com/apache/beam/actions/workflows/local_env_tests.yml
On Wed, May 26, 2021 at

Re: [EXT] Re: Removing deprecated oauth2client dependency for Python SDK

2021-06-11 Thread Chuck Yang
Thanks folks, I did take a look at the change, and swapping to the google-cloud-* libraries looks like a major refactor. I did get something working by adding a shim so that the existing vendored libraries can work with the google-auth credential objects, similar to Luke's second point. Is this

Flaky test issue report (37)

2021-06-11 Thread Beam Jira Bot
This is your daily summary of Beam's current flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20labels%20%3D%20flake) These are P1 issues because they have a major negative impact on the community and make it hard to

P1 issues report (44)

2021-06-11 Thread Beam Jira Bot
This is your daily summary of Beam's current P1 issues, not including flaky tests (https://issues.apache.org/jira/issues/?jql=project%20%3D%20BEAM%20AND%20statusCategory%20!%3D%20Done%20AND%20priority%20%3D%20P1%20AND%20(labels%20is%20EMPTY%20OR%20labels%20!%3D%20flake). See

Re: Removing deprecated oauth2client dependency for Python SDK

2021-06-11 Thread Valentyn Tymofieiev
Thanks, Chuck, for looking into this. We explored switching to the google-cloud-* Python libraries for Dataflow runner purposes and encountered several issues related to dependency management of these libraries in the Google-internal repository, which were difficult to address just in Beam plane without

Portable Python pipeline not splitting reads across executors

2021-06-11 Thread Ajo Thomas
Hi folks, I am working on running a Portable Python pipeline on Spark. The test pipeline is very straightforward: I am trying to read some Avro data in HDFS using avroio (native IO, not an external transform) and write it back to HDFS. Here is the pipeline: Pipeline: pipeline_options =