Re: [VOTE] Release 2.17.0, release candidate #2

2019-12-18 Thread jincheng sun
Thanks for driving this release, Mikhail! I found an incorrect release version in the release notes in PR [1], and also left a question on PR [2]. But I do not think it is a blocker for the release :) Best, Jincheng [1] https://github.com/apache/beam/pull/10401 [2] https://github.com/apache/

Re: [Proposal] Slowly Changing Dimensions and Distributed Map Side Inputs (in Dataflow)

2019-12-18 Thread Kenneth Knowles
I do think that the implementation concerns around larger side inputs are relevant to most runners. Ideally there would be no model change necessary. Triggers are harder and bring in consistency concerns, which are even more likely to be relevant to all runners. Kenn On Wed, Dec 18, 2019 at 11:23

Re: PostCommit_Py_VR_Dataflow timing out

2019-12-18 Thread Brian Hulette
Ah, thanks for the JIRA link. There's some critical context there: this seems to be caused by a deadlock, so increasing the timeout won't make more tests finish or pass; it will just consume a Jenkins slot for longer. On Wed, Dec 18, 2019 at 1:43 PM Udi Meiri wrote: > Yes, there are objections sinc

Re: PostCommit_Py_VR_Dataflow timing out

2019-12-18 Thread Udi Meiri
Yes, there are objections, since this would take up a Jenkins slot for longer. An alternative would be to set timeouts on individual tests. Debugging options: run the Gradle tasks locally, try to pinpoint the culprit PR. https://issues.apache.org/jira/browse/BEAM-8877 On Wed, Dec 18, 2019 at 1:25

PostCommit_Py_VR_Dataflow timing out

2019-12-18 Thread Brian Hulette
It looks like beam_PostCommit_Py_VR_Dataflow has been timing out at 1h40m since Dec 4 [1]. Are there any objections to bumping up the timeout to alleviate this? Or any other thoughts on potential causes and/or solutions? Brian [1] https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/

Re: Apache beam Python Error runners-spark-job-server-2.19.0-SNAPSHOT.jar not found

2019-12-18 Thread Tomo Suzuki
I don't use the Spark job server, but the error says you need to build the JAR file by running: cd C:\apache_beam; ./gradlew runners:spark:job-server:shadowJar Did you try that? On Wed, Dec 18, 2019 at 3:08 PM Dhiren Pachchigar wrote: > > Hi Team, > > I am trying to submit beam job in local spark with bel

Apache beam Python Error runners-spark-job-server-2.19.0-SNAPSHOT.jar not found

2019-12-18 Thread Dhiren Pachchigar
Hi Team, I am trying to submit a Beam job to local Spark with the command below: spark-submit --master spark://192.168.0.106:7077 sample.py --runner=SparkRunner I am getting this error: RuntimeError: C:\apache_beam\runners\spark\job-server\build\libs\beam-runners-spark-job-server-2.19.0-SNAPSHOT.jar not

Re: [VOTE] Release 2.17.0, release candidate #2

2019-12-18 Thread Ahmet Altay
I validated the Python quickstarts with Python 2. Wheel files are missing, but they work otherwise. Once the wheel files are added I will add my vote. On Wed, Dec 18, 2019 at 10:00 AM Luke Cwik wrote: > I verified the release and ran the quickstarts and found that release 2.16 > broke Apache Nemo run

Re: [Proposal] Slowly Changing Dimensions and Distributed Map Side Inputs (in Dataflow)

2019-12-18 Thread Luke Cwik
Most of the doc is about how to support distributed side inputs in Dataflow and doesn't really cover how the Beam model's (accumulating, discarding, retracting) triggers impact what the "contents" of a PCollection are over time, and how this proposal for a limited set of side input shapes can work to su

Re: Root logger configuration

2019-12-18 Thread Pablo Estrada
A fix, calling basicConfig in Pipeline and PipelineOptions: https://github.com/apache/beam/pull/10396 On Tue, Dec 17, 2019 at 3:17 PM Robert Bradshaw wrote: > The generally expected behavior is that if you don't do anything, > logging goes to stderr. Logging to non-root loggers breaks this. > (A

Re: [VOTE] Release 2.17.0, release candidate #2

2019-12-18 Thread Luke Cwik
I verified the release and ran the quickstarts, and found that release 2.16 broke the Apache Nemo runner, which is also an issue for 2.17.0 RC #2. It is caused by a backwards-incompatible change in ParDo.MultiOutput where the getSideInputs return value was changed from a List to a Map as part of https://github.c
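For anyone hitting the same incompatibility, here is a minimal sketch of the kind of adjustment downstream code may need. It assumes the post-change getSideInputs() returns a Map keyed by side-input name; that key type, and the helper below, are assumptions for illustration rather than details from the thread.

  // Hypothetical adaptation sketch: code written against the pre-2.16 API, which
  // iterated a List of side inputs, can usually switch to iterating the Map's values.
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  import org.apache.beam.sdk.transforms.ParDo;
  import org.apache.beam.sdk.values.PCollectionView;

  class SideInputCompat {
    // Pre-2.16 (roughly): List<PCollectionView<?>> views = parDo.getSideInputs();
    // Post-2.16 (assumed): side inputs come back as a Map, so take its values.
    static List<PCollectionView<?>> sideInputViews(ParDo.MultiOutput<?, ?> parDo) {
      Map<String, PCollectionView<?>> sideInputs = parDo.getSideInputs();
      return new ArrayList<>(sideInputs.values());
    }
  }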

Re: Need Help | SpannerIO

2019-12-18 Thread Pablo Estrada
Or perhaps you have a PCollection<String> or something like that, and you want to use those strings to issue queries to Spanner? PCollection<String> myStrings = p.apply(.) PCollection<Struct> rows = myStrings.apply( SpannerIO.read() .withInstanceId(instanceId) .withDatabaseId(dbId) .withQ
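One way to express that shape in the Java SDK is SpannerIO.readAll() with one ReadOperation per element. The sketch below is only illustrative: the instance ID, database ID, table, and query are placeholders, and the exact builder calls should be checked against the SpannerIO javadoc for the Beam version in use.

  // Hypothetical sketch: turn a PCollection of strings into Spanner queries.
  import com.google.cloud.spanner.Struct;
  import org.apache.beam.sdk.Pipeline;
  import org.apache.beam.sdk.io.gcp.spanner.ReadOperation;
  import org.apache.beam.sdk.io.gcp.spanner.SpannerConfig;
  import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;
  import org.apache.beam.sdk.transforms.Create;
  import org.apache.beam.sdk.transforms.MapElements;
  import org.apache.beam.sdk.values.PCollection;
  import org.apache.beam.sdk.values.TypeDescriptor;

  public class SpannerReadAllSketch {
    public static void main(String[] args) {
      Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

      // Strings produced by an earlier stage of the pipeline (hard-coded here).
      PCollection<String> ids = p.apply(Create.of("1", "2", "3"));

      SpannerConfig config = SpannerConfig.create()
          .withInstanceId("my-instance")   // placeholder
          .withDatabaseId("my-db");        // placeholder

      // Build one ReadOperation per element and let SpannerIO.readAll() run them.
      // A real pipeline should use a parameterized Statement, not concatenation.
      PCollection<Struct> rows = ids
          .apply(MapElements.into(TypeDescriptor.of(ReadOperation.class))
              .via(id -> ReadOperation.create()
                  .withQuery("SELECT * FROM Users WHERE Id = " + id)))
          .apply(SpannerIO.readAll().withSpannerConfig(config));

      p.run().waitUntilFinish();
    }
  }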

Re: Need Help | SpannerIO

2019-12-18 Thread Luke Cwik
How do you want to use the previous data in the SpannerIO.read()? Are you trying to perform a join on a key between two PCollections? If so, please use CoGroupByKey[1]. Are you trying to merge two PCollection objects? If so, please use Flatten[2]. 1: https://beam.apache.org/documentation/programm
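Minimal sketches of both suggestions, with illustrative element and key types (nothing here is taken from the original pipeline):

  // Hypothetical sketches: CoGroupByKey for a keyed join, Flatten for merging
  // PCollections of the same element type. Names and types are illustrative.
  import org.apache.beam.sdk.transforms.Flatten;
  import org.apache.beam.sdk.transforms.join.CoGbkResult;
  import org.apache.beam.sdk.transforms.join.CoGroupByKey;
  import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
  import org.apache.beam.sdk.values.KV;
  import org.apache.beam.sdk.values.PCollection;
  import org.apache.beam.sdk.values.PCollectionList;
  import org.apache.beam.sdk.values.TupleTag;

  class JoinOrMergeSketch {
    // Join two keyed PCollections on their common key.
    static PCollection<KV<String, CoGbkResult>> join(
        PCollection<KV<String, String>> left, PCollection<KV<String, String>> right) {
      TupleTag<String> leftTag = new TupleTag<>();
      TupleTag<String> rightTag = new TupleTag<>();
      return KeyedPCollectionTuple.of(leftTag, left)
          .and(rightTag, right)
          .apply(CoGroupByKey.create());
    }

    // Merge two PCollections of the same element type into one.
    static PCollection<String> merge(PCollection<String> a, PCollection<String> b) {
      return PCollectionList.of(a).and(b).apply(Flatten.pCollections());
    }
  }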

Need Help | SpannerIO

2019-12-18 Thread Ajit Soman
Hi, I am creating a pipeline. I want to execute a Spanner query once I get data from the previous stage. In the Java docs, they have given this reference code: PCollection<Struct> rows = pipeline.apply( SpannerIO.read() .withInstanceId(instanceId) .withDatabaseId(dbId) .withQ