Re: [Call for items] September Beam Newsletter

2018-09-07 Thread Rose Nguyen
*bump* Celebrate the weekend by sharing with the community your talks, contributions, plans, etc! On Wed, Sep 5, 2018 at 10:25 AM Rose Nguyen wrote: > Hi Beamers: > > Here's > > [1] the template f

Re: SplittableDoFn

2018-09-07 Thread Lukasz Cwik
Thanks to everyone who filled out the doodle poll. The most popular time was Friday, Sept 14th from 11am-noon PST. I'll send out a calendar invite and meeting link early next week. I have received a lot of feedback on the document and have addressed some parts of it including: * clarifying

Re: New Post Commit Task fails in SetupVirtualEnv when running on Jenkins

2018-09-07 Thread Ankur Goenka
It seems that the issue was the length of the file name, which was /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_PortableValidatesRunner_Flink_Gradle_PR/src/sdks/python/build/gradleenv/bin/python2. Changing the task name to beam_PostCommit_Python_PVR_Flink_Gradle_PR fixed the issue.
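
The message does not say why the length of the file name mattered. One plausible explanation, offered here as an assumption and not taken from the thread, is the Linux limit on the interpreter (`#!`) line of scripts installed into a virtualenv, which on kernels of that era was 127 characters. The sketch below checks both interpreter paths against that assumed limit. The helper names and the assumption that the workspace directory is named after the Jenkins task are illustrative only.

```python
# Assumption: the kernel reads at most 127 characters of a script's "#!" line
# (BINPRM_BUF_SIZE of 128 minus the terminator on older Linux kernels).
SHEBANG_LIMIT = 127

# Assumption: the Jenkins workspace directory is named after the task.
WORKSPACE_ROOT = "/home/jenkins/jenkins-slave/workspace"


def venv_python(job_name: str) -> str:
    """Build the virtualenv interpreter path for a given Jenkins task name."""
    return f"{WORKSPACE_ROOT}/{job_name}/src/sdks/python/build/gradleenv/bin/python2"


def shebang_line(interpreter_path: str) -> str:
    """Return the interpreter line that would head an installed script."""
    return "#!" + interpreter_path


def fits_shebang_limit(interpreter_path: str, limit: int = SHEBANG_LIMIT) -> bool:
    """True if the interpreter line is short enough to be read in full."""
    return len(shebang_line(interpreter_path)) <= limit


LONG_JOB = "beam_PostCommit_Python_PortableValidatesRunner_Flink_Gradle_PR"
SHORT_JOB = "beam_PostCommit_Python_PVR_Flink_Gradle_PR"

if __name__ == "__main__":
    for job in (LONG_JOB, SHORT_JOB):
        path = venv_python(job)
        print(job, len(shebang_line(path)), fits_shebang_limit(path))
```

Under these assumptions the original task name produces an interpreter line over the limit, while the shortened name fits, which would be consistent with the rename fixing the failure.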

Re: [DISCUSS] Versioning, Hadoop related dependencies and enterprise users

2018-09-07 Thread Yifan Zou
Thanks, all, for the comments and suggestions. We want to close this thread and start implementing the new policy based on the discussion: 1. Stop assigning JIRAs to the first person listed in the dependency owners files. Instead, cc people on the o

[VOTE] Release 2.7.0, release candidate #1

2018-09-07 Thread Charles Chen
Hi everyone, Please review and vote on the release candidate #1 for the version 2.7.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], *

Re: PR/6343: Adding support for MustFollow

2018-09-07 Thread Peter Li
Thanks! I (PR author) agree with all that. On the unbounded triggering issue, I can see 2 reasonable desired behaviors: 1) The collection to follow is bounded and the intent is to wait for the entire collection to be processed. 2) The collection to follow has windows that in some flexible sen

Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #164

2018-09-07 Thread Apache Jenkins Server
See Changes: [herohde] [BEAM-5327] Add support for custom dataflow worker jar in Go [dattran] [BEAM-5107] Add support for ES-6.x to ElasticsearchIO [mxm] Fixing bug in flink job server creatio

Re: Should we mention TF Transform in Beam site?

2018-09-07 Thread Matthias Feys
I also published a minimal boilerplate example on github for using tf.Transform with Apache Beam & ML Engine https://github.com/Fematich/tftransform-demo with an accompanying blogpost: https://cloud.google.com/blog/products/ai-machine-learning/pre-processing-tensorflow-pipelines-tftransform-google-

Beam Dependency Check Report (2018-09-03)

2018-09-07 Thread Apache Jenkins Server
High Priority Dependency Updates Of Beam Python SDK: Dependency Name Current Version Latest Version Release Date Of the Current Used Version Release Date Of The Latest Release JIRA Issue google-cloud-bigquery 0.25.0 1.5.0

Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #162

2018-09-07 Thread Apache Jenkins Server
See Changes: [pablo] Trying to add synthetic step [pablo] Apply spotless [pablo] Fixing findbugs issue [pablo] Fixing issues [pablo] Addressing comments from reviewer [pablo] Addressing com

Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #161

2018-09-07 Thread Apache Jenkins Server
See Changes: [rpathak] [BEAM-3820] Exposing batchSize for SolrIO Writes [rpathak] [BEAM-3820] Updating comments adding default batch size in javadoc [Jozef.Vilcek] Remove slf4j-simple binding

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-07 Thread David Morávek
+1 for option 3 as it should be the least painful option for the current users D. Sent from my iPhone > On 7 Sep 2018, at 19:50, Tim wrote: > > Another +1 for option 3 (and preference of HadoopFormatIO naming). > > Thanks Alexey, > > Tim > > >> On 7 Sep 2018, at 19:13, Andrew Pilloud wrot

Re: Failing: beam_PerformanceTests_Python

2018-09-07 Thread Mark Liu
PostCommit_Python_Verify does run this command which is configured in sdks/python/build.gradle and is a dependency of `:beam-sdks-python:postCommit` wh

PR/6343: Adding support for MustFollow

2018-09-07 Thread Lukasz Cwik
A contributor opened a PR[1] to add support for a PTransform that forces one PTransform to be executed before another by using side input readiness as a way to defer execution. They have provided this example usage: # Ensure that output dir is created before attempting to write output files. outpu

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-07 Thread Tim
Another +1 for option 3 (and preference of HadoopFormatIO naming). Thanks Alexey, Tim > On 7 Sep 2018, at 19:13, Andrew Pilloud wrote: > > +1 for option 3. That approach will keep the mapping clean if SQL supports > this IO. It would be good to put the proxy in the old module and move the >

PTransforms and Fusion

2018-09-07 Thread Lukasz Cwik
A primitive transform is a PTransform that has been chosen to have no default implementation in terms of other PTransforms. A primitive transform therefore must be implemented directly by a pipeline runner in terms of pipeline-runner-specific concepts. An initial list of primitive PTransforms were

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-07 Thread Andrew Pilloud
+1 for option 3. That approach will keep the mapping clean if SQL supports this IO. It would be good to put the proxy in the old module and move the implementation now. That way the old module can be easily deleted when the time comes. Andrew On Fri, Sep 7, 2018 at 6:15 AM Robert Bradshaw wrote:

Re: Failing: beam_PerformanceTests_Python

2018-09-07 Thread Łukasz Gajowy
> > > We can also think about moving performance tests to Gradle, which seems to > provide a stable way to set up a Python environment > > since recent beam_PostCommit_Python_Verify >

Re: Failing: beam_PerformanceTests_Python

2018-09-07 Thread Łukasz Gajowy
I think we should focus on fixing the BEAM-5334 issue first. I investigated it a little bit and tried to reproduce it locally on my machine - no success. The logs say: "05:18:37 error: [Errno 2] No such file or directory" but do not specify what i

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-07 Thread Robert Bradshaw
OK, good, that's what I thought. So I stick by (3) which 1) Cleans up the library for all future uses (hopefully the majority of all users :). 2) Is fully backwards compatible for existing users, minimizing disruption, and giving them time to migrate. On Fri, Sep 7, 2018 at 2:51 PM Alexey Romanen

Re: [NEW CONTRIBUTOR] ElasticsearchIO now supports Elasticsearch v6.x

2018-09-07 Thread Maximilian Michels
Well done. Thank you, Dat! On 06.09.18 22:47, Trần Thành Đạt wrote: Thank you. Etienne Chauchot and Tim Robertson helped me a lot to get familiar with Beam code. On Fri, Sep 7, 2018 at 2:59 AM Thomas Weise wrote: Support for Elastic 6.x is really good to have. T

Re: Python 3: final step

2018-09-07 Thread Maximilian Michels
This has been requested multiple times. Thanks for working on the Python 3 story. Let me know if I can help out in any way! On 05.09.18 19:01, Valentyn Tymofieiev wrote: This is awesome! Kudos to Robbe and Matthias who have been pushing this forward! On Wed, Sep 5, 2018 at 9:45 AM Charles Ch

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-07 Thread Alexey Romanenko
In the next release it will still be compatible because we keep the module “hadoop-input-format”, but we deprecate it and propose using it through the module “hadoop-format” and the proxy class HadoopFormatIO (or HadoopMapReduceFormatIO, whatever we name it), which will provide Write/Read functionality by

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-07 Thread Robert Bradshaw
Agree about not impacting users. Perhaps I misread (3), isn't it fully backwards compatible as well? On Fri, Sep 7, 2018 at 1:33 PM Jean-Baptiste Onofré wrote: > Hi, > > in order to limit the impact for the existing users on Beam 2.x series, > I would go for (1). > > Regards > JB > > On 06/09/20

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-07 Thread Jean-Baptiste Onofré
Hi, in order to limit the impact for the existing users on Beam 2.x series, I would go for (1). Regards JB On 06/09/2018 17:24, Alexey Romanenko wrote: > Hello everyone, > > I’d like to discuss the following topic (see below) with community since > the optimal solution is not clear for me. > >

Re: [DISCUSS] Unification of Hadoop related IO modules

2018-09-07 Thread Robert Bradshaw
I think it makes sense to keep *hadoop-file-system* separate, as it's common to use HDFS even if one is not using any of the other hadoop (mapreduce) libraries. On the other hand, it makes a lot of sense to me to put the hadoop read and write into the same module, probably going with option (3) whe