Re: Python SDK Arrow Integrations

2019-03-27 Thread Kenneth Knowles
Thinking about Arrow + Beam SQL + schemas: - Obviously many SQL operations could be usefully accelerated by arrow / columnar. Especially in the analytical realm this is the new normal. For ETL, perhaps less so. - Beam SQL planner (pipeline construction) is implemented in Java, and so the

Re: NullPointerException - Session windows with Lateness in FlinkRunner

2019-03-27 Thread rahul patwari
+dev On Wed 27 Mar, 2019, 9:47 PM rahul patwari wrote: > Hi, > I am using Beam 2.11.0, Runner - beam-runners-flink-1.7, Flink Cluster - > 1.7.2. > > I have this flow in my pipeline: > KafkaSource(withCreateTime()) --> ApplyWindow(SessionWindow with > gapDuration=1 Minute, lateness=3 Minutes,

Re: Python SDK Arrow Integrations

2019-03-27 Thread Ahmet Altay
Thank you Brian, this looks promising. cc: +Chamikara Jayalath +Heejong Lee On Wed, Mar 27, 2019 at 1:22 PM Brian Hulette wrote: > Hi everyone, > I've been doing some investigations into how Arrow might fit into Beam as > a way to ramp up on the project. As I've gone about this I've

Re: [PROPOSAL] Standardize Gradle structure in Python SDK

2019-03-27 Thread Ahmet Altay
This sounds good to me. Thank you for doing this. A few questions: - Could you comment on what kind of parallelization we will gain by this? In terms of real numbers, how would this affect build and test times? - I am guessing this will reduce complexity. Is it possible to quantify the improvement

Re: Frequent failures on beam8

2019-03-27 Thread Mikhail Gryzykhin
And another one. beam14 OOMs On Mon, Mar 25, 2019 at 5:54 PM Yifan Zou wrote: > the beam8 is disabled by now. > > On Mon, Mar 25, 2019 at 2:06 PM Mikhail Gryzykhin > wrote: > >> Yifan is looking into this. >> >> On Mon, Mar 25, 2019 at 1:55 PM Boyuan Zhang wrote: >> >>> Hey all, >>> >>> Could

Re: Deprecating Avro for fastavro on Python 3

2019-03-27 Thread Chamikara Jayalath
+1 for making use_fastavro the default for Python 3. I don't see any significant drawbacks in doing this from Beam's point of view. One concern is whether avro and fastavro can safely co-exist in the same environment so that Beam continues to work for users who already have the avro library installed.

Re: Deprecating Avro for fastavro on Python 3

2019-03-27 Thread Valentyn Tymofieiev
Thanks, Robbe and Frederik, for raising this. Over the course of making Beam Python 3 compatible, this is at least the second time [1] we have had to deal with an error in the avro-python3 package. The release cadence of Apache Avro (1 release a year) is concerning to me [2]. Even if we have a new release

Re: New contributor

2019-03-27 Thread Kenneth Knowles
Welcome! On Wed, Mar 27, 2019 at 2:59 PM Mikhail Gryzykhin wrote: > Welcome Niklas. > > This is another location with useful resources for contributors: > https://cwiki.apache.org/confluence/display/BEAM/Developer+Guides (contributor > guide has link to this as well though) > > On Wed, Mar 27,

Re: New contributor

2019-03-27 Thread Mikhail Gryzykhin
Welcome Niklas. This is another location with useful resources for contributors: https://cwiki.apache.org/confluence/display/BEAM/Developer+Guides (contributor guide has link to this as well though) On Wed, Mar 27, 2019 at 10:54 AM Connell O'Callaghan wrote: > Welcome Niklas - given your

Debugging :beam-sdks-java-io-hadoop-input-format:test

2019-03-27 Thread Mikhail Gryzykhin
Hi everyone, I have a pre-commit job that fails on *:beam-sdks-java-io-hadoop-input-format:test*. Relevant PR. Target doesn't have any explicit log associated with it. Running

Python SDK Arrow Integrations

2019-03-27 Thread Brian Hulette
Hi everyone, I've been doing some investigations into how Arrow might fit into Beam as a way to ramp up on the project. As I've gone about this I've prototyped a couple of additions to the Python SDK. I think these additions may be useful for others so I'm considering cleaning them up and

Re: Build blocking on

2019-03-27 Thread Robert Burke
Again, very valid concerns! I wouldn't take that step lightly (e.g. testing every single Go-using Gradle task we have, if not testing that explicit case). A happier path would be that gogradle "just works" with Go modules, and we can avoid the whole awkward double lockfile state, and version

Re: New contributor

2019-03-27 Thread Connell O'Callaghan
Welcome Niklas - given your background it will be very interesting to see your contributions. On Wed, Mar 27, 2019 at 10:29 AM Mark Liu wrote: > Welcome! > > Mark > > On Wed, Mar 27, 2019 at 10:09 AM Lukasz Cwik wrote: > >> Welcome. The getting started[1] and contribution guides[2] are most >>

Deprecating Avro for fastavro on Python 3

2019-03-27 Thread Robbe Sneyders
Hi all, We're looking at fixing avroio on Python 3, which still fails due to a non-picklable schema class in Avro [1]. This is fixed when using the latest Avro master, but the last release dates back to May 2017. Fastavro does not have the same problem, but is currently also failing due to a

Re: New contributor

2019-03-27 Thread Mark Liu
Welcome! Mark On Wed, Mar 27, 2019 at 10:09 AM Lukasz Cwik wrote: > Welcome. The getting started[1] and contribution guides[2] are most > useful. I have also added you as a contributor to the JIRA project. > > 1: https://beam.apache.org/get-started/beam-overview/ > 2:

[PROPOSAL] Standardize Gradle structure in Python SDK

2019-03-27 Thread Mark Liu
Hi Python SDK Developers, You may have noticed that the Gradle files have changed a lot recently as parallelization was applied to Python tests and more Python versions were enabled in testing. There are tricks over the build scripts and tests are grown

Re: New contributor

2019-03-27 Thread Lukasz Cwik
Welcome. The getting started[1] and contribution guides[2] are most useful. I have also added you as a contributor to the JIRA project. 1: https://beam.apache.org/get-started/beam-overview/ 2: https://beam.apache.org/contribute/ On Wed, Mar 27, 2019 at 9:38 AM Niklas Hansson <

Re: New contributor

2019-03-27 Thread Ahmet Altay
Welcome! Your user name was already added to JIRA. On Wed, Mar 27, 2019 at 9:38 AM Niklas Hansson < niklas.sven.hans...@gmail.com> wrote: > Hi! > > I work as a data scientist within banking but will switch over to > manufacturing the next month. I would like to contribute to Beam and >

New contributor

2019-03-27 Thread Niklas Hansson
Hi! I work as a data scientist within banking but will switch over to manufacturing the next month. I would like to contribute to Beam and especially the Python SDK. Could you add me as a contributor? I am new to open source contribution so feel free to give me any advice or point me in the

Re: New contributor

2019-03-27 Thread Guobao Li
Thank you all you guys! On Tue, Mar 26, 2019 at 7:20 PM Connell O'Callaghan wrote: > Welcome Guobao!!! > > On Tue, Mar 26, 2019 at 11:09 AM Melissa Pashniak > wrote: > >> Welcome! >> >> >> On Tue, Mar 26, 2019 at 10:17 AM Kenneth Knowles wrote: >> >>> Welcome! Cool project. A lot of code, and