
September 2018 | Newsletter

What’s been done

CI improvement (by: Etienne Chauchot)


   For each new commit on master, the Nexmark suite is run in both batch and
   streaming mode on Spark, Flink, and Cloud Dataflow (thanks to Andrew), and
   dashboard graphs are produced to track functional and performance

Elasticsearch IO Supports Version 6 (by: Dat Tran)


   Elasticsearch IO now supports version 6.x in addition to version 2.x and

   See the merged PR
   for more details.

KuduIO Added (by: Tim Robertson)


   Apache Beam master now has KuduIO that will be released with Beam 2.7.0.

   See BEAM-2661 <https://issues.apache.org/jira/browse/BEAM-2661> for more

What we’re working on...

Flink Portable Runner (by: Ankur Goenka, Maximilian Michels, Thomas Weise,
Ryan Williams)


   Support for streaming side inputs merged


   Portable Compatibility Matrix tests pass in streaming mode


   Many more ValidatesRunner tests pass (ValidatesRunner is a comprehensive
   suite for Beam test pipelines)

   Python Pipelines can be tested without bringing up a JobServer first (it
   is started in a container)

   Experimental support for executing the SDK harnesses in a process
   instead of a Docker container

   Fixes for Beam bugs discovered while working on portability

State and Timer Support in Python SDK (by: Charles Chen, Robert Bradshaw)


   This change adds the reference DirectRunner implementation of the Python
   User State and Timers API. With this change, a user can execute DoFns with
   state and timers on the DirectRunner.

   See the design doc
   <http://s.apache.org/beam-python-user-state-and-timers> and PR
   <https://github.com/apache/beam/pull/6304> for more details.

New IO - HadoopOutputFormatIO (by: Alexey Romanenko)


   Adding support for MapReduce OutputFormat.

   See BEAM-5310 <https://issues.apache.org/jira/browse/BEAM-5310> for more

High-level Java 8 DSL (by: David Moravek, Vaclav Plajt, Marek Simunek)


   Adding a high-level Java 8 DSL based on the Euphoria API
   <https://github.com/seznam/euphoria> project

   See BEAM-3900 <https://issues.apache.org/jira/browse/BEAM-3900> for more

Performance improvements for HDFS file writing operations (by: Tim


   Autocreate directories when doing an HDFS rename

   See PR <https://github.com/apache/beam/pull/6285> for more details

Recognition of non-code contributions (by: Gris Cuevas)


   Got consensus about recognizing non-code contributions

   discussion for more details

   Planned launch date: Beam Summit London (October 2nd)

Weekly Community Updates (by: Gris Cuevas)


   Some of the project’s subcomponents post weekly updates on the mailing
   list. We’ll be consolidating best practices to share a weekly community
   update with all project-related must-knows in a nutshell

What’s planned

Beam Cookbook (by: Austin Bennett, David Cavazos, Gris Cuevas, Andrea
Foegler, Rose Nguyen, Connell O'Callaghan, and you!)


   We are creating a cookbook for common data science tasks in Beam and
   have started brainstorming

   We want to have a hackathon after the London Summit to generate content
   from the community

   There will be a session at the summit to gather more ideas and input.
   Watch the dev and user mailing lists for a call for contributions soon!

Beam 2.7.0 release (by: Charles Chen)

Beam Mascot (by: Gris Cuevas & Community!)


   We got approval to launch a contest to create a new Apache Beam mascot

   discussion for more details. If you’re interested in driving this, reach
   out in the thread!

   Planned launch date: Last week of September

New Members

New Contributors


   Đạt Trần, Ho Chi Minh City, Vietnam

      See BEAM-5107
      for more details on “Support ES-6.x for ElasticsearchIO”

   Ravi Pathak, Copenhagen, Denmark

      Using Beam for indexing open data on species at GBIF.org

      Improving robustness of SolrIO

New Committers


   Tim Robertson, Copenhagen, Denmark

Events, Talks & Meetups

[Coming Up] Beam Summit @ London, England


   Organized by: Matthias Baetens, Victor Kotai, Alex Van Boxel & Gris

   The Beam Summit London 2018 will take place on October 1 and 2 in London.


   If you’re interested in speaking, reach out to g...@apache.org

   More info can be found in the blog post
   <https://beam.apache.org/blog/2018/08/21/beam-summit-europe.html> and
   you can get your tickets on Eventbrite

[Coming Up] ApacheCon @ Montréal, Canada


   Will take place Sep 24-27

   Etienne Chauchot will give a talk on Universal Metrics with Beam

   Alexey Romanenko and Ismaël Mejía will give a talk on Building portable
   and evolvable data-intensive applications with Apache

   Ismaël Mejía and Eugene Kirpichov will give a talk on Robust, performant
   and modular APIs for data ingestion with Apache Beam

   Gris Cuevas will host a Birds of a Feather session on 9/26: Design
   Thinking to manage online communities in Open Source Projects. It’ll be a
   Beam get-together with food and swag. Join us!

[Coming Up] DataEngConf @ Barcelona, Spain


   Will take place Sep 25-26

   Maximilian Michels will give an introduction to Beam and its portability

[Occurred] OSCON @ Portland, OR, USA (by: Holden Karau & Gris Cuevas)


   Holden Karau gave a talk on TFT/TFMA + Beam on Flink (and other related

   Watch the video here <https://youtu.be/ZGyx4GuGEj4> and see the slides

   Gris Cuevas gave a talk about active inclusion in Open Source
   slides here

[Occurred] Open Challenge @ Guadalajara, Mexico (by: OSoM, IBM & Google)


   Arianne Navarro, Hector Paredes, Pablo Estrada & Gris Cuevas hosted a
   hackathon for Apache Beam and BlueXolo. Results include 3 PRs for Beam and
   8 software engineers introduced to Apache Beam

[Occurred] Open Source Summit @ Vancouver, Canada


   Gris Cuevas gave a talk on active diversification in Open Source, slides

   Ismael Mejia gave a talk on Apache Beam, see details here

[Occurred] Flink Forward @ Berlin, Germany


   Robert Bradshaw and Maximilian Michels gave a talk on Universal Machine
   Learning with Apache Beam, schedule

   Aljoscha Krettek and Thomas Weise gave a talk on Python Streaming
   Pipelines with Beam on Flink, schedule
   slides <https://s.apache.org/streaming-python-beam-flink>


Setting up a Java Development Env Beam on GCP (by: Jacob Ferriero)


   This post will help you get a development environment up and running to
   start developing Java Dataflow jobs. By the end you’ll be able to run an
   Apache Beam pipeline locally in debug mode, execute code in a REPL to speed your
   development cycles, and submit your job to Google Cloud Dataflow. Medium

Coding Apache Beam in your Web Browser (by: Daniel De Leo)


   But what happens when you’re on the go on a computer which doesn’t
   support your IDE of choice, or you’re using someone else’s computer and
   need to develop Apache Beam pipelines? Google has you covered! Google’s
   Cloud Shell <https://cloud.google.com/shell/docs/features> comes with a
   built-in Code Editor for developing/modifying code (it’s based on Eclipse’s
   Orion). It’s not as full featured as an IDE but it does beat using Vim or
   Emacs to edit code! Medium Post

Building a real time quant trading engine on Dataflow and Beam (by: Lei He)


   In this post, we are going to build a data pipeline that analyzes real
   time stock tick data streamed from gCloud Pub/Sub, runs them through a pair
   correlation trading algorithm, and outputs trading signals onto Pub/Sub for
   execution. Medium Post

Apache Beam: Reading from S3 and writing to BigQuery (by: Asa Harland)


   In this article we look at how we can use Apache Beam to extract data
   from AWS S3 (or Google Cloud Storage), run some aggregations over the data
   and store the result in BigQuery. Medium Post

Apache Beam Events & Meetups


   Join our Slack channel

*Until Next Time!*

*This edition was curated by our community of contributors, committers and
PMCs. It contains work done in August 2018 and ongoing efforts. We hope to
provide visibility to what's going on in the community, so if you have
questions, feel free to ask in this thread. *
Rose Thị Nguyễn
