
September 2018 | Newsletter


What’s been done


CI improvement (by: Etienne Chauchot)

   - For each new commit on master, the Nexmark suite is run in both batch
     and streaming mode on Spark, Flink, and Cloud Dataflow (thanks to
     Andrew), and dashboard graphs are produced to track functional and
     performance regressions.


Elasticsearch IO Supports Version 6 (by: Dat Tran)

   - Elasticsearch IO now supports version 6.x in addition to versions 2.x
     and 5.x.
   - See the merged PR
     <https://github.com/apache/beam/pull/6211#pullrequestreview-152477892>
     for more details.


KuduIO Added (by: Tim Robertson)

   - Apache Beam master now includes KuduIO, which will be released with
     Beam 2.7.0.
   - See BEAM-2661 <https://issues.apache.org/jira/browse/BEAM-2661> for
     more details.



What we’re working on...


Flink Portable Runner (by: Ankur Goenka, Maximilian Michels, Thomas Weise,
Ryan Williams)

   - Support for streaming side inputs has been merged
   - Portable Compatibility Matrix tests pass in streaming mode
   - Many more ValidatesRunner tests pass (ValidatesRunner is a
     comprehensive suite of Beam test pipelines)
   - Python pipelines can be tested without first bringing up a JobServer
     (it is started in a container)
   - Experimental support for executing the SDK harnesses in a process
     instead of a Docker container
   - Fixes for Beam bugs discovered while working on portability


State and Timer Support in Python SDK (by: Charles Chen, Robert Bradshaw)

   - This change adds a reference DirectRunner implementation of the Python
     user state and timers API. With this change, users can execute DoFns
     that use state and timers on the DirectRunner.
   - See the design doc
     <http://s.apache.org/beam-python-user-state-and-timers> and PR
     <https://github.com/apache/beam/pull/6304> for more details.


New IO - HadoopOutputFormatIO (by: Alexey Romanenko)

   - Adding support for MapReduce OutputFormat.
   - See BEAM-5310 <https://issues.apache.org/jira/browse/BEAM-5310> for
     more details.


High-level Java 8 DSL (by: David Moravek, Vaclav Plajt, Marek Simunek)

   - Adding a high-level Java 8 DSL based on the Euphoria API
     <https://github.com/seznam/euphoria> project.
   - See BEAM-3900 <https://issues.apache.org/jira/browse/BEAM-3900> for
     more details.


Performance improvements for HDFS file writing operations (by: Tim
Robertson)

   - Automatically create directories when performing an HDFS rename.
   - See PR <https://github.com/apache/beam/pull/6285> for more details.


Recognition of non-code contributions (by: Gris Cuevas)

   - Reached consensus on recognizing non-code contributions.
   - See the discussion
     <https://lists.apache.org/list.html?d...@beam.apache.org:lte=2y:non-code%20contributions>
     for more details.
   - Planned launch date: Beam Summit London (October 2nd)


Weekly Community Updates (by: Gris Cuevas)

   - Some of the project’s subcomponents post weekly updates to the mailing
     list. We’ll be consolidating best practices to share a weekly community
     update with all project-related must-knows in a nutshell.



What’s planned


Beam Cookbook (by: Austin Bennett, David Cavazos, Gris Cuevas, Andrea
Foegler, Rose Nguyen, Connell O'Callaghan, and you!)

   - We are creating a cookbook for common data science tasks in Beam and
     have started brainstorming.
   - We want to hold a hackathon after the London Summit to generate
     content from the community.
   - There will be a session at the summit to gather more ideas and input.
     Watch the dev and user mailing lists for a call for contributions soon!


Beam 2.7.0 release (by: Charles Chen)

Beam Mascot (by: Gris Cuevas & Community!)

   - We received approval to launch a contest to create a new Apache Beam
     mascot.
   - See the discussion
     <https://lists.apache.org/list.html?d...@beam.apache.org:lte=2y:mascot>
     for more details. If you’re interested in driving this, reach out in
     the thread!
   - Planned launch date: Last week of September



New Members


New Contributors

   - Đạt Trần, Ho Chi Minh City, Vietnam
      - See BEAM-5107
        <https://github.com/apache/beam/pull/6211#pullrequestreview-152477892>
        for more details on “Support ES-6.x for ElasticsearchIO”
   - Ravi Pathak, Copenhagen, Denmark
      - Using Beam for indexing open data on species at GBIF.org
      - Improving robustness of SolrIO


New Committers

   - Tim Robertson, Copenhagen, Denmark



Events, Talks & Meetups


[Coming Up] Beam Summit @ London, England

   - Organized by: Matthias Baetens, Victor Kotai, Alex Van Boxel & Gris
     Cuevas
   - The Beam Summit London 2018 will take place on October 1 and 2 in
     London.
   - If you’re interested in speaking, reach out to g...@apache.org
   - More info can be found in the blog post
     <https://beam.apache.org/blog/2018/08/21/beam-summit-europe.html>, and
     you can get your tickets on Eventbrite
     <https://www.eventbrite.com/e/beam-summit-london-2018-tickets-49100625292#tickets>.


[Coming Up] ApacheCon @ Montréal, Canada

   - Will take place September 24-27
   - Etienne Chauchot will give a talk on Universal Metrics with Beam
     <https://apachecon.dukecon.org/acna/2018/#/scheduledEvent/e22bd89bacbe03a36>
   - Alexey Romanenko and Ismaël Mejía will give a talk on Building portable
     and evolvable data-intensive applications with Apache Beam
     <https://apachecon.dukecon.org/acna/2018/#/scheduledEvent/852e0eea165741042>
   - Ismaël Mejía and Eugene Kirpichov will give a talk on Robust, performant
     and modular APIs for data ingestion with Apache Beam
     <https://apachecon.dukecon.org/acna/2018/#/scheduledEvent/9d56e79f3c681c967>
   - Gris Cuevas will host a Birds of a Feather session on 9/26: Design
     Thinking to manage online communities in Open Source Projects… It’ll be
     a Beam get-together with food and swag, so join us!


[Coming Up] DataEngConf @ Barcelona, Spain

   - Will take place September 25-26
   - Maximilian Michels will give an introduction to Beam and its portability
     features
     <https://www.dataengconf.com/speaker?first_name=Maximilian&last_name=Michels>.


[Occurred] OSCON @ Portland, OR, USA (by: Holden Karau & Gris Cuevas)

   - Holden Karau gave a talk on TFT/TFMA + Beam on Flink (and other related
     adventures).
   - Watch the video here <https://youtu.be/ZGyx4GuGEj4> and see the slides
     here
     <https://www.slideshare.net/hkarau/powering-tensorflow-with-big-data-using-apache-beam-flink-and-spark-oscon-pdx-2018>.
   - Gris Cuevas gave a talk about active inclusion in Open Source
     <https://conferences.oreilly.com/oscon/oscon-or-2018/public/schedule/detail/71408>;
     slides are here
     <https://docs.google.com/presentation/d/16-30Tmgls-iRGxFljL0PulC5hqo9UYDDmezJ392zI-k/edit>.


[Occurred] Open Challenge @ Guadalajara, Mexico (by: OSoM, IBM & Google)

   - Arianne Navarro, Hector Paredes, Pablo Estrada & Gris Cuevas hosted a
     hackathon for Apache Beam and BlueXolo. Results include 3 PRs for Beam
     and 8 software engineers introduced to Apache Beam.


[Occurred] Open Source Summit @ Vancouver, Canada

   - Gris Cuevas gave a talk on active diversification in Open Source;
     slides are here
     <https://docs.google.com/presentation/d/1-ssTcOPF3FlorYmS-Ah8nmtBFbjwXAPf0j0sN0nOSok/edit?usp=sharing>.
   - Ismaël Mejía gave a talk on Apache Beam; see details here
     <http://sched.co/FAN6>.


[Occurred] Flink Forward @ Berlin, Germany

   - Robert Bradshaw and Maximilian Michels gave a talk on Universal Machine
     Learning with Apache Beam (schedule
     <https://berlin-2018.flink-forward.org/conference-program/#universal-machine-learning-with-apache-beam>,
     slides
     <https://docs.google.com/presentation/d/1U5h45drW7QEMBTuLIVlrKwMMbu8mXE2BQyBkA_5O3ak/edit#slide=id.gc6fa3c898_0_0>).
   - Aljoscha Krettek and Thomas Weise gave a talk on Python Streaming
     Pipelines with Beam on Flink (schedule
     <https://berlin-2018.flink-forward.org/conference-program/#python-streaming-pipelines-with-beam-on-flink>,
     slides <https://s.apache.org/streaming-python-beam-flink>).



Resources


Setting up a Java Development Environment for Apache Beam on GCP (by: Jacob
Ferriero)

   - This post will help you get a development environment up and running
     to start developing Java Dataflow jobs. By the end you’ll be able to run
     an Apache Beam pipeline locally in debug mode, execute code in a REPL to
     speed up your development cycles, and submit your job to Google Cloud
     Dataflow. Medium Post
     <https://medium.com/google-cloud/setting-up-a-java-development-environment-for-apache-beam-on-google-cloud-platform-ec0c6c9fbb39>.


Coding Apache Beam in your Web Browser (by: Daniel De Leo)

   - But what happens when you’re on the go with a computer that doesn’t
     support your IDE of choice, or you’re using someone else’s computer and
     need to develop Apache Beam pipelines? Google has you covered! Google’s
     Cloud Shell <https://cloud.google.com/shell/docs/features> comes with a
     built-in Code Editor for developing and modifying code (it’s based on
     Eclipse’s Orion). It’s not as full-featured as an IDE, but it does beat
     using Vim or Emacs to edit code! Medium Post
     <https://medium.com/google-cloud/coding-apache-beam-in-your-web-browser-and-running-it-in-cloud-dataflow-c41c275d42c8>.


Building a real time quant trading engine on Dataflow and Beam (by: Lei He)

   - In this post, we are going to build a data pipeline that analyzes
     real-time stock tick data streamed from Google Cloud Pub/Sub, runs it
     through a pair correlation trading algorithm, and outputs trading
     signals onto Pub/Sub for execution. Medium Post
     <https://medium.com/google-cloud/building-a-real-time-quant-trading-engine-on-google-cloud-dataflow-and-apache-beam-841a909d2c12>.


Apache Beam: Reading from S3 and writing to BigQuery (by: Asa Harland)

   - In this article we look at how we can use Apache Beam to extract data
     from AWS S3 (or Google Cloud Storage), run some aggregations over the
     data and store the result in BigQuery. Medium Post
     <https://medium.com/@asajharland/using-apache-beam-to-read-data-from-aws-s3-and-write-to-google-bigquery-3ccd163d12c4>.



Apache Beam Events & Meetups

   - Join our Slack channel
     <https://the-asf.slack.com/messages/CA8D5DPHQ/convo/CA8D5DPHQ-1534357996.000100/>!



*Until Next Time!*

*This edition was curated by our community of contributors, committers and
PMC members. It contains work done in August 2018 and ongoing efforts. We
hope to provide visibility into what's going on in the community, so if you
have questions, feel free to ask in this thread.*
-- 
Rose Thị Nguyễn
