September 2018 | Newsletter

What’s been done

CI improvement (by: Etienne Chauchot)
- For each new commit on master, the Nexmark suite is run in both batch and streaming mode on Spark, Flink, and Cloud Dataflow (thanks to Andrew), and dashboard graphs are produced to track functional and performance regressions.

Elasticsearch IO Supports Version 6 (by: Dat Tran)
- Elasticsearch IO now supports version 6.x in addition to versions 2.x and 5.x.
- See the merged PR <https://github.com/apache/beam/pull/6211#pullrequestreview-152477892> for more details.

KuduIO Added (by: Tim Robertson)
- Apache Beam master now has KuduIO, which will be released with Beam 2.7.0.
- See BEAM-2661 <https://issues.apache.org/jira/browse/BEAM-2661> for more details.

What we’re working on...

Flink Portable Runner (by: Ankur Goenka, Maximilian Michels, Thomas Weise, Ryan Williams)
- Support for streaming side inputs merged
- Portable Compatibility Matrix tests pass in streaming mode
- Many more ValidatesRunner tests pass (ValidatesRunner is a comprehensive suite of Beam test pipelines)
- Python pipelines can be tested without bringing up a JobServer first (it is started in a container)
- Experimental support for executing the SDK harnesses in a process instead of a Docker container
- Fixes for Beam bugs discovered while working on portability

State and Timer Support in Python SDK (by: Charles Chen, Robert Bradshaw)
- This change adds the reference DirectRunner implementation of the Python User State and Timers API. With this change, a user can execute DoFns with state and timers on the DirectRunner.
- See the design doc <http://s.apache.org/beam-python-user-state-and-timers> and PR <https://github.com/apache/beam/pull/6304> for more details.

New IO - HadoopOutputFormatIO (by: Alexey Romanenko)
- Adds support for MapReduce OutputFormat.
- See BEAM-5310 <https://issues.apache.org/jira/browse/BEAM-5310> for more details.

High-level Java 8 DSL (by: David Moravek, Vaclav Plajt, Marek Simunek)
- Adding a high-level Java 8 DSL based on the Euphoria API <https://github.com/seznam/euphoria> project
- See BEAM-3900 <https://issues.apache.org/jira/browse/BEAM-3900> for more details.

Performance improvements for HDFS file writing operations (by: Tim Robertson)
- Autocreate directories when doing an HDFS rename
- See PR <https://github.com/apache/beam/pull/6285> for more details.

Recognition of non-code contributions (by: Gris Cuevas)
- Reached consensus on recognizing non-code contributions
- See the discussion <https://lists.apache.org/list.html?d...@beam.apache.org:lte=2y:non-code%20contributions> for more details
- Planned launch date: Beam Summit London (October 2nd)

Weekly Community Updates (by: Gris Cuevas)
- Some of the project’s subcomponents post weekly updates on the mailing list. We’ll be consolidating best practices to share a weekly community update with all project-related must-knows in a nutshell.

What’s planned

Beam Cookbook (by: Austin Bennett, David Cavazos, Gris Cuevas, Andrea Foegler, Rose Nguyen, Connell O'Callaghan, and you!)
- We are creating a cookbook for common data science tasks in Beam and have started brainstorming
- We want to have a hackathon after the London Summit to generate content from the community
- There will be a session at the summit to gather more ideas and input. Watch the dev and user mailing lists for a call for contributions soon!

Beam 2.7.0 release (by: Charles Chen)

Beam Mascot (by: Gris Cuevas & Community!)
- We got approval to launch a contest to create a new Apache Beam mascot
- See the discussion <https://lists.apache.org/list.html?d...@beam.apache.org:lte=2y:mascot> for more details. If you’re interested in driving this, reach out in the thread!
- Planned launch date: Last week of September

New Members

New Contributors
- Đạt Trần, Ho Chi Minh City, Vietnam
  - See BEAM-5107 <https://github.com/apache/beam/pull/6211#pullrequestreview-152477892> for more details on “Support ES-6.x for ElasticsearchIO”
- Ravi Pathak, Copenhagen, Denmark
  - Using Beam for indexing open data on species at GBIF.org
  - Improving robustness of SolrIO

New Committers
- Tim Robertson, Copenhagen, Denmark

Events, Talks & Meetups

[Coming Up] Beam Summit @ London, England
- Organized by: Matthias Baetens, Victor Kotai, Alex Van Boxel & Gris Cuevas
- The Beam Summit London 2018 will take place on October 1 and 2 in London.
- If you’re interested in speaking, reach out to g...@apache.org
- More info can be found in the blog post <https://beam.apache.org/blog/2018/08/21/beam-summit-europe.html> and you can get your tickets on Eventbrite <https://www.eventbrite.com/e/beam-summit-london-2018-tickets-49100625292#tickets>

[Coming Up] ApacheCon @ Montréal, Canada
- Will take place Sep 24-27
- Etienne Chauchot will give a talk on Universal Metrics with Beam <https://apachecon.dukecon.org/acna/2018/#/scheduledEvent/e22bd89bacbe03a36>
- Alexey Romanenko and Ismaël Mejía will give a talk on Building portable and evolvable data-intensive applications with Apache <https://apachecon.dukecon.org/acna/2018/#/scheduledEvent/852e0eea165741042>
- Ismaël Mejía and Eugene Kirpichov will give a talk on Robust, performant and modular APIs for data ingestion with Apache Beam <https://apachecon.dukecon.org/acna/2018/#/scheduledEvent/9d56e79f3c681c967>
- Gris Cuevas will host a Birds of a Feather session on 9/26: Design Thinking to manage online communities in Open Source Projects. It’ll be a Beam get-together with food & swag. Join us!

[Coming Up] DataEngConf @ Barcelona, Spain
- Will take place Sep 25-26
- Maximilian Michels will give an introduction to Beam and its portability features <https://www.dataengconf.com/speaker?first_name=Maximilian&last_name=Michels>.

[Occurred] OSCON @ Portland, OR, USA (by: Holden Karau & Gris Cuevas)
- Holden Karau gave a talk on TFT/TFMA + Beam on Flink (and other related adventures).
- Watch the video here <https://youtu.be/ZGyx4GuGEj4> and see the slides here <https://www.slideshare.net/hkarau/powering-tensorflow-with-big-data-using-apache-beam-flink-and-spark-oscon-pdx-2018>
- Gris Cuevas gave a talk about active inclusion in Open Source <https://conferences.oreilly.com/oscon/oscon-or-2018/public/schedule/detail/71408>, slides here <https://docs.google.com/presentation/d/16-30Tmgls-iRGxFljL0PulC5hqo9UYDDmezJ392zI-k/edit>

[Occurred] Open Challenge @ Guadalajara, Mexico (by: OSoM, IBM & Google)
- Arianne Navarro, Hector Paredes, Pablo Estrada & Gris Cuevas hosted a hackathon for Apache Beam and BlueXolo. Results include 3 PRs for Beam and 8 software engineers introduced to Apache Beam.

[Occurred] Open Source Summit @ Vancouver, Canada
- Gris Cuevas gave a talk on active diversification in Open Source, slides here <https://docs.google.com/presentation/d/1-ssTcOPF3FlorYmS-Ah8nmtBFbjwXAPf0j0sN0nOSok/edit?usp=sharing>
- Ismaël Mejía gave a talk on Apache Beam, see details here <http://sched.co/FAN6>

[Occurred] Flink Forward @ Berlin, Germany
- Robert Bradshaw and Maximilian Michels gave a talk on Universal Machine Learning with Apache Beam, schedule <https://berlin-2018.flink-forward.org/conference-program/#universal-machine-learning-with-apache-beam>, slides <https://docs.google.com/presentation/d/1U5h45drW7QEMBTuLIVlrKwMMbu8mXE2BQyBkA_5O3ak/edit#slide=id.gc6fa3c898_0_0>
- Aljoscha Krettek and Thomas Weise gave a talk on Python Streaming Pipelines with Beam on Flink, schedule <https://berlin-2018.flink-forward.org/conference-program/#python-streaming-pipelines-with-beam-on-flink>,
  slides <https://s.apache.org/streaming-python-beam-flink>

Resources

Setting up a Java Development Env for Beam on GCP (by: Jacob Ferriero)
- This post will help you get a development environment up and running to start developing Java Dataflow jobs. By the end you’ll be able to run an Apache Beam pipeline locally in debug mode, execute code in a REPL to speed up your development cycles, and submit your job to Google Cloud Dataflow.
- Medium Post <https://medium.com/google-cloud/setting-up-a-java-development-environment-for-apache-beam-on-google-cloud-platform-ec0c6c9fbb39>

Coding Apache Beam in your Web Browser (by: Daniel De Leo)
- But what happens when you’re on the go on a computer which doesn’t support your IDE of choice, or you’re using someone else’s computer and need to develop Apache Beam pipelines? Google has you covered! Google’s Cloud Shell <https://cloud.google.com/shell/docs/features> comes with a built-in Code Editor for developing/modifying code (it’s based on Eclipse’s Orion). It’s not as full-featured as an IDE but it does beat using Vim or Emacs to edit code!
- Medium Post <https://medium.com/google-cloud/coding-apache-beam-in-your-web-browser-and-running-it-in-cloud-dataflow-c41c275d42c8>

Building a real time quant trading engine on Dataflow and Beam (by: Lei He)
- In this post, we are going to build a data pipeline that analyzes real-time stock tick data streamed from gCloud Pub/Sub, runs it through a pair correlation trading algorithm, and outputs trading signals onto Pub/Sub for execution.
- Medium Post <https://medium.com/google-cloud/building-a-real-time-quant-trading-engine-on-google-cloud-dataflow-and-apache-beam-841a909d2c12>

Apache Beam: Reading from S3 and writing to BigQuery (by: Asa Harland)
- In this article we look at how we can use Apache Beam to extract data from AWS S3 (or Google Cloud Storage), run some aggregations over the data and store the result in BigQuery.
- Medium Post <https://medium.com/@asajharland/using-apache-beam-to-read-data-from-aws-s3-and-write-to-google-bigquery-3ccd163d12c4>

Apache Beam Events & Meetups
- Join our Slack channel <https://the-asf.slack.com/messages/CA8D5DPHQ/convo/CA8D5DPHQ-1534357996.000100/>!

*Until Next Time!*

*This edition was curated by our community of contributors, committers and PMCs. It contains work done in August 2018 and ongoing efforts. We hope to provide visibility into what's going on in the community, so if you have questions, feel free to ask in this thread.*

-- Rose Thị Nguyễn