Hello Beamers,

I'm happy to announce that the portable Spark runner is now mostly feature-complete [0] for BATCH processing (STREAMING is not yet available). This means you can run your new or existing Beam Python and Go pipelines using Apache Spark as the underlying execution engine.
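As a rough illustration of what that looks like from the Python SDK, here is a minimal sketch of a bounded (batch) pipeline submitted through the portable runner. It assumes a Spark job server is already running and reachable at localhost:8099, set up as described in the instructions [3]; the endpoint, the LOOPBACK environment type, and the helper names portable_spark_args and run_example are illustrative choices of mine, not anything the runner requires.

```python
def portable_spark_args(job_endpoint="localhost:8099", environment_type="LOOPBACK"):
    """Build the flags for submitting to a portable job server.

    Hypothetical helper: the default endpoint and environment type are
    assumptions for a local test setup, not required values.
    """
    return [
        "--runner=PortableRunner",
        "--job_endpoint=" + job_endpoint,
        "--environment_type=" + environment_type,
    ]


def run_example(job_endpoint="localhost:8099"):
    """Count words in a small in-memory (bounded) input via the job server."""
    # Imported here so the flag helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(portable_spark_args(job_endpoint))
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Create" >> beam.Create(["to be or not to be"])
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "Count" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )
```

Calling run_example() once the job server is up should submit the pipeline. LOOPBACK runs the Python worker in the submitting process, which is convenient for local experiments but probably not what you want on a real cluster.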
"Portable," you ask? Essentially, it shares a lot of the same code as the existing Spark runner, but also leverages Beam's portability APIs [1] to add Python and Go support, in addition to Java (note that the Go SDK itself is still considered experimental [2]). Instructions on how to run pipelines on the portable Spark runner are available on the Beam website [3].

While we are passing Beam's fairly comprehensive test suites [4][5][6], the portable Spark runner has yet to be tested in a production environment, so please feel free to file a Jira and tag me if you have issues or feature requests (username: ibzib).

Thanks,
Kyle

[0] https://s.apache.org/apache-beam-portability-support-table
[1] https://beam.apache.org/roadmap/portability/
[2] https://lists.apache.org/thread.html/8f729da2d3009059d7a8b2d8624446be161700dcfa953939dd3530c6@%3Cdev.beam.apache.org%3E
[3] https://beam.apache.org/documentation/runners/spark/
[4] https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch
[5] https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/
[6] https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/

Kyle Weaver | Software Engineer | github.com/ibzib | [email protected] | +16502035555
