[ANNOUNCE] Spark portable runner (batch) now available for Java, Python, Go

Kyle Weaver Fri, 14 Jun 2019 14:03:16 -0700

Hello Beamers,

I'm happy to announce that the portable Spark runner is now mostly
feature-complete [0] for BATCH processing (STREAMING is not yet available).
This means you can run your new or existing Beam Python and Go pipelines
using Apache Spark as the underlying execution engine.


"Portable," you ask? Essentially, it shares a lot of the same code as the
existing Spark runner, but also leverages Beam's portability APIs [1] to
add Python and Go support, in addition to Java (note that the Go SDK itself
is still considered experimental [2]).

Instructions on how to run pipelines on the portable Spark runner are
available on the Beam website [3].
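For Python users, running a pipeline this way comes down to starting a Spark
job server and pointing the pipeline at it with Beam's PortableRunner. Below
is a minimal sketch, not an official helper: the function name
portable_spark_flags is hypothetical, and the default endpoint localhost:8099
and the LOOPBACK environment type are assumptions for a local test setup.
Please defer to [3] for the exact steps.

```python
# Minimal sketch (assumption): build the flags a Beam Python pipeline could
# pass to target the portable Spark runner via a locally running job server.
# The function name, default endpoint, and LOOPBACK default are illustrative
# assumptions, not taken from the Beam documentation.
from typing import List


def portable_spark_flags(job_endpoint: str = "localhost:8099",
                         environment_type: str = "LOOPBACK") -> List[str]:
    """Return pipeline flags that select Beam's PortableRunner.

    job_endpoint: address of the Spark job server that receives the pipeline.
    environment_type: where the Python SDK harness runs (LOOPBACK runs it in
    the submitting process, which is convenient for local testing).
    """
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        f"--environment_type={environment_type}",
    ]


if __name__ == "__main__":
    # With apache_beam installed and a Spark job server listening, the flags
    # could be handed to the pipeline options, for example:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   options = PipelineOptions(portable_spark_flags())
    print(" ".join(portable_spark_flags()))
```

Passing the values as ordinary command-line flags keeps the pipeline code
itself runner-agnostic; the same script can target another runner by
changing only these flags.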

While we are passing Beam's fairly comprehensive test suites [4][5][6], the
portable Spark runner has yet to be tested in a production environment, so
please feel free to file a Jira issue and tag me if you run into problems or
have feature requests (username: ibzib).

Thanks,
Kyle

[0] https://s.apache.org/apache-beam-portability-support-table
[1] https://beam.apache.org/roadmap/portability/
[2] https://lists.apache.org/thread.html/8f729da2d3009059d7a8b2d8624446be161700dcfa953939dd3530c6@%3Cdev.beam.apache.org%3E
[3] https://beam.apache.org/documentation/runners/spark/
[4] https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch
[5] https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/
[6] https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/

Kyle Weaver | Software Engineer | github.com/ibzib | +16502035555
