We are happy to present the new 2.53.0 release of Beam. This release includes both improvements and new functionality. For more information on changes in 2.53.0, check out the detailed release notes (https://github.com/apache/beam/milestone/17).

Highlights

* Python streaming users who use Beam 2.47.0 or newer should update to version 2.53.0, which fixes a known issue (https://github.com/apache/beam/issues/27330)

I/Os

* TextIO now supports skipping multiple header lines (Java) (https://github.com/apache/beam/issues/17990)
* Python GCSIO is now implemented with GCP GCS Client instead of apitools (https://github.com/apache/beam/issues/25676)
* Added support for LowCardinality DataType in ClickHouse (Java) (https://github.com/apache/beam/pull/29533)
* Added support for handling bad records to KafkaIO (Java) (https://github.com/apache/beam/pull/29546)
* Added support for generating text embeddings in MLTransform for Vertex AI and Hugging Face Hub models (https://github.com/apache/beam/pull/29564)
* NATS IO connector added (Go) (https://github.com/apache/beam/issues/29000)

New Features / Improvements

* The Python SDK now type checks `collections.abc.Collection` types properly. Some type hints that were erroneously allowed by the SDK may now fail (https://github.com/apache/beam/pull/29272)
* Running multi-language pipelines locally no longer requires Docker. Instead, the same (generally auto-started) subprocess used to perform the expansion can also be used as the cross-language worker.
* Framework for adding Error Handlers to composite transforms added in Java (https://github.com/apache/beam/pull/29164)
* Python 3.11 images now include google-cloud-profiler (https://github.com/apache/beam/pull/29651)

Breaking Changes

* Upgraded to Go 1.21.5 to build, fixing CVE-2023-45285 (https://security-tracker.debian.org/tracker/CVE-2023-45285) and CVE-2023-39326 (https://security-tracker.debian.org/tracker/CVE-2023-39326)

Deprecations

* Euphoria DSL is deprecated and will be removed in a future release (not before 2.56.0) (https://github.com/apache/beam/issues/29451)

Bugfixes

* (Python) Fixed sporadic crashes in streaming pipelines that affected some users of 2.47.0 and newer SDKs (https://github.com/apache/beam/issues/27330)
* (Python) Fixed a bug that caused MLTransform to drop identical elements in the output PCollection (https://github.com/apache/beam/issues/29600)

Thanks,
Jack McCluskey

--
Jack McCluskey
SWE - DataPLS PLAT/ Dataflow ML
RDU
jrmcclus...@google.com