Re: Specific use-case question - Kafka-to-GCS-avro-Python

2024-03-26 Thread Ondřej Pánek
My attempt to fix https://github.com/apache/beam/issues/25598: https://github.com/apache/beam/pull/30728 On Thu, Mar 21, 2024 at 10:35 AM Ondřej Pánek <ondrej.pa...@bighub.cz> wrote: Hello Jan, thanks a lot for the detailed answer! So…

Offset access in Kafka messages in Python

2024-03-26 Thread Ondřej Pánek
Hello team, is it possible to retrieve metadata such as the topic, partition, and offset of messages consumed from a Kafka source? Specifically, is this possible in Python? I understand that in Java there is the KafkaRecord construct, which exposes this metadata, but I ha…
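A minimal sketch of reading that metadata in Python, assuming a Beam release whose `apache_beam.io.kafka.ReadFromKafka` accepts `with_metadata=True` (it is a cross-language transform, so a Java expansion service must be available). With that flag the elements are `beam.Row` objects rather than `(key, value)` byte tuples. The Row field names `topic`, `partition`, `offset`, and `value` used below are assumptions to check against the Beam version in use; `kafka_metadata` and `run` are hypothetical helper names.

```python
def kafka_metadata(row):
    """Pick metadata fields out of a Row emitted by ReadFromKafka(with_metadata=True).

    The attribute names are assumptions; verify them for your Beam version.
    """
    return {
        "topic": row.topic,
        "partition": row.partition,
        "offset": row.offset,
        "value": row.value,
    }


def run(bootstrap_servers, topic, argv=None):
    # Beam is imported inside run() so kafka_metadata() has no Beam dependency.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv, streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadWithMetadata" >> ReadFromKafka(
                consumer_config={"bootstrap.servers": bootstrap_servers},
                topics=[topic],
                # Emit Rows carrying topic/partition/offset instead of (key, value) tuples.
                with_metadata=True,
            )
            | "ExtractMetadata" >> beam.Map(kafka_metadata)
            | "Log" >> beam.Map(print)
        )
```

Calling `run("broker:9092", "cdc-topic")` (placeholder broker and topic) would print one dictionary per consumed message.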

Re: Specific use-case question - Kafka-to-GCS-avro-Python

2024-03-21 Thread Ondřej Pánek
…multi-language-pipelines/ On 3/14/24 09:34, Ondřej Pánek wrote: Basically, this is the error we receive when trying to use Avro or Parquet sinks (attached image). Also, see the sample pipeline that triggers this error when deploying with DataflowRunner. So obviously, there is no global w…
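An unbounded Kafka read stays in the global window, and Beam refuses to group an unbounded PCollection that uses the global window with the default trigger, which file writes typically need to do. A minimal workaround sketch, not the change proposed for issue 25598: assign fixed windows first, then write with `apache_beam.io.fileio.WriteToFiles`, which supports unbounded PCollections, through a small Avro sink built on fastavro. All of the following are hypothetical assumptions: the Kafka values are JSON objects matching `SCHEMA`; the schema, the `cdc` file prefix, and the 60-second window are placeholders; `decode_value`, `AvroRecordSink`, and `run` are illustrative names; and fastavro is installed on the workers.

```python
import json

# Hypothetical Avro schema for the decoded CDC records.
SCHEMA = {
    "type": "record",
    "name": "CdcRecord",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "op", "type": "string"},
    ],
}


def decode_value(kv):
    """Turn a (key, value) Kafka tuple into a dict; assumes JSON-encoded values."""
    _key, value = kv
    return json.loads(value.decode("utf-8"))


class AvroRecordSink:
    """Provides the open/write/flush methods that fileio.FileSink defines.

    Duck-typed (no FileSink base class) so it can be defined without Beam.
    """

    def __init__(self, schema):
        self._schema = schema
        self._writer = None

    def open(self, fh):
        # fastavro is imported lazily; Writer emits the Avro header on creation.
        from fastavro import parse_schema
        from fastavro.write import Writer

        self._writer = Writer(fh, parse_schema(self._schema))

    def write(self, record):
        self._writer.write(record)

    def flush(self):
        # Writes buffered records as an Avro block; fileio closes the handle itself.
        self._writer.flush()


def run(bootstrap_servers, topic, output_path, argv=None):
    import apache_beam as beam
    from apache_beam import window
    from apache_beam.io import fileio
    from apache_beam.io.kafka import ReadFromKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv, streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> ReadFromKafka(
                consumer_config={"bootstrap.servers": bootstrap_servers},
                topics=[topic],
            )
            | "Decode" >> beam.Map(decode_value)
            # Move out of the global window so the write can group per window.
            | "Window" >> beam.WindowInto(window.FixedWindows(60))
            | "Write" >> fileio.WriteToFiles(
                path=output_path,
                # A callable, so every writer gets a fresh sink bound to SCHEMA.
                sink=lambda dest: AvroRecordSink(SCHEMA),
                file_naming=fileio.default_file_naming("cdc", ".avro"),
            )
        )
```

The sink is passed as a callable rather than an instance because `WriteToFiles` creates new sink objects per destination; a callable lets each one receive the schema.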

Re: Specific use-case question - Kafka-to-GCS-avro-Python

2024-03-14 Thread Ondřej Pánek
…the issue: https://github.com/apache/beam/issues/25598 From: Ondřej Pánek Date: Thursday, March 14, 2024 at 07:57 To: user@beam.apache.org Subject: Re: Specific use-case question - Kafka-to-GCS-avro-Python Hello, thanks for the reply! Please refer to these: * https…

Re: Specific use-case question - Kafka-to-GCS-avro-Python

2024-03-13 Thread Ondřej Pánek
…in Python. On Wed, Mar 13, 2024 at 5:35 PM Ondřej Pánek <ondrej.pa...@bighub.cz> wrote: Hello Beam team! We’re currently onboarding a customer’s infrastructure to Google Cloud Platform. The decision was made that one of the technologies they will use is Dataflow. Let me bri…

Specific use-case question - Kafka-to-GCS-avro-Python

2024-03-13 Thread Ondřej Pánek
Hello Beam team! We’re currently onboarding a customer’s infrastructure to Google Cloud Platform. The decision was made that one of the technologies they will use is Dataflow. Let me briefly describe the use case: they have a Kafka cluster where data from a CDC data source is stored. The data…