Very high memory consumption using KafkaIO

2024-06-04 Thread Sigalit Eliazov
Hi, We have a pipeline that reads data from Kafka using KafkaIO, applies a simple transformation, and writes the data back to Kafka. We are using the Flink runner with the following resources: parallelism 50, 5 task managers, 20G memory each, 10 CPUs. This pipeline should handle around 400K
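
The shape of pipeline described above can be sketched as follows; broker address, topic names, and the uppercase transform are purely illustrative placeholders, not details from the thread:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaRoundTrip {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    p.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")       // hypothetical broker
            .withTopic("input-topic")                  // hypothetical topic
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())                        // drop the KafkaRecord wrapper early
        .apply(MapElements
            .into(TypeDescriptors.kvs(TypeDescriptors.strings(), TypeDescriptors.strings()))
            .via(kv -> KV.of(kv.getKey(), kv.getValue().toUpperCase()))) // simple stand-in transform
        .apply(KafkaIO.<String, String>write()
            .withBootstrapServers("broker:9092")
            .withTopic("output-topic")                 // hypothetical topic
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));
    p.run();
  }
}
```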

Re: OnTimer method is not triggered

2024-03-27 Thread Sigalit Eliazov
We are running with the Flink runner; I also tested with the direct runner, with the same results. On Wed, Mar 27, 2024 at 2:51 PM Sigalit Eliazov wrote: > Hi, > this is the pipeline, a very simple one; > the onTimer is not fired. > We are not using any experimental variables. > > public class

Re: OnTimer method is not triggered

2024-03-27 Thread Sigalit Eliazov
On Wed, Mar 27, 2024 at 9:54 AM Jan Lukavský wrote: > Hi, > > what is your runner, is it Flink as well in the issue? What is the > source of your Pipeline? Do you use some additional flags, e.g. > --experiments? Do you see that using classical or portable runner? > > Jan > >

OnTimer method is not triggered

2024-03-26 Thread Sigalit Eliazov
Hi all, We encountered an issue with timers starting from version 2.52: the timers are not triggered. https://github.com/apache/beam/issues/29816 Did anyone else encounter such problems? Thanks Sigalit
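
For reference, a minimal stateful DoFn using an event-time timer looks like this; the timer name, delay, and KV<String, String> element type are assumptions for illustration:

```java
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

public class ExpiryFn extends DoFn<KV<String, String>, String> {
  @TimerId("expiry")
  private final TimerSpec expirySpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(ProcessContext c, @TimerId("expiry") Timer expiry) {
    // Set a timer one minute past the element's timestamp; it fires
    // only once the watermark passes that point.
    expiry.offset(Duration.standardMinutes(1)).setRelative();
  }

  @OnTimer("expiry")
  public void onExpiry(OnTimerContext c) {
    c.output("timer fired at " + c.timestamp());
  }
}
```

Note that timers require keyed input, and event-time timers only fire when the watermark advances past the set time, so a stalled watermark can look like timers silently not firing.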

Beam SQL JOIN with Side Inputs

2024-02-14 Thread Sigalit Eliazov
Hi, We are currently working on a use case that involves streaming usage data arriving every 5 minutes, along with a few dimension tables that change once a day. Our attempt to implement a join between these tables using Beam SQL encountered a limitation. Specifically, Beam SQL
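
A Beam SQL join over two schema'd PCollections is typically expressed via SqlTransform, roughly as below; the table and column names are hypothetical, and joining an unbounded stream against a slowly-updating table is exactly where Beam SQL imposes the windowing restrictions this thread runs into:

```java
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;

public class UsageJoin {
  static PCollection<Row> join(PCollection<Row> usage, PCollection<Row> dims) {
    // The tuple tag ids become the table names visible to the query.
    return PCollectionTuple
        .of(new TupleTag<>("usage"), usage)
        .and(new TupleTag<>("dims"), dims)
        .apply(SqlTransform.query(
            "SELECT u.user_id, u.bytes, d.plan_name "
                + "FROM usage u JOIN dims d ON u.user_id = d.user_id"));
  }
}
```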

Re: usage of dynamic schema in BEAM SQL

2024-01-28 Thread Sigalit Eliazov
u explain the use case a bit more? In order to write a SQL statement > (at least one that doesn't use wildcard selection) you also need to know > the schema ahead of time. What are you trying to accomplish with these > dynamic schemas? > > Reuven > > On Sun, Jan 28, 2024 at 2:

usage of dynamic schema in BEAM SQL

2024-01-28 Thread Sigalit Eliazov
Hello, In the upcoming process, we extract Avro messages from Kafka utilizing the Confluent Schema Registry. Our intention is to implement SQL queries on the streaming data. As far as I understand, since I am using the Flink runner, when creating the features PCollection, I must specify the

Re: Pipeline Stalls at GroupByKey Step

2023-11-16 Thread Sigalit Eliazov
Sigalit On Fri, Nov 17, 2023 at 4:36 AM Sachin Mittal wrote: > Do you add time stamp to every record you output in > ConvertFromKafkaRecord step or any step before that. > > On Fri, 17 Nov 2023 at 4:07 AM, Sigalit Eliazov > wrote: > >> Hi, >> >> In our

Pipeline Stalls at GroupByKey Step

2023-11-16 Thread Sigalit Eliazov
Hi, In our pipeline we've encountered an issue with the GroupByKey step. After running for some time, messages stop progressing through the GroupByKey step, causing the pipeline to stall. To troubleshoot this issue, we added debug logging before and after the
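
A stalled GroupByKey on a streaming source is very often a watermark problem: if elements carry no usable event timestamps, windows never close and the group never emits. One sketch of ruling that out, assuming the broker's log-append time is an acceptable event time (broker and topic names hypothetical):

```java
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TimestampedKafkaRead {
  public static KafkaIO.Read<String, String> create() {
    return KafkaIO.<String, String>read()
        .withBootstrapServers("broker:9092")   // hypothetical
        .withTopic("input-topic")              // hypothetical
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        // Timestamps come from the broker's log-append time and the
        // watermark follows them, so downstream windows can close.
        .withLogAppendTime();
  }
}
```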

"No fields were detected for class org.apache.beam.sdk.util.WindowedValue" - flink runner

2023-06-11 Thread Sigalit Eliazov
Hi all, I'm encountering an issue while deploying the pipeline to the Flink runner. The pipeline reads an Avro message from Kafka, applies a global window, performs a simple transformation, and then sends the data back to Kafka. The problem arises when I see the following error in the job

Re: state broadcasting in flink

2023-05-18 Thread Sigalit Eliazov
Hi, I suggest reviewing https://beam.apache.org/blog/timely-processing/; hope this helps. Sigalit On Thu, May 18, 2023 at 9:27 AM Zheng Ni wrote: > Hi There, > > Does beam support flink's state broadcasting feature mentioned in the link > below? if yes, is there any beam doc/example available? >

Re: major reduction in performance when using schema registry - KafkaIO

2023-04-13 Thread Sigalit Eliazov
r” >> instead of KafkaAvroDeserializer and AvroCoder? >> >> More details and an example how to use is here: >> >> https://beam.apache.org/releases/javadoc/2.46.0/org/apache/beam/sdk/io/kafka/KafkaIO.html >> (go >> to “Use Avro schema with Confluent Schema Regist

Re: major reduction in performance when using schema registry - KafkaIO

2023-04-09 Thread Sigalit Eliazov
How are you using the schema registry? Do you have a code sample? > > On Sun, Apr 9, 2023 at 3:06 AM Sigalit Eliazov > wrote: > >> Hello, >> >> I am trying to understand the effect of schema registry on our pipeline's >> performance. In order to do sowe created a ve

major reduction in performance when using schema registry - KafkaIO

2023-04-09 Thread Sigalit Eliazov
Hello, I am trying to understand the effect of the schema registry on our pipeline's performance. In order to do so, we created a very simple pipeline that reads from Kafka, runs a simple transformation that adds a new field, and writes back to Kafka. The messages are in Avro format. I ran this pipeline with
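
The approach suggested later in this thread, using KafkaIO's built-in Confluent schema registry integration instead of wiring up KafkaAvroDeserializer and AvroCoder manually, can be sketched as below; the registry URL, broker, topic, and subject names are hypothetical:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class RegistryRead {
  public static KafkaIO.Read<byte[], GenericRecord> create() {
    return KafkaIO.<byte[], GenericRecord>read()
        .withBootstrapServers("broker:9092")              // hypothetical
        .withTopic("input-topic")                         // hypothetical
        .withKeyDeserializer(ByteArrayDeserializer.class)
        // Resolves the Avro schema from the registry once and sets an
        // appropriate coder, rather than deserializing per record by hand.
        .withValueDeserializer(
            ConfluentSchemaRegistryDeserializerProvider.of(
                "http://registry:8081", "input-topic-value"));
  }
}
```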

Custom TimestampPolicy using event time

2023-03-27 Thread Sigalit Eliazov
Hi all, I am trying to implement a custom TimestampPolicy and use the event time instead of the processing time. The flow reads from Kafka using .withTimestampPolicyFactory((tp, previousWatermark) -> new EventTimestampPolicy(previousWatermark)); and the custom class with the implementation I saw
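
A sketch of such a policy is below. MyEvent is a hypothetical value type assumed to expose a getEventTimeMillis() accessor, and the watermark handling is deliberately naive:

```java
import java.util.Optional;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.io.kafka.TimestampPolicy;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

public class EventTimestampPolicy extends TimestampPolicy<String, MyEvent> {
  private Instant lastTimestamp;

  public EventTimestampPolicy(Optional<Instant> previousWatermark) {
    this.lastTimestamp = previousWatermark.orElse(BoundedWindow.TIMESTAMP_MIN_VALUE);
  }

  @Override
  public Instant getTimestampForRecord(
      PartitionContext ctx, KafkaRecord<String, MyEvent> record) {
    // Use the event's own time field rather than processing time.
    lastTimestamp = new Instant(record.getKV().getValue().getEventTimeMillis());
    return lastTimestamp;
  }

  @Override
  public Instant getWatermark(PartitionContext ctx) {
    // Naive watermark that trails the last observed event time; real
    // pipelines usually subtract some allowed-lateness slack here.
    return lastTimestamp;
  }
}
```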

RedisIO readKeyPatterns

2023-03-26 Thread Sigalit Eliazov
Hi all, I have a pipeline that reads messages from Kafka, generates a PCollection of keys (using Keys.create), and then accesses Redis with these keys: PCollection<KV<String, String>> redisOutput = pcollectionKeys.apply(RedisIO.readKeyPatterns().withEndpoint(PipelineUtil.REDIS_HOST, PipelineUtil.REDIS_PORT)); When

clear State using business logic

2022-11-23 Thread Sigalit Eliazov
Hi all, the flow in our pipeline is: 1. read event X from Kafka and open a fixed window of 30 sec. 2. read subscription events from Kafka, open a GlobalWindow, and store a state of all subscriptions. 3. match X and Y using a key, and if there is a match, send an event to another Kafka topic. (we use the
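
State kept in a GlobalWindow is never garbage-collected automatically, so clearing it from business logic (or from a cleanup timer) is required. A minimal sketch, assuming KV<String, String> input where the value "unsubscribe" is the hypothetical signal to drop the key's state:

```java
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class SubscriptionFn extends DoFn<KV<String, String>, String> {
  @StateId("subscription")
  private final StateSpec<ValueState<String>> subSpec = StateSpecs.value();

  @ProcessElement
  public void process(
      ProcessContext c, @StateId("subscription") ValueState<String> sub) {
    String event = c.element().getValue();
    if ("unsubscribe".equals(event)) {
      sub.clear();        // business-logic-driven state removal
    } else {
      sub.write(event);   // remember the latest subscription for this key
    }
  }
}
```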

Re: read messages from kafka: 2 different message types in kafka topic

2022-08-09 Thread Sigalit Eliazov
implement a similar DeserializerProvider for > Apicurio. You could also try using apicurio.registry.as-confluent, but > that requires to configure your producers accordingly. > > I any case, I suggest you study > ConfluentSchemaRegistryDeserializerProvider. That should lead you a path > for

read messages from kafka: 2 different message types in kafka topic

2022-08-09 Thread Sigalit Eliazov
Hi all, we have a single Kafka topic which is used to receive 2 different types of messages. Both messages are Avro. So when reading messages from Kafka I used GenericRecord: KafkaIO.read() .withBootstrapServers(bootstrapServers) .withTopic(topic)
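
Once both types arrive as GenericRecord, one sketch for splitting them downstream is to branch on the writer schema name into tagged outputs; the schema name "EventA" is a hypothetical placeholder:

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.TupleTag;

public class SplitByType extends DoFn<GenericRecord, GenericRecord> {
  public static final TupleTag<GenericRecord> TYPE_A = new TupleTag<GenericRecord>() {};
  public static final TupleTag<GenericRecord> TYPE_B = new TupleTag<GenericRecord>() {};

  @ProcessElement
  public void process(ProcessContext c) {
    GenericRecord record = c.element();
    // The writer schema travels with each GenericRecord, so its name
    // distinguishes the two message types sharing the topic.
    if ("EventA".equals(record.getSchema().getName())) {  // hypothetical name
      c.output(TYPE_A, record);
    } else {
      c.output(TYPE_B, record);
    }
  }
}
```

It would be applied with ParDo.of(new SplitByType()).withOutputTags(SplitByType.TYPE_A, TupleTagList.of(SplitByType.TYPE_B)).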

sink triggers

2022-07-25 Thread Sigalit Eliazov
Hi all, I have a pipeline that reads input from a few sources, combines them, and creates a view of the data. I need to send output to Kafka every X minutes. What would be the best way to implement this? Thanks Sigalit
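
One common sketch for periodic emission is a repeated processing-time trigger in the global window; the 5-minute interval and String element type here are illustrative assumptions:

```java
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class PeriodicEmit {
  // Re-window so that downstream aggregations fire a pane every
  // 5 minutes of processing time.
  public static Window<String> everyFiveMinutes() {
    return Window.<String>into(new GlobalWindows())
        .triggering(Repeatedly.forever(
            AfterProcessingTime.pastFirstElementInPane()
                .plusDelayOf(Duration.standardMinutes(5))))
        .discardingFiredPanes();
  }
}
```

Whether to use discardingFiredPanes() or accumulatingFiredPanes() depends on whether each Kafka output should contain only the new data since the last firing or the full accumulated view.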

sync between two pcollections with different windows

2022-07-20 Thread Sigalit Eliazov
Hi all, I need some advice regarding window usage. I am sure this is a very basic question; any guidance will be much appreciated. I am using: - an unbounded pcollectionA with a FixedWindow of 1 minute, from which I eventually create state and use it as a side input. PCollection> pcollectionA = x.

Re: KeyedBroadcastProcessFunction

2022-05-16 Thread Sigalit Eliazov
About 2 GB, and it should be distributed. On Sun, May 15, 2022 at 19:50, Reuven Lax wrote: > How large is this state? Is it distributed? > > On Sun, May 15, 2022 at 8:12 AM Sigalit Eliazov > wrote: > >> Thanks for your response. >> The use case is 2 pipelines: >

Re: KeyedBroadcastProcessFunction

2022-05-15 Thread Sigalit Eliazov
the state was not available to the operator which reads from the scheduler. Thanks Sigalit On Sun, May 15, 2022 at 6:03 PM Reuven Lax wrote: > Beam supports side inputs, which might help you. Can you describe your use > case? > > On Sun, May 15, 2022 at 7:34 AM Sigalit Eliazov > wrote: >

KeyedBroadcastProcessFunction

2022-05-15 Thread Sigalit Eliazov
Hello, does Beam have support for something similar to the KeyedBroadcastProcessFunction which exists in Flink? I am looking for an option to have broadcast state in Beam so it can be shared between different operators. Thanks Sigalit
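
As the replies in this thread suggest, Beam's closest analogue to Flink broadcast state is a side input: a materialized view made available to every worker. A minimal sketch, with the element types and the "mode" config key as illustrative assumptions:

```java
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class BroadcastLikeState {
  public static PCollection<String> enrich(
      PCollection<String> events, PCollection<KV<String, String>> configUpdates) {
    // The map view is materialized and readable on every worker,
    // similar in spirit to Flink's broadcast state.
    final PCollectionView<Map<String, String>> configView =
        configUpdates.apply(View.asMap());
    return events.apply(ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void process(ProcessContext c) {
        Map<String, String> config = c.sideInput(configView);
        c.output(c.element() + ":" + config.getOrDefault("mode", "default"));
      }
    }).withSideInputs(configView));
  }
}
```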

Re: session window question

2022-04-27 Thread Sigalit Eliazov
olding the last value seen per key in > global window. There is an implementation of this approach in > Deduplicate.KeyedValues [1]. > > Jan > > [1] > > https://beam.apache.org/releases/javadoc/2.38.0/org/apache/beam/sdk/transforms/Deduplicate.KeyedValues.html > > On 4/2

session window question

2022-04-27 Thread Sigalit Eliazov
Hi all, I have the following scenario: a. a pipeline that reads messages from Kafka with a session window of 1 minute duration. b. GroupByKey in order to aggregate the data. c. for each 'group' I do some calculation and build a new event to send to Kafka. The output of this cycle is key1 - value1
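
Steps a and b above can be sketched as follows, assuming KV<String, String> elements whose event timestamps are already assigned upstream:

```java
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class SessionGrouping {
  public static PCollection<KV<String, Iterable<String>>> group(
      PCollection<KV<String, String>> input) {
    return input
        // A session for a key closes after 1 minute with no new events.
        .apply(Window.into(Sessions.withGapDuration(Duration.standardMinutes(1))))
        .apply(GroupByKey.create());
  }
}
```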

KafkaIO consumer rate

2022-04-10 Thread Sigalit Eliazov
Hi all, I saw a very low rate when consuming messages from Kafka in our different jobs. In order to find the bottleneck I created a very simple pipeline that reads string messages from Kafka and just prints the output. The pipeline runs on a Flink cluster with the following setup: 1 task manager, 3

Deployment of beam pipelines on flink cluster

2022-02-17 Thread Sigalit Eliazov
Hi all, We are currently using Beam to create a few pipelines and then deploy them on our on-prem Flink cluster. We have a few questions regarding the automation of the pipeline deployment: We have Beam running as a k8s pod which starts a Java process for each pipeline and has the