Hi,
We have a pipeline that reads data from Kafka using KafkaIO, applies a
simple transformation, and writes the data back to Kafka.
We are using the Flink runner with the following resources:
parallelism: 50
Task managers: 5
Memory per task manager: 20G
CPUs: 10
This pipeline should handle around 400K
We are running with the Flink runner; I also tested with the direct runner, with
the same results.
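For context, a minimal sketch of the pipeline shape described above, assuming String keys and values; the broker address, topic names, and the uppercase transform are placeholders, not the actual code:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.KV;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaPassThrough {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")   // placeholder
            .withTopic("input-topic")              // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
        .apply(MapElements.via(new SimpleFunction<KV<String, String>, KV<String, String>>() {
          @Override
          public KV<String, String> apply(KV<String, String> kv) {
            // stand-in for the "simple transformation"
            return KV.of(kv.getKey(), kv.getValue().toUpperCase());
          }
        }))
        .apply(KafkaIO.<String, String>write()
            .withBootstrapServers("broker:9092")
            .withTopic("output-topic")
            .withKeySerializer(StringSerializer.class)
            .withValueSerializer(StringSerializer.class));
    p.run();
  }
}
```

Run with `--runner=FlinkRunner --parallelism=50` to match the setup above.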
On Wed, Mar 27, 2024 at 2:51 PM Sigalit Eliazov wrote:
> hi,
> this is the pipeline, a very simple one.
> The onTimer callback is not fired.
> We are not using any experimental flags.
>
> public class
On Wed, Mar 27, 2024 at 9:54 AM Jan Lukavský wrote:
> Hi,
>
> what is your runner, is it Flink as well in the issue? What is the
> source of your Pipeline? Do you use some additional flags, e.g.
> --experiments? Do you see that using classical or portable runner?
>
> Jan
>
>
Hi all,
We encountered an issue with timers starting from version 2.52:
we saw that the timers are not triggered.
https://github.com/apache/beam/issues/29816
Did someone encounter such problems as well?
Thanks
Sigalit
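For anyone trying to reproduce: the affected shape is a stateful DoFn with a timer. This is an illustrative sketch (names and the 10-second delay are mine, not the original pipeline):

```java
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

public class TimerFn extends DoFn<KV<String, String>, String> {
  @TimerId("flush")
  private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

  @ProcessElement
  public void process(ProcessContext c, @TimerId("flush") Timer flush) {
    // arm (or re-arm) a timer 10 seconds from now
    flush.offset(Duration.standardSeconds(10)).setRelative();
  }

  @OnTimer("flush")
  public void onFlush(OnTimerContext c) {
    // per the report above, this callback is never invoked on 2.52+
    c.output("timer fired at " + c.timestamp());
  }
}
```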
hi,
We are currently working on a use case that involves streaming usage data
arriving every 5 minutes, and we have a few dimension tables that undergo
changes once a day.
Our attempt to implement a join between these tables using Beam SQL
encountered a limitation. Specifically, Beam SQL
> Can you explain the use case a bit more? In order to write a SQL statement
> (at least one that doesn't use wildcard selection) you also need to know
> the schema ahead of time. What are you trying to accomplish with these
> dynamic schemas?
>
> Reuven
>
> On Sun, Jan 28, 2024 at 2:
Hello, In the upcoming process, we extract Avro messages from Kafka
utilizing the Confluent Schema Registry.
Our intention is to implement SQL queries on the streaming data.
As far as I understand, since I am using the Flink runner, when
creating the features PCollection, I must specify the
Sigalit
On Fri, Nov 17, 2023 at 4:36 AM Sachin Mittal wrote:
> Do you add a timestamp to every record you output in the
> ConvertFromKafkaRecord step, or in any step before that?
>
> On Fri, 17 Nov 2023 at 4:07 AM, Sigalit Eliazov
> wrote:
>
>> Hi,
>>
>> In our
Hi,
In our pipeline, we've encountered an issue with the GroupByKey step. After
running for some time, it seems that messages stop progressing through
the GroupByKey step, causing the pipeline to stall in data processing.
To troubleshoot this issue, we added debug logging before and after the
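As a debugging sketch (names illustrative): logging the element timestamps on the input side of the GroupByKey shows whether the watermark can ever pass the end of the window, which is the usual reason a streaming GroupByKey appears stuck:

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

class GbkDebug {
  // 'keyed' stands for the PCollection feeding the stuck GroupByKey
  static PCollection<KV<String, Iterable<String>>> debug(PCollection<KV<String, String>> keyed) {
    return keyed
        .apply("LogBeforeGBK", ParDo.of(new DoFn<KV<String, String>, KV<String, String>>() {
          @ProcessElement
          public void process(ProcessContext c) {
            // if c.timestamp() never advances, neither will the watermark,
            // and the downstream GroupByKey will never emit
            System.out.println("before GBK: " + c.element().getKey() + " @ " + c.timestamp());
            c.output(c.element());
          }
        }))
        .apply(Window.<KV<String, String>>into(FixedWindows.of(Duration.standardMinutes(1))))
        .apply(GroupByKey.create());
  }
}
```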
Hi all,
I'm encountering an issue while deploying the pipeline to the Flink runner.
The pipeline reads an Avro message from Kafka, applies a global window,
performs a simple transformation, and then sends the data back to Kafka.
The problem arises when I see the following error in the job
Hi,
I suggest reviewing
https://beam.apache.org/blog/timely-processing/
Hope this helps.
Sigalit
On Thu, May 18, 2023 at 9:27 AM Zheng Ni wrote:
> Hi There,
>
> Does Beam support Flink's state broadcasting feature mentioned in the link
> below? If yes, is there any Beam doc/example available?
>
r”
>> instead of KafkaAvroDeserializer and AvroCoder?
>>
>> More details and an example how to use is here:
>>
>> https://beam.apache.org/releases/javadoc/2.46.0/org/apache/beam/sdk/io/kafka/KafkaIO.html
>> (go to “Use Avro schema with Confluent Schema Registry”)
> How are you using the schema registry? Do you have a code sample?
>
> On Sun, Apr 9, 2023 at 3:06 AM Sigalit Eliazov
> wrote:
>
>> Hello,
>>
>> I am trying to understand the effect of the schema registry on our pipeline's
>> performance. In order to do so we created a ve
Hello,
I am trying to understand the effect of the schema registry on our pipeline's
performance. In order to do so we created a very simple pipeline that reads
from Kafka, runs a simple transformation of adding a new field, and writes back
to Kafka. The messages are in Avro format.
I ran this pipeline with
Hi all,
I am trying to implement a custom TimestampPolicy and use the event
time instead of the processing time.
The flow reads from Kafka using
.withTimestampPolicyFactory((tp, previousWatermark) -> new
EventTimestampPolicy(previousWatermark));
and the custom class with the implementation I saw
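For reference, a minimal event-time policy of the shape that factory expects; the assumption that the Kafka record timestamp carries the event time is illustrative:

```java
import java.util.Optional;
import org.apache.beam.sdk.io.kafka.KafkaRecord;
import org.apache.beam.sdk.io.kafka.TimestampPolicy;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.joda.time.Instant;

public class EventTimestampPolicy extends TimestampPolicy<String, String> {
  private Instant currentWatermark;

  public EventTimestampPolicy(Optional<Instant> previousWatermark) {
    this.currentWatermark = previousWatermark.orElse(BoundedWindow.TIMESTAMP_MIN_VALUE);
  }

  @Override
  public Instant getTimestampForRecord(PartitionContext ctx, KafkaRecord<String, String> record) {
    // assumption: the producer wrote the event time into the record timestamp;
    // parse it out of the payload instead if that is not the case
    Instant eventTime = new Instant(record.getTimestamp());
    if (eventTime.isAfter(currentWatermark)) {
      currentWatermark = eventTime;  // keep the watermark monotonic
    }
    return eventTime;
  }

  @Override
  public Instant getWatermark(PartitionContext ctx) {
    return currentWatermark;
  }
}
```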
hi all,
I have a pipeline that reads messages from Kafka, generates a PCollection
of keys (using Keys.create),
and then accesses Redis with these keys:
PCollection<KV<String, String>> redisOutput =
pcollectionKeys.apply(RedisIO.readKeyPatterns().withEndpoint(PipelineUtil.REDIS_HOST,
PipelineUtil.REDIS_PORT));
When
Hi all,
the flow in our pipeline is:
1. Read event X from Kafka; open a fixed window of 30 sec.
2. Read event subscription (Y) from Kafka; open a GlobalWindow and store a
state of all subscriptions.
3. Match X and Y using a key and, if there is a match, send an event to
another Kafka topic. (we use the
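One way to sketch steps 2-3 (types and the "SUB:" marker are my illustrative assumptions): flatten both streams, keyed by the match key, into one globally-windowed PCollection, keep the latest subscription per key in state, and emit on a matching X event. Note that Beam state is scoped per key *and window*, which is why the two streams need a common windowing strategy before a stateful join like this:

```java
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

public class MatchFn extends DoFn<KV<String, String>, String> {
  @StateId("subscription")
  private final StateSpec<ValueState<String>> subSpec = StateSpecs.value(StringUtf8Coder.of());

  @ProcessElement
  public void process(ProcessContext c, @StateId("subscription") ValueState<String> sub) {
    String value = c.element().getValue();
    if (value.startsWith("SUB:")) {       // subscription event: remember the latest one
      sub.write(value.substring(4));
    } else {                              // X event: try to match against stored state
      String s = sub.read();
      if (s != null) {
        c.output(c.element().getKey() + " matched " + s);  // send to the output Kafka topic
      }
    }
  }
}
```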
implement a similar DeserializerProvider for
> Apicurio. You could also try using apicurio.registry.as-confluent, but
> that requires to configure your producers accordingly.
>
> In any case, I suggest you study
> ConfluentSchemaRegistryDeserializerProvider. That should lead you a path
> for
Hi all,
we have a single Kafka topic which is used to receive 2 different types of
messages.
Both message types are Avro.
So when reading messages from Kafka I used GenericRecord:
KafkaIO.read()
.withBootstrapServers(bootstrapServers)
.withTopic(topic)
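A hedged completion of that read: with a Confluent-style registry the value deserializer can be a ConfluentSchemaRegistryDeserializerProvider, and downstream you can branch on each record's schema name to tell the two types apart (the URL and the subject-naming convention are assumptions):

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.io.kafka.ConfluentSchemaRegistryDeserializerProvider;
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.kafka.common.serialization.StringDeserializer;

class TwoTypeRead {
  static KafkaIO.Read<String, GenericRecord> read(String bootstrapServers, String topic) {
    return KafkaIO.<String, GenericRecord>read()
        .withBootstrapServers(bootstrapServers)
        .withTopic(topic)
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(
            ConfluentSchemaRegistryDeserializerProvider.of(
                "http://schema-registry:8081",   // placeholder URL
                topic + "-value"));              // assumes TopicNameStrategy subjects
    // downstream: record.getSchema().getName() distinguishes the 2 message types
  }
}
```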
Hi all,
I have a pipeline that reads input from a few sources, combines them, and
creates a view of the data.
I need to send an output to Kafka every X minutes.
What would be the best way to implement this?
Thanks
Sigalit
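One possible sketch, assuming the combined view can be reduced with a combiner: put the stream into X-minute fixed windows so each window produces one pane, then write each pane's result to Kafka. The 5-minute period, topic, and Count combiner are placeholders:

```java
import org.apache.beam.sdk.io.kafka.KafkaIO;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.apache.kafka.common.serialization.StringSerializer;
import org.joda.time.Duration;

class PeriodicOutput {
  static PDone emitEvery5Minutes(PCollection<String> combined) {
    return combined
        // one pane per 5-minute window -> one Kafka output per period
        .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(5))))
        .apply(Combine.globally(Count.<String>combineFn()).withoutDefaults())
        .apply(MapElements.into(TypeDescriptors.strings()).via(n -> "count=" + n))
        .apply(KafkaIO.<Void, String>write()
            .withBootstrapServers("broker:9092")   // placeholder
            .withTopic("summary-topic")            // placeholder
            .withValueSerializer(StringSerializer.class)
            .values());
  }
}
```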
Hi all,
I need some advice regarding window usage. I am sure this is a very basic
question; any guidance will be much appreciated.
I am using:
- an unbounded pcollectionA with a FixedWindow of 1 minute, from which
I eventually create state and use it as a side input.
PCollection> pcollectionA = x.
About 2 GB, and it should be distributed.
On Sun, May 15, 2022 at 7:50 PM Reuven Lax wrote:
> How large is this state? Is it distributed?
>
> On Sun, May 15, 2022 at 8:12 AM Sigalit Eliazov
> wrote:
>
>> Thanks for your response.
>> The use case is 2 pipelines:
>
the state was not available for the operator which reads
from the scheduler.
Thanks
Sigalit
On Sun, May 15, 2022 at 6:03 PM Reuven Lax wrote:
> Beam supports side inputs, which might help you. Can you describe your use
> case?
>
> On Sun, May 15, 2022 at 7:34 AM Sigalit Eliazov
> wrote:
>
Hello,
does Beam have support for something similar to the
KeyedBroadcastProcessFunction which exists in Flink?
I am looking for an option to have broadcast state in Beam so it can be
shared between different operators.
Thanks
Sigalit
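There is no direct KeyedBroadcastProcessFunction equivalent, but the closest Beam-native pattern is a broadcast side input: a control stream materialized as a map and made visible to every worker. A minimal sketch (names illustrative):

```java
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

class BroadcastSketch {
  static PCollection<String> enrich(PCollection<String> mainStream,
                                    PCollection<KV<String, String>> controlStream) {
    // materialized as a map and broadcast to every worker, similar in
    // effect to Flink's broadcast state
    PCollectionView<Map<String, String>> broadcast = controlStream.apply(View.asMap());
    return mainStream.apply(ParDo.of(new DoFn<String, String>() {
      @ProcessElement
      public void process(ProcessContext c) {
        Map<String, String> rules = c.sideInput(broadcast);
        c.output(c.element() + "|" + rules.getOrDefault(c.element(), "none"));
      }
    }).withSideInputs(broadcast));
  }
}
```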
> holding the last value seen per key in
> global window. There is an implementation of this approach in
> Deduplicate.KeyedValues [1].
>
> Jan
>
> [1]
>
> https://beam.apache.org/releases/javadoc/2.38.0/org/apache/beam/sdk/transforms/Deduplicate.KeyedValues.html
>
> On 4/2
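Usage of the transform Jan points to is essentially a one-liner (types illustrative); it keeps the first value seen per key within the transform's deduplication horizon:

```java
import org.apache.beam.sdk.transforms.Deduplicate;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

class DedupSketch {
  // emits each key's first value and drops later duplicates within the
  // transform's (configurable) time-domain horizon
  static PCollection<KV<String, String>> dedup(PCollection<KV<String, String>> keyed) {
    return keyed.apply(Deduplicate.<String, String>keyedValues());
  }
}
```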
Hi all,
I have the following scenario:
a. A pipeline that reads messages from Kafka, with a session window of 1
minute duration.
b. A GroupByKey in order to aggregate the data.
c. For each 'group' I do some calculation and build a new event to send to
Kafka.
the output of this cycle is
key1 - value1
Hi all,
I saw a very low rate when consuming messages from Kafka in our different
jobs.
In order to find the bottleneck I created
a very simple pipeline that reads string messages from Kafka and just
prints the output.
The pipeline runs over a Flink cluster with the following setup:
1 task manager, 3
Hi all,
We are currently using Beam to create a few pipelines, and then deploy them
on our on-prem Flink cluster.
We have a few questions regarding the automation of the pipeline
deployment:
We have Beam running as a k8s pod which starts a Java process for each
pipeline and has the