Depending on the kind of testing you're hoping to do, you may want to
look into https://github.com/mguenther/kafka-junit. For example, say
you're looking for some job-level smoke tests that just answer the
question "Is everything wired up correctly?" Personally, I like how
this approach doesn't
m, and the checkpointing
> interval.
>
> On Thu, May 19, 2022 at 8:04 PM Aeden Jameson wrote:
>>
>> Thanks for the response David. That's the conclusion I came to as
>> well. The Hadoop plugin behavior doesn't appear to reflect more
>> recent changes to S3 like s
k
> needs to checkpointing, this works much better.
>
> Best,
> David
>
> On Thu, May 12, 2022 at 1:53 AM Aeden Jameson wrote:
>>
>> We're using S3 to store checkpoints. They are taken every minute. I'm
>> seeing a large number of 404 responses from S3
on't see any entropy injection happening. See the
>> comments on [2] for more on this.
>>
>> FWIW, I would urge you to use presto instead of hadoop for checkpointing on
>> S3. The performance of the hadoop "filesystem" is problematic when it's used
>> for ch
I have checkpoints setup against s3 using the hadoop plugin. (I'll
migrate to presto at some point) I've setup entropy injection per the
documentation with
state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
s3.entropy.key: _entropy_
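If I understand the docs correctly, the entropy key in the configured path should be replaced by random characters for checkpoint data files and stripped entirely for the _metadata file, so metadata stays at a predictable location. As a toy model of that substitution (an illustration only, not Flink's actual code; the class and method names here are made up):

```java
// Illustration of entropy-injection path rewriting. Not Flink's
// implementation -- just a model of the documented behavior.
public class PathEntropy {

    // Replace the entropy key with random characters for data files,
    // or strip it (and its trailing slash) for the _metadata file.
    static String inject(String path, String entropyKey, String entropy) {
        if (entropy.isEmpty()) {
            return path.replace(entropyKey + "/", "");
        }
        return path.replace(entropyKey, entropy);
    }

    public static void main(String[] args) {
        // Data file path: entropy key becomes a random prefix segment.
        System.out.println(inject(
                "s3://my-bucket/_entropy_/my-job/checkpoints", "_entropy_", "a1b2"));
        // Metadata path: entropy key is removed entirely.
        System.out.println(inject(
                "s3://my-bucket/_entropy_/my-job/checkpoints", "_entropy_", ""));
    }
}
```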
I'm seeing some behavior that I don't quite
We're using S3 to store checkpoints. They are taken every minute. I'm
seeing a large number of 404 responses from S3 being generated by the
job manager. The order of the entries in the debug log implies that
they're the result of HEAD requests to keys. For example, all the
incidents look like
I've had success using Kafka for Junit,
https://github.com/mguenther/kafka-junit, for these kinds of tests.
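From memory of the library's README (check the project for the current API; class and method names may be slightly off), a smoke test looks roughly like this. `startJobAgainstBroker` is a hypothetical stand-in for however you point your job's consumers and producers at a broker list:

```java
import static net.mguenther.kafka.junit.EmbeddedKafkaCluster.provisionWith;
import static net.mguenther.kafka.junit.EmbeddedKafkaClusterConfig.defaultClusterConfig;

import net.mguenther.kafka.junit.EmbeddedKafkaCluster;
import net.mguenther.kafka.junit.ObserveKeyValues;
import net.mguenther.kafka.junit.SendValues;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

public class JobSmokeTest {

    private EmbeddedKafkaCluster kafka;

    @BeforeEach
    void setUp() {
        // Spin up an embedded broker for the test.
        kafka = provisionWith(defaultClusterConfig());
        kafka.start();
    }

    @Test
    void everythingIsWiredUp() throws Exception {
        kafka.send(SendValues.to("input-topic", "{\"id\": 1}"));

        // Hypothetical helper: launch the job against the embedded broker.
        startJobAgainstBroker(kafka.getBrokerList());

        // Blocks until at least one record lands on the output topic,
        // failing the test on timeout -- i.e. "is everything wired up?"
        kafka.observe(ObserveKeyValues.on("output-topic", 1));
    }
}
```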
On Wed, Apr 20, 2022 at 3:01 PM Alexey Trenikhun wrote:
>
> Hello,
> We have Flink job that read data from multiple Kafka topics, transforms data
> and write in output Kafka topics. We
I believe you can solve this issue with:
.setDeserializer(KafkaRecordDeserializationSchema.of(new
JSONKeyValueDeserializationSchema(false)))
On Thu, Mar 3, 2022 at 8:07 AM Kamil ty wrote:
>
> Hello,
>
> Sorry for the late reply. I have checked the issue and it seems to be a type
> issue as the
On Thu, Jan 20, 2022 at 2:46 AM yidan zhao wrote:
> self-define the window assigners.
>
Thanks, I'll check that out. If you have links to especially good examples
and explanations, that would be great. Otherwise, I presume the Flink
codebase itself is the place to start.
--
Cheers,
Aeden
type of window capable of emitting results for different
> parallelisms at different times is the session window [1]. Does that meet
> your needs?
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#session-windows
>
> Aeden Jameson 于202
When using tumbling windows, the windows all materialize at once, which
results in bursty traffic. How does one go about unaligned tumbling
windows? Does this require going down the road of custom window
assigners and triggers?
--
Cheers,
Aeden
Builder will change when you call that method, and thus
> the default trigger will be overwritten by calling WindowedStream#trigger.
>
> On Wed, Sep 1, 2021 at 12:32 AM, Aeden Jameson wrote:
>
>> Flink Version: 1.13.2
>>
>> In the section on Default Triggers of Window Assigners th
Flink Version: 1.13.2
In the section on Default Triggers of Window Assigners, the documentation
states
By specifying a trigger using trigger() you are overwriting the default
trigger of a WindowAssigner. For example, if you specify a CountTrigger for
TumblingEventTimeWindows you will no longer
My use case is that I'm producing a set of measurements by key every
60 seconds. Currently, this is handled with the usual pattern of
keyBy().window(Tumbling...(60)).process(...). I need to provide the same
output, but as a result of a timeout. The data needed for the timeout
summary will be in
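A sketch of one direction for this (not settled in the thread): replace the window with a KeyedProcessFunction that keeps the running summary in keyed state and registers a processing-time timer as the timeout. MySummary and Measurement are stand-ins for the job's real types:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimeoutSummary
        extends KeyedProcessFunction<String, Measurement, MySummary> {

    private transient ValueState<MySummary> summary;

    @Override
    public void open(Configuration parameters) {
        summary = getRuntimeContext().getState(
                new ValueStateDescriptor<>("summary", MySummary.class));
    }

    @Override
    public void processElement(Measurement m, Context ctx,
                               Collector<MySummary> out) throws Exception {
        MySummary s = summary.value();
        if (s == null) {
            s = new MySummary();
            // Fire the timeout if this key sees no result within 60s.
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + 60_000L);
        }
        s.add(m);
        summary.update(s);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
                        Collector<MySummary> out) throws Exception {
        // Timeout hit: emit whatever has accumulated and reset.
        MySummary s = summary.value();
        if (s != null) {
            out.collect(s);
            summary.clear();
        }
    }
}
```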
state and not new watermark
> generated from there, and then it stuck all the downstreams and stopped
> consuming data from Kafka? I didn't use watermarks in my application though.
>
>
>
> I checked that all the Kafka partition has data consistently coming in.
>
>
>
> Bes
This can happen if you have an idle partition. Are all partitions
receiving data consistently?
On Tue, Jul 13, 2021 at 2:59 PM Jerome Li wrote:
>
> Hi,
>
>
>
> I got question about Flink Kafka consumer. I am facing the issue that the
> Kafka consumer somehow stop consuming data from Kafka after
ow it is not fixed yet.
> thank you again and do you have any solutions ?
>
> On 2021/07/07 01:47:00, Aeden Jameson wrote:
> > Hi Li,
> >
> >How big is your keyspace? Had a similar problem which turns out to
> > be scenario 2 in this issue
> > https://issu
Hi Li,
How big is your keyspace? I had a similar problem, which turned out to
be scenario 2 in this issue:
https://issues.apache.org/jira/browse/FLINK-19970. It looks like the bug
in scenario 1 got fixed but scenario 2 did not. There's more detail in
this thread,
One way to deal with this is by using the source idleness setting.
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#dealing-with-idle-sources
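A minimal sketch of wiring that in (MyEvent and kafkaSource are placeholders; the durations are arbitrary examples, so tune them to your pipeline):

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

WatermarkStrategy<MyEvent> strategy = WatermarkStrategy
        .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(20))
        // Mark a source split idle after 1 minute without records, so an
        // empty partition no longer holds back the downstream watermark.
        .withIdleness(Duration.ofMinutes(1));

env.fromSource(kafkaSource, strategy, "events");
```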
On Sat, Jun 19, 2021 at 12:06 PM Dan Hill wrote:
> Hi.
>
> I remember listening to Flink
I've probably overlooked something simple, but when converting a
DataStream to a table, how does one convert a long to a TIMESTAMP(3)
that will not be your event or processing time?
I've tried
tEnv.createTemporaryView(
    "myTable",
    myDatastream,
It's my understanding that a GROUP BY is also a keyBy under the hood,
so it will cause a shuffle operation to happen. Our source
is a Kafka topic that is keyed so that any given partition contains all
the data that is needed by any given consuming TM. Is there a way
using FlinkSQL to
I'm trying to get my head around the impact of setting max parallelism.
* Does max parallelism primarily serve as a reservation for future
increases to parallelism, i.e., preserving the ability to restore
from checkpoints and savepoints after increasing parallelism?
* Does it serve as a
tasks have the same parallelism 36, your job should only allocate
> 36 slots. The evenly-spread-out-slots option should help in your case.
>
> Is it possible for you to share the complete jobmanager logs?
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Mar 16
3:54 AM Chesnay Schepler wrote:
>>
>> Is this a brand-new job, with the cluster having all 18 TMs at the time
>> of submission? (or did you add more TMs while the job was running)
>>
>> On 3/12/2021 5:47 PM, Aeden Jameson wrote:
>> > Hi Matthias,
>>
apache/flink/runtime/resourcemanager/slotmanager/SlotManagerConfiguration.java#L141
>
> On Fri, Mar 12, 2021 at 12:58 AM Aeden Jameson
> wrote:
>>
>> Hi Arvid,
>>
>> Thanks for responding. I did check the configuration tab of the job
>> manager and the setting cluster.evenly-spread-
ers 2 slots. In that way, the
> scheduler can only evenly distribute.
>
> On Wed, Mar 10, 2021 at 7:21 PM Aeden Jameson wrote:
>>
>> I have a cluster with 18 task managers 4 task slots each running a
>> job whose source/sink(s) are declared with FlinkSQL using the Kafka
&
I have a cluster with 18 task managers, 4 task slots each, running a
job whose source/sink(s) are declared with FlinkSQL using the Kafka
connector. The topic being read has 36 partitions. The problem I'm
observing is that the subtasks for the sources are not evenly
distributed. For example, 1
Correction: The first link was supposed to be,
1. pipeline.auto-watermark-interval
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#pipeline-auto-watermark-interval
On Wed, Mar 3, 2021 at 7:46 PM Aeden Jameson wrote:
>
> I'm hoping to have my con
I'm hoping to have my confusion clarified regarding the settings,
1. pipeline.auto-watermark-interval
https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/ExecutionConfig.html#setAutoWatermarkInterval-long-
2. setAutoWatermarkInterval
Do all operators have the same parallelism?
>
> Regards,
> Timo
>
>
> On 25.02.21 00:49, Aeden Jameson wrote:
> > I have a job made up of a few FlinkSQL statements using a
> > statement set. In my job graph viewed through the Flink UI a few of
> > the task
I have a job made up of a few FlinkSQL statements using a
statement set. In my job graph viewed through the Flink UI, a few of
the tasks/statements are preceded by a task named
rowtime field: (#11: event_time TIME ATTRIBUTE(ROWTIME))
that has an upstream Kafka source task. What exactly does the rowtime
task do?
--
Thank you,
Aeden
Hi
How does one specify computed columns when converting a DataStream
to a temporary view? For example
final DataStream stream = env.addSource(..);
tEnv.createTemporaryView(
    "myTable",
    stream,
    $("col1"),
    $("col2")
This may help you out.
https://github.com/apache/flink/blob/master/flink-libraries/flink-cep/src/test/java/org/apache/flink/cep/CEPITCase.java
On Sun, Jan 17, 2021 at 10:32 AM narasimha wrote:
>
> Hi,
>
> I'm using Flink CEP, but couldn't find any examples for writing test cases
> for the
When using statement sets, if two select queries use the same table
(e.g. Kafka Topic), does each query get its own copy of data?
Thank you,
Aeden
n the new design. The Avro
> schema is derived entirely from the table's schema.
>
> Regards,
> Timo
>
>
>
> On 07.01.21 09:41, Aeden Jameson wrote:
> > Hi Timo,
> >
> > Thanks for responding. You're right. So I did update the properties.
> >>
g, they are mentioned here [2].
>
> Regards,
> Timo
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory
> [2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/
>
>
> On 06.01.21 18:35, Aeden J
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/kafka.html#dependencies
>
> Best,
> Piotrek
>
> śr., 6 sty 2021 o 04:37 Aeden Jameson napisał(a):
>>
>> I've upgraded from 1.11.1 to 1.12 in hopes of using the key.fields
>> feature of the
I've upgraded from 1.11.1 to 1.12 in hopes of using the key.fields
feature of the Kafka SQL Connector. My current connector is configured
as:
connector.type= 'kafka'
connector.version = 'universal'
connector.topic = 'my-topic'
connector.properties.group.id = 'my-consumer-group'
Based on these docs,
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html,
the default partitioning behavior is not quite clear to me.
If no value for sink-partitioner is given, is the default behavior
just that of the native Kafka library? (with key use
My understanding is that FlinkKafkaConsumer is a wrapper around the
Kafka consumer library, so if you've set the group.id property you
should be able to see the offsets with something like
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe
--group my-flink-application.
On Tue,