Re: Source vs SourceFunction and testing

2022-05-24 Thread Aeden Jameson
Depending on the kind of testing you're hoping to do you may want to look into https://github.com/mguenther/kafka-junit. For example, you're looking for some job level smoke tests that just answer the question "Is everything wired up correctly?" Personally, I like how this approach doesn't
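A minimal sketch of the kind of job-level smoke test alluded to above, using kafka-junit's embedded cluster. Topic names and the record value are hypothetical placeholders; the kafka-junit calls follow that project's README.

```java
import net.mguenther.kafka.junit.EmbeddedKafkaCluster;
import net.mguenther.kafka.junit.ObserveKeyValues;
import net.mguenther.kafka.junit.SendValues;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import static net.mguenther.kafka.junit.EmbeddedKafkaCluster.provisionWith;
import static net.mguenther.kafka.junit.EmbeddedKafkaClusterConfig.defaultClusterConfig;

public class JobSmokeTest {

    private EmbeddedKafkaCluster kafka;

    @Before
    public void setUp() {
        kafka = provisionWith(defaultClusterConfig());
        kafka.start();
    }

    @After
    public void tearDown() {
        kafka.stop();
    }

    @Test
    public void jobIsWiredUpCorrectly() throws Exception {
        // Feed one input record, run the job against kafka.getBrokerList(),
        // then assert that at least one record reaches the output topic.
        kafka.send(SendValues.to("input-topic", "{\"id\": 1}"));
        // ... submit the Flink job under test here (omitted) ...
        kafka.observe(ObserveKeyValues.on("output-topic", 1));
    }
}
```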

Re: Checkpointing - Job Manager S3 Head Requests are 404s

2022-05-19 Thread Aeden Jameson
m, and the checkpointing > interval. > > On Thu, May 19, 2022 at 8:04 PM Aeden Jameson wrote: >> >> Thanks for the response David. That's the conclusion I came to as >> well. The Hadoop plugin behavior doesn't appear to reflect more >> recent changes to S3 like s

Re: Checkpointing - Job Manager S3 Head Requests are 404s

2022-05-19 Thread Aeden Jameson
k > needs to checkpointing, this works much better. > > Best, > David > > On Thu, May 12, 2022 at 1:53 AM Aeden Jameson wrote: >> >> We're using S3 to store checkpoints. They are taken every minute. I'm >> seeing a large number of 404 responses from S3

Re: Confusing S3 Entropy Injection Behavior

2022-05-19 Thread Aeden Jameson
on't see any entropy injection happening. See the >> comments on [2] for more on this. >> >> FWIW, I would urge you to use presto instead of hadoop for checkpointing on >> S3. The performance of the hadoop "filesystem" is problematic when it's used >> for ch

Confusing S3 Entropy Injection Behavior

2022-05-18 Thread Aeden Jameson
I have checkpoints setup against s3 using the hadoop plugin. (I'll migrate to presto at some point) I've setup entropy injection per the documentation with state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints s3.entropy.key: _entropy_ I'm seeing some behavior that I don't quite
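For reference, the configuration shape under discussion (bucket and job names are placeholders). One documented subtlety that may explain part of the confusion: entropy is injected only into the paths of checkpoint data files, while for metadata files the entropy key is simply stripped from the path, so the checkpoint path reported in the UI shows no entropy.

```yaml
state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
s3.entropy.key: _entropy_
s3.entropy.length: 4   # number of random characters substituted (default 4)
```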

Checkpointing - Job Manager S3 Head Requests are 404s

2022-05-11 Thread Aeden Jameson
We're using S3 to store checkpoints. They are taken every minute. I'm seeing a large number of 404 responses from S3 being generated by the job manager. The order of the entries in the debugging log would imply that it's a result of a HEAD request to a key. For example all the incidents look like

Re: Integration Test for Kafka Streaming job

2022-04-20 Thread Aeden Jameson
I've had success using Kafka for Junit, https://github.com/mguenther/kafka-junit, for these kinds of tests. On Wed, Apr 20, 2022 at 3:01 PM Alexey Trenikhun wrote: > > Hello, > We have Flink job that read data from multiple Kafka topics, transforms data > and write in output Kafka topics. We

Re: Example with JSONKeyValueDeserializationSchema?

2022-03-03 Thread Aeden Jameson
I believe you can solve this issue with .setDeserializer(KafkaRecordDeserializationSchema.of(new JSONKeyValueDeserializationSchema(false))) On Thu, Mar 3, 2022 at 8:07 AM Kamil ty wrote: > > Hello, > > Sorry for the late reply. I have checked the issue and it seems to be a type > issue as the
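A sketch of that suggestion wired into the KafkaSource builder; the bootstrap servers and topic are placeholders.

```java
// JSONKeyValueDeserializationSchema yields Jackson ObjectNodes with
// "key" and "value" fields; the boolean is includeMetadata, so 'false'
// means Kafka metadata (offset, partition, topic) is not attached.
KafkaSource<ObjectNode> source = KafkaSource.<ObjectNode>builder()
    .setBootstrapServers("localhost:9092")   // placeholder
    .setTopics("my-topic")                   // placeholder
    .setDeserializer(KafkaRecordDeserializationSchema.of(
        new JSONKeyValueDeserializationSchema(false)))
    .build();
```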

Re: Unaligned Tumbling Windows

2022-01-20 Thread Aeden Jameson
On Thu, Jan 20, 2022 at 2:46 AM yidan zhao wrote: > self-define the window assigners. > Thanks, I'll check that out. If you have links to especially good examples and explanations, that would be great. Otherwise, I presume the Flink codebase itself is the place to start. -- Cheers, Aeden

Re: Unaligned Tumbling Windows

2022-01-20 Thread Aeden Jameson
type of window capable of emitting results for different > parallelisms at different times is the session window [1]. Does that meet > your needs? > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/#session-windows > > Aeden Jameson 于202

Unaligned Tumbling Windows

2022-01-14 Thread Aeden Jameson
When using tumbling windows the windows all materialize at once, which results in bursty traffic. How does one go about unaligning tumbling windows? Does this require going down the road of custom window assigners and triggers? -- Cheers, Aeden
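A hedged sketch (not from the thread) of the custom-assigner route: tumbling event-time windows staggered by a per-record offset so that window boundaries don't all align to the epoch. Caveat: the assigner sees the element, not the key, so in practice the offset should be derived from the element's key field to keep one key's windows consistent.

```java
import java.util.Collection;
import java.util.Collections;
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.WindowAssigner;
import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class StaggeredTumblingWindows extends WindowAssigner<Object, TimeWindow> {

    private final long sizeMillis;

    public StaggeredTumblingWindows(long sizeMillis) {
        this.sizeMillis = sizeMillis;
    }

    @Override
    public Collection<TimeWindow> assignWindows(
            Object element, long timestamp, WindowAssignerContext context) {
        // Spread window starts across [0, size) based on the element's hash,
        // instead of aligning every window to the same boundaries.
        long offset = Math.floorMod(element.hashCode(), sizeMillis);
        long start = TimeWindow.getWindowStartWithOffset(timestamp, offset, sizeMillis);
        return Collections.singletonList(new TimeWindow(start, start + sizeMillis));
    }

    @Override
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return EventTimeTrigger.create();
    }

    @Override
    public TypeSerializer<TimeWindow> getWindowSerializer(ExecutionConfig executionConfig) {
        return new TimeWindow.Serializer();
    }

    @Override
    public boolean isEventTime() {
        return true;
    }
}
```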

Re: Clarifying Documentation on Custom Triggers

2021-09-01 Thread Aeden Jameson
Builder will change when you call that method, and thus > the default trigger will be overwritten by calling WindowedStream#trigger. > > Aeden Jameson 于2021年9月1日周三 上午12:32写道: > >> Flink Version: 1.13.2 >> >> In the section on Default Triggers of Window Assigners th

Clarifying Documentation on Custom Triggers

2021-08-31 Thread Aeden Jameson
Flink Version: 1.13.2 In the section on Default Triggers of Window Assigners the documentation states: By specifying a trigger using trigger() you are overwriting the default trigger of a WindowAssigner. For example, if you specify a CountTrigger for TumblingEventTimeWindows you will no longer

Timer Service vs Custom Triggers

2021-08-18 Thread Aeden Jameson
My use case is that I'm producing a set of measurements by key every 60 seconds. Currently, this is handled with the usual pattern of keyBy().window(Tumbling...(60)).process(...) I need to provide the same output, but as a result of a timeout. The data needed for the timeout summary will be in
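A minimal sketch of the timer-service alternative: a KeyedProcessFunction that accumulates per key and registers a timeout timer. The Measurement/Summary types, field names, and the fixed 60s timeout are illustrative placeholders, not from the thread.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimeoutSummaryFunction
        extends KeyedProcessFunction<String, Measurement, Summary> {

    private transient ValueState<Summary> pending;

    @Override
    public void open(Configuration parameters) {
        pending = getRuntimeContext().getState(
            new ValueStateDescriptor<>("pending", Summary.class));
    }

    @Override
    public void processElement(Measurement m, Context ctx, Collector<Summary> out)
            throws Exception {
        // ... update the pending summary for this key (omitted) ...
        // Also register a timer 60s past this element's timestamp; timers
        // are deduplicated per key and timestamp.
        ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 60_000L);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Summary> out)
            throws Exception {
        // Emit whatever accumulated by the deadline, then reset.
        Summary s = pending.value();
        if (s != null) {
            out.collect(s);
            pending.clear();
        }
    }
}
```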

Re: Kafka Consumer stop consuming data

2021-07-14 Thread Aeden Jameson
state and not new watermark > generated from there and then it stunk all the downsteams and stop > consuming data from Kafka? I didn’t use watermark in my application through. > > > > I checked that all the Kafka partition has data consistently coming in. > > > > Bes

Re: Kafka Consumer stop consuming data

2021-07-13 Thread Aeden Jameson
This can happen if you have an idle partition. Are all partitions receiving data consistently? On Tue, Jul 13, 2021 at 2:59 PM Jerome Li wrote: > > Hi, > > > > I got question about Flink Kafka consumer. I am facing the issue that the > Kafka consumer somehow stop consuming data from Kafka after

Re: Flink CEP checkpoint size

2021-07-07 Thread Aeden Jameson
ow it is not fixed yet. > thank you again and do you have any solutions ? > > On 2021/07/07 01:47:00, Aeden Jameson wrote: > > Hi Li, > > > >How big is your keyspace? Had a similar problem which turns out to > > be scenario 2 in this issue > > https://issu

Re: Flink CEP checkpoint size

2021-07-06 Thread Aeden Jameson
Hi Li, How big is your keyspace? Had a similar problem which turns out to be scenario 2 in this issue https://issues.apache.org/jira/browse/FLINK-19970. Looks like the bug in scenario 1 got fixed but scenario 2 did not. There's more detail in this thread,

Re: Advancing watermark with low-traffic Kafka partitions

2021-06-19 Thread Aeden Jameson
One way to deal with this is by using the source idleness setting. https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#dealing-with-idle-sources On Sat, Jun 19, 2021 at 12:06 PM Dan Hill wrote: > Hi. > > I remember listening to Flink
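A sketch of the idleness setting mentioned above: a partition that produces no records for the configured duration is marked idle and stops holding back the watermark. The event type and both durations are placeholders.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

WatermarkStrategy<MyEvent> strategy = WatermarkStrategy
    .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
    // After 1 minute without records, the split is treated as idle.
    .withIdleness(Duration.ofMinutes(1));
```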

Long to Timestamp(3) Conversion

2021-04-21 Thread Aeden Jameson
I've probably overlooked something simple, but when converting a datastream to a table how does one convert a long to timestamp(3) that is not your event or processing time? I've tried tEnv.createTemporaryView( "myTable" ,myDatastream ,
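One approach that should work on 1.12-era Flink (column names are placeholders; note that FROM_UNIXTIME interprets the value in the session time zone): register the stream as-is, then derive the TIMESTAMP(3) column in a query.

```java
// Assumes a Long epoch-millis field 'ts_millis' on the stream.
tEnv.createTemporaryView("raw", myDatastream, $("ts_millis"), $("val"));

// FROM_UNIXTIME expects seconds and returns a string; TO_TIMESTAMP parses
// that string into a TIMESTAMP(3) column.
Table t = tEnv.sqlQuery(
    "SELECT TO_TIMESTAMP(FROM_UNIXTIME(ts_millis / 1000)) AS event_ts, val FROM raw");
```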

Eliminating Shuffling Under FlinkSQL

2021-03-18 Thread Aeden Jameson
It's my understanding that a group by is also a key by under the hood. As a result that will cause a shuffle operation to happen. Our source is a Kafka topic that is keyed so that any given partition contains all the data that is needed for any given consuming TM. Is there a way using FlinkSQL to

Understanding Max Parallelism

2021-03-18 Thread Aeden Jameson
I'm trying to get my head around the impact of setting max parallelism. * Does max parallelism primarily serve as a reservation for future increases to parallelism? The reservation being the ability to restore from checkpoints and savepoints after increases to parallelism. * Does it serve as a
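Max parallelism is more than a reservation: it fixes the number of key groups, which is the finest granularity at which keyed state can ever be redistributed across subtasks. A self-contained sketch of the key-group-to-subtask formula, mirroring Flink's KeyGroupRangeAssignment (the murmur-hash step that maps a key to one of maxParallelism key groups is omitted):

```java
public class KeyGroupDemo {

    // Mirrors KeyGroupRangeAssignment#computeOperatorIndexForKeyGroup:
    // each subtask owns a contiguous range of key groups.
    static int operatorIndex(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        // With maxParallelism = 128 and parallelism = 4, each subtask owns
        // 32 key groups: 0..31 -> 0, 32..63 -> 1, and so on.
        System.out.println(operatorIndex(0, 128, 4));   // subtask 0
        System.out.println(operatorIndex(31, 128, 4));  // subtask 0
        System.out.println(operatorIndex(32, 128, 4));  // subtask 1
        System.out.println(operatorIndex(127, 128, 4)); // subtask 3
    }
}
```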

Re: Evenly Spreading Out Source Tasks

2021-03-17 Thread Aeden Jameson
tasks have the same parallelism 36, your job should only allocate > 36 slots. The evenly-spread-out-slots option should help in your case. > > Is it possible for you to share the complete jobmanager logs? > > > Thank you~ > > Xintong Song > > > > On Tue, Mar 16

Re: Evenly Spreading Out Source Tasks

2021-03-15 Thread Aeden Jameson
3:54 AM Chesnay Schepler wrote: >> >> Is this a brand-new job, with the cluster having all 18 TMs at the time >> of submission? (or did you add more TMs while the job was running) >> >> On 3/12/2021 5:47 PM, Aeden Jameson wrote: >> > Hi Matthias, >>

Re: Evenly Spreading Out Source Tasks

2021-03-12 Thread Aeden Jameson
apache/flink/runtime/resourcemanager/slotmanager/SlotManagerConfiguration.java#L141 > > On Fri, Mar 12, 2021 at 12:58 AM Aeden Jameson > wrote: >> >> Hi Arvid, >> >> Thanks for responding. I did check the configuration tab of the job >> manager and the setting cluster.evenly-spread-

Re: Evenly Spreading Out Source Tasks

2021-03-11 Thread Aeden Jameson
ers 2 slots. In that way, the > scheduler can only evenly distribute. > > On Wed, Mar 10, 2021 at 7:21 PM Aeden Jameson wrote: >> >> I have a cluster with 18 task managers 4 task slots each running a >> job whose source/sink(s) are declared with FlinkSQL using the Kafka &

Evenly Spreading Out Source Tasks

2021-03-10 Thread Aeden Jameson
I have a cluster with 18 task managers 4 task slots each running a job whose source/sink(s) are declared with FlinkSQL using the Kafka connector. The topic being read has 36 partitions. The problem I'm observing is that the subtasks for the sources are not evenly distributed. For example, 1
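The option discussed later in this thread, for reference; it asks the scheduler to spread slots across all registered TaskManagers instead of filling one TaskManager at a time.

```yaml
# flink-conf.yaml
cluster.evenly-spread-out-slots: true
```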

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-04 Thread Aeden Jameson
Correction: The first link was supposed to be, 1. pipeline.auto-watermark-interval https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#pipeline-auto-watermark-interval On Wed, Mar 3, 2021 at 7:46 PM Aeden Jameson wrote: > > I'm hoping to have my con

pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-03 Thread Aeden Jameson
I'm hoping to have my confusion clarified regarding the settings, 1. pipeline.auto-watermark-interval https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/common/ExecutionConfig.html#setAutoWatermarkInterval-long- 2. setAutoWatermarkInterval
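For what it's worth, my understanding (verify against the docs for your version) is that these control the same interval: pipeline.auto-watermark-interval is the configuration-file form, and ExecutionConfig#setAutoWatermarkInterval the programmatic form it feeds into.

```yaml
# flink-conf.yaml form; the default is 200 ms, and 0 disables periodic
# watermark generation entirely.
pipeline.auto-watermark-interval: 200 ms
```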

Re: BackPressure in RowTime Task of FlinkSql Job

2021-02-26 Thread Aeden Jameson
Do all operators have the same parallelism? > > Regards, > Timo > > > On 25.02.21 00:49, Aeden Jameson wrote: > > I have a job made up of a few FlinkSQL statements using a > > statement set. In my job graph viewed through the Flink UI a few of > > the task

BackPressure in RowTime Task of FlinkSql Job

2021-02-24 Thread Aeden Jameson
I have a job made up of a few FlinkSQL statements using a statement set. In my job graph viewed through the Flink UI a few of the tasks/statements are preceded by this task rowtime field: (#11: event_time TIME ATTRIBUTE(ROWTIME)) that has an upstream Kafka source/sink task.

Role of Rowtime Field Task?

2021-02-20 Thread Aeden Jameson
In my job graph viewed through the Flink UI I see a task named, rowtime field: (#11: event_time TIME ATTRIBUTE(ROWTIME)) that has an upstream Kafka source task. What exactly does the rowtime task do? -- Thank you, Aeden

Computed Columns In Stream to Table Conversion

2021-01-18 Thread Aeden Jameson
Hi How does one specify computed columns when converting a DataStream to a temporary view? For example final DataStream stream = env.addSource(..); tEnv.createTemporaryView("myTable", stream, $("col1"), $("col2")
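A sketch of one way to do this on later Flink versions (1.13+), where fromDataStream accepts a Schema that can declare computed columns; the field name ts_millis and the TO_TIMESTAMP_LTZ expression are illustrative assumptions.

```java
// Assumes the stream has a Long field 'ts_millis'; the Schema adds a
// computed TIMESTAMP_LTZ(3) column derived from it.
Table t = tEnv.fromDataStream(
    stream,
    Schema.newBuilder()
        .columnByExpression("event_time", "TO_TIMESTAMP_LTZ(ts_millis, 3)")
        .build());
tEnv.createTemporaryView("myTable", t);
```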

Re: CEP Test cases example

2021-01-17 Thread Aeden Jameson
This may help you out. https://github.com/apache/flink/blob/master/flink-libraries/flink-cep/src/test/java/org/apache/flink/cep/CEPITCase.java On Sun, Jan 17, 2021 at 10:32 AM narasimha wrote: > > Hi, > > I'm using Flink CEP, but couldn't find any examples for writing test cases > for the

Statement Sets

2021-01-11 Thread Aeden Jameson
When using statement sets, if two select queries use the same table (e.g. Kafka Topic), does each query get its own copy of data? Thank you, Aeden
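A sketch of the scenario being asked about; table names are placeholders. My understanding is that the Blink planner can reuse a single scan for identical source tables within one statement set, but whether it does in a given job is best checked by inspecting the plan (e.g. via set.explain()).

```java
StatementSet set = tEnv.createStatementSet();
// Two INSERTs reading from the same Kafka-backed table.
set.addInsertSql("INSERT INTO sink_a SELECT id, val FROM kafka_topic WHERE val > 0");
set.addInsertSql("INSERT INTO sink_b SELECT id, val FROM kafka_topic WHERE val <= 0");
set.execute(); // submits both as one job
```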

Re: Using key.fields in 1.12

2021-01-07 Thread Aeden Jameson
n the new design. The Avro > schema is derived entirely from the table's schema. > > Regards, > Timo > > > > On 07.01.21 09:41, Aeden Jameson wrote: > > Hi Timo, > > > > Thanks for responding. You're right. So I did update the properties. > >>

Re: Using key.fields in 1.12

2021-01-07 Thread Aeden Jameson
g, they are mentioned here [2]. > > Regards, > Timo > > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory > [2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ > > > On 06.01.21 18:35, Aeden J

Re: Using key.fields in 1.12

2021-01-06 Thread Aeden Jameson
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/kafka.html#dependencies > > Best, > Piotrek > > śr., 6 sty 2021 o 04:37 Aeden Jameson napisał(a): >> >> I've upgraded from 1.11.1 to 1.12 in hopes of using the key.fields >> feature of the

Using key.fields in 1.12

2021-01-05 Thread Aeden Jameson
I've upgraded from 1.11.1 to 1.12 in hopes of using the key.fields feature of the Kafka SQL Connector. My current connector is configured as , connector.type= 'kafka' connector.version = 'universal' connector.topic = 'my-topic' connector.properties.group.id = 'my-consumer-group'
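For what it's worth, my understanding is that key.fields is only recognized with the new FLIP-122 option style ('connector' = 'kafka'), not the legacy connector.type properties shown above. A hedged DDL sketch; the table, topic, and field names are placeholders.

```sql
CREATE TABLE my_table (
  user_id STRING,
  event_time TIMESTAMP(3),
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'my-topic',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'my-consumer-group',
  'key.fields' = 'user_id',
  'key.format' = 'raw',
  'value.format' = 'json'
);
```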

Kafka SQL Connector Behavior (v1.11)

2021-01-04 Thread Aeden Jameson
Based on these docs, https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/kafka.html, the default partitioning behavior is not quite clear to me. If no value for sink-partitioner is given, is the default behavior just that of the native Kafka library? (with key use

Re: Get current kafka offsets for kafka sources

2020-12-15 Thread Aeden Jameson
My understanding is the FlinkKafkaConsumer is a wrapper around the Kafka consumer libraries so if you've set the group.id property you should be able to see the offsets with something like kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-flink-application. On Tue,