Re: Non-temporal watermarks

2023-02-02 Thread James Sandys-Lumsdaine
I can describe a use case that has been successful for me. We have a Flink workflow that calculates reports over many days. It is currently set up to recompute the last 10 days or so, recovering this "deep history" from our databases, and then switch over to the live flow to process all subsequent…

Setting a timer within broadcast applyToKeyedState() (feature request)

2022-07-07 Thread James Sandys-Lumsdaine
Hello, I know we can't set a timer in the processBroadcastElement() of the KeyedBroadcastProcessFunction as there is no key. However, there is a context.applyToKeyedState() method which allows us to iterate over the keyed state in the scope of a key. So it is possible to add access to the Tim…

Re: Checkpoint directories not cleared as TaskManagers run

2022-05-18 Thread James Sandys-Lumsdaine
…in order to work: as your jobmanager cannot access the checkpoint files, it also cannot clean up those files. Hope that helps. Regards, Thias. From: James Sandys-Lumsdaine Sent: Tuesday, May 17, 2022 3:55 PM To: Hangxiang Yu; user@flink.apache.org Subject: Re: Checkpoint directories not…

Re: Checkpoint directories not cleared as TaskManagers run

2022-05-17 Thread James Sandys-Lumsdaine
…From: Hangxiang Yu Sent: 17 May 2022 14:38 To: James Sandys-Lumsdaine; user@flink.apache.org Subject: Re: Checkpoint directories not cleared as TaskManagers run. Hi, James. I may not get what the problem is. All checkpoints will be stored in the location you set. IIUC, TMs will write some…

Re: Checkpoint directories not cleared as TaskManagers run

2022-05-17 Thread James Sandys-Lumsdaine
…what's going on a bit better, it seems pointless for me to have file checkpoints that can't be read by the jobmanager for failover. If anyone can clarify/correct me, I would appreciate it. James. From: James Sandys-Lumsdaine Sent: 16 May 2022 18:52 To: user@flink.apache.org…

Checkpoint directories not cleared as TaskManagers run

2022-05-16 Thread James Sandys-Lumsdaine
Hello, I'm seeing my Flink deployment's checkpoint storage directories build up and never clear down. When I run from my own IDE, I see only the latest "chk-x" directory under the job id folder, so the first checkpoint is "chk-1", which is then replaced with "chk-2" etc. However, when…
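As the replies later in this thread note, the clean-up problem usually comes down to the JobManager needing access to the same checkpoint directory as the TaskManagers. A minimal flink-conf.yaml sketch (key names per the Flink 1.14-era configuration; the path is a placeholder):

```yaml
# Checkpoint storage must live on a filesystem that the JobManager and
# every TaskManager can all reach (NFS, S3, HDFS, ...); otherwise the
# JobManager cannot subsume and delete old chk-* directories.
state.backend: filesystem
state.checkpoints.dir: file:///shared/flink/checkpoints   # placeholder shared path
state.checkpoints.num-retained: 1    # keep only the latest completed checkpoint
execution.checkpointing.interval: 60s
```

With a purely TaskManager-local path, the "chk-x" directories accumulate exactly as described, because the JobManager can never delete files it cannot see.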

Re: Slowness using GenericWriteAheadSink

2022-03-23 Thread James Sandys-Lumsdaine
Is anyone able to comment on the below? My worry is this class isn't well supported, so I may need to find an alternative to bulk copy data into SQL Server, e.g. use a simple file sink and then have some process bulk copy the files. From: Sandys-Lumsdaine, James Sent…

Slowness using GenericWriteAheadSink

2022-03-14 Thread James Sandys-Lumsdaine
Hello, we are using the GenericWriteAheadSink to buffer up values to then send to a SQL Server database with a fast bulk copy upload. However, when I watch my process running, it seems to spend a huge amount of time iterating the Iterable provided to the sendValues() method. It takes such a long time…
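One mitigation for slow per-row iteration in sendValues() is to group the checkpointed values into fixed-size batches and send each batch as a single bulk-copy round trip. A minimal plain-Java sketch of that batching step (BatchChunker is a hypothetical helper, not a Flink API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BatchChunker {
    // Splits the values handed to a sink (e.g. the Iterable that
    // GenericWriteAheadSink#sendValues receives) into fixed-size batches,
    // so each batch can go to the database as one bulk-copy call instead
    // of a round trip per row.
    public static <T> List<List<T>> chunk(Iterable<T> values, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>(batchSize);
        Iterator<T> it = values.iterator();
        while (it.hasNext()) {
            current.add(it.next());
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // final partial batch
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(chunk(Arrays.asList(1, 2, 3, 4, 5), 2)); // [[1, 2], [3, 4], [5]]
    }
}
```

This does not address why the Iterable itself is slow to traverse (which may be the underlying state backend), but it decouples iteration cost from database round trips.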

Basic questions about resuming stateful Flink jobs

2022-02-16 Thread James Sandys-Lumsdaine
Hi all, I have a 1.14 Flink streaming workflow with many stateful functions that has a FsStateBackend and checkpointing enabled, although I haven't set a location for the checkpointed state. I've really struggled to understand how I can stop my Flink job and restart it and ensure it carries on…
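The usual pattern for a controlled stop-and-resume is a savepoint via the Flink CLI. A sketch (the job id, paths, and jar name are placeholders):

```
# Stop the job, taking a savepoint first.
flink stop --savepointPath /shared/flink/savepoints <jobId>

# Later, resume the job from that savepoint.
flink run -s /shared/flink/savepoints/<savepoint-dir> myJob.jar
```

Retained checkpoints can also be resumed from with `run -s`, but savepoints are the documented mechanism for planned stop/restart cycles.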

Re: Unit test harness for Sources

2022-02-15 Thread James Sandys-Lumsdaine
…would suggest following the example at https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/testing/#testing-flink-jobs and using a job that only contains the source (+ something to either extract the results or verify them within the job). On 14/02/2022 18:06, James S…

Unit test harness for Sources

2022-02-14 Thread James Sandys-Lumsdaine
Hi all, I've been using the test harness classes to unit test my stateful one- and two-stream functions. But I also have some stateful legacy Source classes I would like to unit test, and I can't find any documentation or examples for that. Is this possible? Thanks, James.

Re: GenericWriteAheadSink, declined checkpoint for a finished source

2021-12-06 Thread James Sandys-Lumsdaine
…this feature is indeed available and working in 1.14.0, and how do I correctly enable it? Thanks, James. From: Austin Cawley-Edwards Sent: 04 November 2021 18:46 To: James Sandys-Lumsdaine Cc: user@flink.apache.org Subject: Re: GenericWriteAheadSink, declined checkpoint…

GenericWriteAheadSink, declined checkpoint for a finished source

2021-11-03 Thread James Sandys-Lumsdaine
Hello, I have a Flink workflow where I need to upload the output data into a legacy SQL Server database, so I have read the section in the Flink book about data sinks and utilizing the GenericWriteAheadSink base class. I am currently using Flink 1.12.3, although we plan to upgrade to 1.14 shortly…

Re: Empty Kafka topics and watermarks

2021-10-11 Thread James Sandys-Lumsdaine
Ah, thanks for the feedback. I can work around it for now but will upgrade as soon as I can to the latest version. Thanks very much, James. From: Piotr Nowojski Sent: 08 October 2021 13:17 To: James Sandys-Lumsdaine Cc: user@flink.apache.org Subject: Re: Empty Kafka topics and watermarks…

Empty Kafka topics and watermarks

2021-10-08 Thread James Sandys-Lumsdaine
Hi everyone, I'm putting together a Flink workflow that needs to merge historic data from a custom JDBC source with a Kafka flow (for the realtime data). I have successfully written the custom JDBC source that emits a watermark for the last event time after all the DB data has been emitted, but…
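The empty-topic problem in this thread stems from how multi-input operators combine watermarks: the output watermark is the minimum over the inputs, so a source that never emits one holds everything back (Flink's fix for this is marking sources idle, e.g. WatermarkStrategy.withIdleness). A plain-Java model of that combination logic, outside of Flink (CombinedWatermark is a hypothetical illustration, not Flink's internal class):

```java
public class CombinedWatermark {
    // Output watermark = min over all non-idle inputs. An empty Kafka topic
    // whose watermark is still Long.MIN_VALUE therefore stalls the whole
    // pipeline unless that input is flagged idle and skipped.
    public static long combine(long[] watermarks, boolean[] idle) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < watermarks.length; i++) {
            if (!idle[i]) {
                anyActive = true;
                min = Math.min(min, watermarks[i]);
            }
        }
        // If every input is idle there is nothing to advance on.
        return anyActive ? min : Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long[] wms = {1_000L, Long.MIN_VALUE};            // second input: empty topic
        System.out.println(combine(wms, new boolean[]{false, false})); // stalled
        System.out.println(combine(wms, new boolean[]{false, true}));  // idle input ignored
    }
}
```

This is why the historic JDBC source emitting a final high watermark is not enough on its own: the empty Kafka input still pins the combined watermark at Long.MIN_VALUE until it produces data or is declared idle.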

Broadcast data to all keyed streams

2021-09-06 Thread James Sandys-Lumsdaine
Hello, I have a Flink workflow which is partitioned on a key that is common to all the stream objects and best suited to the high volume of data I am processing. I now want to add a new stream of prices that I want to make available to all partitioned streams; however, this new stream…
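This is the classic fit for Flink's broadcast state pattern: connect the keyed stream to a broadcast stream so every partition sees the same side input. A plain-Java model of the idea only (BroadcastModel and its trade/price names are hypothetical; in real Flink this is keyedStream.connect(broadcastStream) with a KeyedBroadcastProcessFunction):

```java
import java.util.HashMap;
import java.util.Map;

public class BroadcastModel {
    // Broadcast side: one price table visible to every keyed partition.
    private final Map<String, Double> broadcastPrices = new HashMap<>();
    // Keyed side: per-key state, updated using the shared broadcast data.
    private final Map<String, Double> perKeyExposure = new HashMap<>();

    public void onBroadcastPrice(String instrument, double price) {
        broadcastPrices.put(instrument, price); // seen by all keys
    }

    public void onKeyedTrade(String key, String instrument, double quantity) {
        double price = broadcastPrices.getOrDefault(instrument, 0.0);
        perKeyExposure.merge(key, quantity * price, Double::sum);
    }

    public double exposure(String key) {
        return perKeyExposure.getOrDefault(key, 0.0);
    }

    public static void main(String[] args) {
        BroadcastModel m = new BroadcastModel();
        m.onBroadcastPrice("ABC", 10.0);
        m.onKeyedTrade("book-1", "ABC", 5);
        m.onKeyedTrade("book-2", "ABC", 2);
        System.out.println(m.exposure("book-1")); // 50.0
        System.out.println(m.exposure("book-2")); // 20.0
    }
}
```

In Flink the broadcast side lives in a MapStateDescriptor-backed broadcast state rather than a plain map, which is what makes it checkpointable and consistent across restores.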

Questions on reading JDBC data with Flink Streaming API

2021-08-10 Thread James Sandys-Lumsdaine
Hello, I'm starting a new Flink application to allow my company to perform lots of reporting. We have an existing legacy system with most of the data we need held in SQL Server databases. We will need to consume data from these databases initially before starting to consume more data from newly…