Sorting in a WindowedDataStream

2015-04-14 Thread Niklas Semmler
Hello there, What functions should be used to aggregate (unordered) tuples for every window in a WindowedDataStream to a (ordered) list? Neither foldWindow nor reduceWindow seems to be applicable, and aggregate does not, to my understanding, take user-defined functions. To get started with

A custom FileInputFormat

2016-10-28 Thread Niklas Semmler
/2016 18:27:56 DataSink (TextOutputFormat (/tmp/results) - UTF-8)(1/1) switched to RUNNING 10/28/2016 18:27:56 DataSink (TextOutputFormat (/tmp/results) - UTF-8)(1/1) switched to FINISHED 10/28/2016 18:27:56 Job execution switched to status FINISHED. -- Niklas Semmler PhD Student / Resea

Re: A custom FileInputFormat

2016-11-16 Thread Niklas Semmler
s meant to *read* files, not just look up file names. I'd rather implement an InputFormat from scratch. Since you are only running a single instance, you can return a single dummy InputSplit. Let me know, if you have further questions. Best, Fabian 2016-10-28 18:38 GMT+02:00 Niklas Semmler

Re: Synchronization across tasks using checkpoint barriers

2022-02-14 Thread Niklas Semmler
Hi Gopi, You can implement CheckpointedFunction and use the method snapshotState(FunctionSnapshotContext) to upload state on each checkpoint. https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/org/apache/flink/streaming/api/checkpoint/CheckpointedFunction.html Make sure, you d

Re: Need help on implementing custom`snapshotState` in KafkaSource & KafkaSourceReader

2022-02-14 Thread Niklas Semmler
Hi Santosh, It’s best to avoid cross-posting. Let’s keep the discussion to SO. Best regards, Niklas > On 12. Feb 2022, at 16:39, santosh joshi wrote: > > We are migrating to KafkaSource from FlinkKafkaConsumer. We have disabled > auto commit of offset and instead committing them manually to s

Re: Synchronization across tasks using checkpoint barriers

2022-02-14 Thread Niklas Semmler
r. > > On Mon, Feb 14, 2022 at 8:11 PM Niklas Semmler wrote: > Hi Gopi, > > You can implement CheckpointedFunction and use the method > snapshotState(FunctionSnapshotContext) to upload state on each checkpoint. > > https://nightlies.apache.org/flink/flink-docs-

Re: unpredictable behaviour on KafkaSource deserialisation error

2022-02-14 Thread Niklas Semmler
Hi Frank, This sounds like an interesting issue. Can you share a minimal working example? Best regards, Niklas > On 9. Feb 2022, at 23:11, Frank Dekervel wrote: > > Hello, > > When trying to reproduce a bug, we made a DeserialisationSchema that throws > an exception when a malformed message

Re: Flink 1.12.x DataSet --> Flink 1.14.x DataStream

2022-02-14 Thread Niklas Semmler
Hi Saravanan, AFAIK the last record is not treated differently. Does the approach in [1] not work? Best regards, Niklas https://github.com/dmvk/flink/blob/2f1b573cd57e95ecac13c8c57c0356fb281fd753/flink-runtime/src/test/java/org/apache/flink/runtime/operators/chaining/ChainTaskTest.java#L279

Re: How to access Task.isBackPressured() from a SourceFunction?

2022-02-14 Thread Niklas Semmler
Hi Darren, No, you cannot access the Task from the operator. You can access some metrics via the RuntimeContext. getRuntimeContext().getMetricGroup() How does the backpressure help you here? Backpressure can originate in any operator or network connection. If it's an operator further downstre