Re: Optimize exact deduplication for tens of billions data per day

2024-04-15 Thread Alex Cruise
It may not be completely relevant to this conversation in this year, but I find myself sharing this article once or twice a year when opining about how hard deduplication at scale can be. 😅 -0xe1a On Thu, Apr 11, 2024 at 10:22 PM Péter Váry wrote

Fwd: Global connection open and close

2024-03-22 Thread Alex Cruise
[previous didn't cc list, sorry for dupes] The classic connection pool pattern, where expensive connections are created relatively few times and used by lots of transient short-lived tasks, each of which borrows a connection from the pool and returns it when done, would still be usable here, but a

Re: Global connection open and close

2024-03-22 Thread Alex Cruise
The classic connection pool pattern, where expensive connections are created relatively few times and used by lots of transient short-lived tasks, each of which borrows a connection from the pool and returns it when done, would still be usable here, but as Péter points out, you can't rely on a sing

Re: Event stuck in the Flink operator

2023-12-14 Thread Alex Cruise
Can you share your precise join semantics? I don't know about Flink SQL offhand, but here are a couple ways to do this when you're using the DataStream API: * use the Session Window join

Re: Request-Response flow for real-time analytics

2023-08-23 Thread Alex Cruise
This is a pretty hard problem. I would be inclined to try Queryable State ( https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/queryable_state/) first. -0xe1a On Mon, Aug 21, 2023 at 11:04 PM Jiten Pathy wrote: > Hi, > We are currently evaluating Flink for

Fast serialization for Kotlin data classes

2021-09-14 Thread Alex Cruise
Hi there, I appreciate the fact that Flink has built-in support for making POJO and Scala `case class` serialization faster, but in my project we use immutable Kotlin `data class`es (analogous to Scala `case class`es) extensively, and we'd really prefer not to make them POJOs, mostly for style/tas

Re: Async + Broadcast?

2021-04-08 Thread Alex Cruise
; > [1] https://issues.apache.org/jira/browse/FLINK-16219 > [2] > https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/execution_configuration.html > > On Thu, Apr 8, 2021 at 3:26 AM Alex Cruise wrote: > >> Thanks Austin! I'll proceed with my idea, and keep t

Re: Async + Broadcast?

2021-04-07 Thread Alex Cruise
ts when the job starts, since the broadcast > from the polling source can happen later than recieving the first record – > we solved this by calling the polling source's service in the `open()` > method of the non-async operator and storing the initial configs in memory. > >

Async + Broadcast?

2021-04-07 Thread Alex Cruise
Hi folks, I have a somewhat complex Flink job that has a few async stages, and a few stateful stages. It currently loads its configuration on startup, and doesn't attempt to refresh it. Now I'm working on dynamic reconfiguration. I've written a polling source which sends a configuration snapshot

Re: Multiple side outputs of same type?

2020-12-18 Thread Alex Cruise
puts of the > same type. > > [1] > https://github.com/apache/flink/blob/1a08548a209167cafeeba1ce602fe8d542994be5/flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/operators/StreamOperatorChainingTest.java#L179 > > On Fri, Dec 18, 2020 at 7:52 PM Alex Cruise w

Multiple side outputs of same type?

2020-12-18 Thread Alex Cruise
Hey folks, I have a program that demultiplexes input records from a shared prefix stream onto some number of suffix streams, which are allocated on boot based on configuration. At the moment I'm just duplicating the input records, and filtering out the wrong records in each suffix stream, but it'

Re: "stepless" sliding windows?

2020-11-17 Thread Alex Cruise
AM Danny Chan wrote: > >> The SLIDING window always triggers as of each step, what do you mean by >> "stepless" ? >> >> Alex Cruise 于2020年10月21日周三 上午1:52写道: >> >>> whoops.. as usual, posting led me to find some answers myself. Does this >>&g

Re: "stepless" sliding windows?

2020-10-20 Thread Alex Cruise
} override fun getWindowSerializer(executionConfig: ExecutionConfig?): TypeSerializer { return TimeWindow.Serializer() } override fun isEventTime(): Boolean { return true } } On Tue, Oct 20, 2020 at 9:13 AM Alex Cruise wrote: > Hey folks! > >

"stepless" sliding windows?

2020-10-20 Thread Alex Cruise
Hey folks! I have an application that wants to use "stepless" sliding windows, i.e. we produce aggregates on every event. The windows need to be of a fixed size, but to have their start and end times update continuously, and I'd like to trigger on every event. Is this a bad idea? I've googled and