Hello Everyone, I am trying to implement a custom source where split detection is an expensive operation and I would like to benefit from the split reader results to build my next splits.
Basically, my source is taking as input an id from my external system, let's name it ID1. >From ID1, I can get a list of other sub splits but getting this list is an expensive operation so I want it to be done on a task manager during the split reading of ID1. Now we can imagine sub splits of ID1 are ID1.1 and ID1.2. So, to sum up my split reader of ID1 will be responsible for: 1. Collecting content of ID1 2. Producing n sub splits Then, the split enumerator will receive these sub splits and schedule ID1.1, ... ID1.n for split reading. As of now, I have implemented this mechanism using events between split reader and split enumerator but I think there might be a better architecture using Flink. Finally, I have a second problem which is about avoiding extracting multiple times the same split. We can imagine, based on my previous explanation, that same ID might be detected through multiple parent splits. To avoid losing time doing the same job multiple times, I need to avoid extracting the same ID. Actually, I am thinking about storing the already extracted ID into the state and storing it into my state backend. What do you think about this ? Thank you. Benoit