Hello everyone,

I need to join several files to do some processing. The DataSet API is a
perfect fit for this, and it works when I read the files in batch mode
(CSV).

However, in the production environment I will receive those files as Kafka
messages (one message = one line of a file).
So I am considering using a global window with a custom trigger that fires
on an end-of-file message, plus a process window function.
But I cannot go very far with that, because all of the processing would be
confined to a single function and chaining functions would be a pain. I
don't think that emitting a DataStream and applying a window / trigger on
EOF before every process function is a good idea.
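
To make the idea concrete, here is a rough sketch in plain Python (not
Flink code) of what I expect the global window + EOF trigger to do: buffer
lines per file and release the whole file only when the end-of-file message
arrives. The `EOF` marker value, the `EofBuffer` class and the file names
are hypothetical, just for illustration.

```python
from collections import defaultdict

EOF = "__EOF__"  # hypothetical end-of-file marker; the real message format may differ


class EofBuffer:
    """Buffers lines per file and releases the whole file when EOF arrives."""

    def __init__(self):
        self._pending = defaultdict(list)

    def on_message(self, file_id, line):
        # On EOF, "fire and purge": hand back everything buffered for this file.
        if line == EOF:
            return self._pending.pop(file_id, [])
        # Otherwise keep buffering and emit nothing yet.
        self._pending[file_id].append(line)
        return None


buf = EofBuffer()
messages = [("a.csv", "1,foo"), ("a.csv", "2,bar"), ("a.csv", EOF)]
results = [buf.on_message(file_id, line) for file_id, line in messages]
```

Only the last call returns the complete file; the earlier ones return
nothing, which is the behaviour I want from the trigger.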

However, once I have received all of my elements (after the trigger fires
on the global window), I would like to work in a bounded way, as with the
DataSet API, since I will join on my whole dataset.
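
As an illustration of the bounded step, here is a minimal sketch in plain
Python (again, not Flink code) of the kind of join I want to run once both
files are complete. The `parse` and `hash_join` helpers, the sample rows
and the choice of column 0 as the join key are assumptions made for the
example.

```python
import csv


def parse(lines):
    """Turn complete CSV lines (one per Kafka message) into rows."""
    return list(csv.reader(lines))


def hash_join(left_rows, right_rows, left_key=0, right_key=0):
    """Inner join of two fully received datasets on the given key columns."""
    # Build a lookup table on the right side, then probe it with the left side.
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    return [left + right
            for left in left_rows
            for right in index.get(left[left_key], [])]


orders = parse(["1,apple", "2,pear", "3,plum"])
customers = parse(["1,alice", "2,bob"])
joined = hash_join(orders, customers)
```

The join only makes sense once every line of both files has arrived, which
is why I would like bounded semantics after the trigger.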

I thought it might be a good idea to go for the Table API and a group
window, but it seems you cannot have a custom trigger or a global group
window on a table (like the global window on a DataStream)?
The best alternative would be to create a DataSet from the result of my
process window function, but I don't think this is possible, is it?

Best Regards,
Bastien
