Hello,
I recently started looking into serving machine learning models for streaming data in Flink. To give more context, this would involve training a model offline (using PyTorch or TensorFlow) and calling it from inside a Flink job to do online inference on newly arriving data.

I have found multiple discussions, presentations, and tools that could achieve this, and it seems like the two main alternatives are: (1) wrap the pre-trained models in an HTTP service (such as PyTorch Serve [1]) and let Flink make async calls to it for model scoring, or (2) convert the models into a standardized format (e.g., ONNX [2]), pre-load the model in memory on every task manager (or use external storage if needed), and call it for each new data point (a rough sketch of what I imagine for this option is in the P.S. below).

Both approaches come with their own advantages and drawbacks and, as far as I understand, there is no "silver bullet": one approach may be more suitable than the other depending on the application requirements. However, I would be curious to know what the "recommended" methods for model serving are (if any) and what approaches are currently adopted by users in the wild.

[1] https://pytorch.org/serve/
[2] https://onnx.ai/

Best regards,
Sonia

Sonia-Florina Horchidan
PhD Student
KTH Royal Institute of Technology
Software and Computer Systems (SCS)
School of Electrical Engineering and Computer Science (EECS)
Mobile: +46769751562
[email protected]
www.kth.se
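
P.S. In case it helps to make option (2) more concrete, below is a minimal sketch of how I imagine the embedded-scoring variant, assuming the ONNX Runtime Java API (ai.onnxruntime) inside a Flink RichMapFunction. The model path, the graph input name "input", and the assumption that the model returns a single float per record are all placeholders on my side, so please correct me if this is not how people usually structure it.

import ai.onnxruntime.OnnxTensor;
import ai.onnxruntime.OrtEnvironment;
import ai.onnxruntime.OrtSession;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

import java.util.Collections;

// Sketch: load an ONNX model once per task slot when the operator opens,
// then score every incoming feature vector against it.
public class OnnxScoringFunction extends RichMapFunction<float[], Float> {

    private transient OrtEnvironment env;
    private transient OrtSession session;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Load the exported model from local disk; the path is a placeholder.
        env = OrtEnvironment.getEnvironment();
        session = env.createSession("/path/to/model.onnx", new OrtSession.SessionOptions());
    }

    @Override
    public Float map(float[] features) throws Exception {
        // Wrap the record in a batch-of-one tensor; "input" is the assumed
        // name of the model's input node.
        try (OnnxTensor tensor = OnnxTensor.createTensor(env, new float[][]{features});
             OrtSession.Result output = session.run(Collections.singletonMap("input", tensor))) {
            // Assumes the model emits one score per record.
            float[][] scores = (float[][]) output.get(0).getValue();
            return scores[0][0];
        }
    }

    @Override
    public void close() throws Exception {
        if (session != null) {
            session.close();
        }
    }
}

For option (1), I imagine the equivalent would be a RichAsyncFunction issuing HTTP requests to the model server and being applied via AsyncDataStream, but I have left that out for brevity.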
