Hello,

I recently started looking into serving machine learning models for streaming 
data in Flink. To give more context, that would involve training a model 
offline (using PyTorch or TensorFlow) and calling it from inside a Flink job 
to do online inference on newly arriving data. I have found multiple 
discussions, presentations, and tools that could achieve this, and it seems 
like the two main alternatives are: (1) wrap the pre-trained models in an HTTP 
service (such as PyTorch Serve [1]) and let Flink make async calls for model 
scoring, or (2) convert the models into a standardized format (e.g., ONNX [2]), 
pre-load the model into memory on every task manager (or use external storage 
if needed), and call it for each new data point.
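To make the two alternatives more concrete, here is roughly what I have in 
mind. For (1), a sketch of a Flink async operator calling an external model 
server over HTTP; the class name, endpoint URL, and payload format are my own 
assumptions for illustration, not any particular server's actual API:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Collections;

    public class HttpScoringFunction extends RichAsyncFunction<String, String> {

        private transient HttpClient client;

        @Override
        public void open(Configuration parameters) {
            client = HttpClient.newHttpClient();
        }

        @Override
        public void asyncInvoke(String featuresJson, ResultFuture<String> resultFuture) {
            // Hypothetical scoring endpoint; the URL and JSON layout are assumptions.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://model-server:8080/predictions/my_model"))
                    .POST(HttpRequest.BodyPublishers.ofString(featuresJson))
                    .build();
            // Complete the Flink future when the HTTP response arrives.
            client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                  .thenAccept(response -> resultFuture.complete(Collections.singleton(response.body())))
                  .exceptionally(t -> { resultFuture.completeExceptionally(t); return null; });
        }
    }

which would then be wired into the pipeline via AsyncDataStream.unorderedWait(...).

For (2), a minimal sketch of embedding the model in an operator using the ONNX 
Runtime Java API; the model path and the "input" tensor name are placeholders, 
and the sketch assumes a single 2D float input and output:

    import ai.onnxruntime.OnnxTensor;
    import ai.onnxruntime.OrtEnvironment;
    import ai.onnxruntime.OrtSession;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    import java.util.Collections;

    public class OnnxScoringFunction extends RichMapFunction<float[], float[]> {

        private transient OrtEnvironment env;
        private transient OrtSession session;

        @Override
        public void open(Configuration parameters) throws Exception {
            // Load the pre-trained model once per task slot, not per record.
            env = OrtEnvironment.getEnvironment();
            session = env.createSession("/models/model.onnx", new OrtSession.SessionOptions());
        }

        @Override
        public float[] map(float[] features) throws Exception {
            // Wrap the incoming features in a [1, n] tensor and run a single inference call.
            try (OnnxTensor input = OnnxTensor.createTensor(env, new float[][]{features});
                 OrtSession.Result result = session.run(Collections.singletonMap("input", input))) {
                return ((float[][]) result.get(0).getValue())[0];
            }
        }

        @Override
        public void close() throws Exception {
            if (session != null) session.close();
        }
    }
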

Both approaches come with advantages and drawbacks and, as far as I 
understand, there is no "silver bullet": which approach is more suitable 
depends on the application requirements. However, I would be curious to know 
what the "recommended" methods for model serving are (if any) and which 
approaches users currently adopt in the wild.


[1] https://pytorch.org/serve/

[2] https://onnx.ai/


Best regards,

Sonia



Sonia-Florina Horchidan
PhD Student
KTH Royal Institute of Technology
Software and Computer Systems (SCS)
School of Electrical Engineering and Computer Science (EECS)
Mobile: +46769751562
[email protected], www.kth.se
