Hi,
How many events per second do you want to process? If the rate is high
enough (roughly number of machines * number of cores), you should be just
fine: instead of scaling with the number of features, scale with the number
of events. If you have a single data source, you could still randomly shuffle
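
To illustrate scaling with the number of events rather than the number of features, here is a minimal sketch in plain Python (not Flink code). It assumes each event is a complete feature vector that is sent to one randomly chosen parallel worker. The names `assign_workers` and `parallelism`, the machine and core counts, and the synthetic events are all hypothetical. In Flink the analogous step would probably be a random repartitioning of the stream such as `DataStream.shuffle()` or `rebalance()`, but that mapping is an assumption about the setup, not something confirmed in this thread.

```python
import random
from collections import Counter, defaultdict


def assign_workers(num_events, parallelism, seed=0):
    """Randomly pick a worker index for every event (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.randrange(parallelism) for _ in range(num_events)]


# Assumed cluster shape: 4 machines * 2 cores = 8 parallel workers.
parallelism = 4 * 2

# Synthetic events; each one is a full feature vector (X1, X2, X3).
events = [(float(i), float(i + 1), float(i + 2)) for i in range(8000)]

# Each event goes to exactly one worker, with all of its features intact.
assignment = assign_workers(len(events), parallelism)

per_worker = defaultdict(list)
for event, worker in zip(events, assignment):
    per_worker[worker].append(event)

# With enough events, the load per worker is roughly even.
load = Counter(assignment)
for worker in sorted(load):
    print(worker, load[worker])
```

Since every worker sees whole events, each one can hold its own copy of the model and score independently. Throughput then grows with the number of workers, regardless of how many features an event has.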
Hi!
Could you please help me - I'm trying to use Apache Flink for machine
learning tasks with external ensemble/tree libs like XGBoost, so my
workflow will be like this:
- receive a single stream of data whose atomic event looks like a simple
vector event=(X1, X2, X3...Xn) and it can be imagi