Hi,

I was wondering if anyone has encountered or used Beam in the following
manner:

1. During machine learning training, use Beam to create the event table.
   The flow may consist of joins, aggregations, row-based transformations,
   etc.
2. Once the model is created, deploy the model to a scoring service (via
   PMML or some other scoring mechanism).
3. At scoring time, apply the SAME transformations used in #1 on a separate
   engine, with the guarantee that it transforms the data identically to
   the engine used in #1.
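
To make the idea concrete, here is a rough sketch in plain Python (not the
Beam API) of what I mean by the same transformation in both places. The
function `add_ratio` and the field names `spend` and `visits` are made up
for illustration; the point is only that one definition feeds both the
batch path and the one-by-one path.

```python
from typing import Dict, Iterable, List

Row = Dict[str, float]


def add_ratio(row: Row) -> Row:
    """Hypothetical row-based transformation: derive one feature from two raw fields."""
    out = dict(row)
    # Guard against division by zero by treating fewer than one visit as one.
    out["spend_per_visit"] = row["spend"] / max(row["visits"], 1.0)
    return out


def transform_batch(rows: Iterable[Row]) -> List[Row]:
    """Training path: apply the transformation to the whole event table."""
    return [add_ratio(r) for r in rows]


def transform_one(row: Row) -> Row:
    """Scoring path: apply the very same transformation to a single event."""
    return add_ratio(row)


training_rows = [{"spend": 120.0, "visits": 4.0}, {"spend": 30.0, "visits": 0.0}]
batch_features = transform_batch(training_rows)
online_features = transform_one({"spend": 120.0, "visits": 4.0})
```

In this toy version the guarantee is trivial because both paths call the
same function; the open question is whether Beam can give the same
guarantee when the two paths run on different engines.
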

I think this is a pretty interesting use case where Beam is used to guarantee
portability across engines and deployment modes (batch to true streaming, not
micro-batch). What's not clear to me is how batch joins would translate to
one-by-one scoring (probably as lookups), or how aggregations would work,
given that some kind of history would need to be stored (and how much is kept
would be configurable too).
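
As a sketch of what I imagine the scoring side might look like, again in
plain Python rather than Beam, and with all names (`user_table`, `enrich`,
`RollingMean`, `max_history`) invented for illustration: the join becomes a
keyed lookup, and the aggregation keeps a bounded, configurable amount of
history per key.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict

# Hypothetical dimension table; in batch this would be the other side of the join.
user_table: Dict[str, Dict[str, Any]] = {
    "u1": {"country": "DE"},
    "u2": {"country": "US"},
}


def enrich(event: Dict[str, Any], lookup: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """One-by-one equivalent of a left join: look up the event's key."""
    joined = dict(event)
    # Unknown keys leave the event unchanged, like a left join with no match.
    joined.update(lookup.get(event["user_id"], {}))
    return joined


class RollingMean:
    """Per-key mean over the last `max_history` values (the stored history)."""

    def __init__(self, max_history: int) -> None:
        # deque(maxlen=...) silently drops the oldest value once it is full.
        self._history: Dict[str, Deque[float]] = defaultdict(
            lambda: deque(maxlen=max_history)
        )

    def update(self, key: str, value: float) -> float:
        window = self._history[key]
        window.append(value)
        return sum(window) / len(window)


joined_event = enrich({"user_id": "u1", "amount": 10.0}, user_table)
agg = RollingMean(max_history=2)
means = [agg.update("u1", v) for v in (10.0, 20.0, 60.0)]
```

With `max_history=2`, the third update only sees the last two values, so
the stored history directly changes the feature value. That is the part
that would have to match between the batch engine and the scoring engine.
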
Thoughts?
Thanks,
Ron