> I think the simplest solution would be to have some kind of override/hook
> that allows Flink/Spark/... to provide storage. They already have a concept
> of a job and know how to store them, so we can piggyback the Beam pipeline
> there.
>

That makes sense to me, since it avoids adding a dependency on a database
like Mongo, which would add complexity to the deployment.  That said, Beam's
definition of a job is different from Flink's or Spark's.  To support this, a
runner would need to allow storing arbitrary metadata, so that the Beam
Job Service could keep a copy of each Beam job there (pipeline, pipeline
options, etc.), either directly as serialized protobuf messages or by
converting those to JSON.  Do you know offhand whether Flink and Spark
support that kind of arbitrary storage?
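To make that concrete, a hook along these lines might be enough on the Java
side.  The interface name and methods below are hypothetical, just to
illustrate; it only assumes the runner can persist opaque bytes keyed by its
own job id:

    // Hypothetical sketch, not an existing Beam or runner API: a hook the
    // Beam Job Service could call to delegate job persistence to the runner.
    public interface BeamJobMetadataStore {

      /** Store the serialized pipeline and options protos under the runner's job id. */
      void putJob(String jobId, byte[] pipelineProto, byte[] optionsProto);

      /** Return the stored pipeline proto, or null if the job is unknown. */
      byte[] getPipeline(String jobId);

      /** Return the stored options proto, or null if the job is unknown. */
      byte[] getOptions(String jobId);
    }

If a runner can only store string key/value pairs rather than raw bytes, the
protos could be base64- or JSON-encoded before being handed off.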

-chad
