Hi all,

We've been running a few streaming Beam jobs on Dataflow, each of which
consumes from PubSub via PubsubIO. Each job does something like this:

PubsubIO.readMessagesWithAttributes()
    .withIdAttribute("unique_id")
    .withTimestampAttribute("timestamp");

My understanding of `withTimestampAttribute` is that Beam takes each
element's timestamp from the named PubSub message attribute (here
`timestamp`) and uses it as its concept of time (the watermark), so that any
windowing we do in the job uses "event time" rather than "processing time".
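
For illustration, here is a minimal sketch of how such an attribute value
could be turned into an event timestamp. It assumes the value is either
milliseconds since the Unix epoch or an RFC 3339 string, which are the two
formats the PubsubIO documentation describes. The class and the
`parseEventTime` helper are hypothetical and are not Beam's actual
implementation.

```java
import java.time.Instant;

public class TimestampAttributeSketch {

    // Hypothetical helper (not part of Beam): interprets the value of a
    // timestamp attribute as an event time.
    // Assumption: the value is either epoch milliseconds or an RFC 3339
    // string in UTC ("Z" suffix).
    static Instant parseEventTime(String attributeValue) {
        try {
            // First try a plain number of milliseconds since the Unix epoch.
            return Instant.ofEpochMilli(Long.parseLong(attributeValue));
        } catch (NumberFormatException notMillis) {
            // Otherwise fall back to an ISO-8601 / RFC 3339 instant.
            return Instant.parse(attributeValue);
        }
    }

    public static void main(String[] args) {
        // Both values denote the same instant in the two accepted formats.
        System.out.println(parseEventTime("1500000000000"));
        System.out.println(parseEventTime("2017-07-14T02:40:00Z"));
    }
}
```

Without `withTimestampAttribute`, my assumption is that the element
timestamp would instead come from the time PubSub received the message.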

My question is: is my understanding correct, and does using
`withTimestampAttribute` have any effect in a job that doesn't do any
windowing? I have a feeling it may also have an effect on Dataflow's
autoscaling, since I think Dataflow scales up when the watermark timestamp
lags behind, but I'm not sure about this.

I'm concerned because we've been using it in all our Dataflow jobs, and
have now realised that whenever
`withTimestampAttribute` is used, Dataflow creates an additional PubSub
subscription (suffixed with `__streaming_dataflow_internal`), which appears
to be doubling PubSub costs (since we pay per subscription)! So I want to
remove `withTimestampAttribute` from jobs where possible, but want to first
understand the implications.

Thanks for any advice,
Josh
