Hi all,

I'm using PubsubIO.Read to consume a Pubsub stream, and my job then writes
the data out to Bigtable. I'm currently seeing a latency of 3-5 seconds
between the messages being published and being written to Bigtable.

I want to try and decrease the latency to <1s if possible - does anyone
have any tips for doing this?

I noticed that there is a PubsubGrpcClient https://github.com/apache/
beam/blob/release-2.0.0/sdks/java/io/google-cloud-platform/
src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java
however the PubsubUnboundedSource is initialised with a PubsubJsonClient,
so the Grpc client doesn't appear to be being used. Is there a way to
switch to the Grpc client - as perhaps that would give better performance?

Also, I am running my job on Dataflow using autoscaling, which has only
allocated one n1-standard-4 instance to the job, which is running at ~50%
CPU. Could forcing a higher number of nodes help improve latency?

Thanks for any advice,
Josh

Reply via email to