Hi all, I'm using PubsubIO.Read to consume a Pubsub stream, and my job then writes the data out to Bigtable. I'm currently seeing a latency of 3-5 seconds between the messages being published and being written to Bigtable.
I want to decrease the latency to <1s if possible. Does anyone have any tips for doing this?
I noticed that there is a PubsubGrpcClient: https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java
However, the PubsubUnboundedSource is initialised with a PubsubJsonClient, so the gRPC client doesn't appear to be used. Is there a way to switch to the gRPC client, as perhaps that would give better performance?
Also, I am running my job on Dataflow using autoscaling, which has allocated only one n1-standard-4 instance to the job, and that instance is running at ~50% CPU. Could forcing a higher number of nodes help improve latency?
Thanks for any advice,
Josh
