What runner are you using? Google Cloud Dataflow uses a closed-source version of the Pub/Sub reader, as noted in a comment on the Read class.
Ankur Chauhan

Sent from my iPhone

> On May 24, 2017, at 04:05, Josh <[email protected]> wrote:
>
> Hi all,
>
> I'm using PubsubIO.Read to consume a Pubsub stream, and my job then writes
> the data out to Bigtable. I'm currently seeing a latency of 3-5 seconds
> between the messages being published and being written to Bigtable.
>
> I want to try and decrease the latency to <1s if possible - does anyone have
> any tips for doing this?
>
> I noticed that there is a PubsubGrpcClient
> https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java
> however the PubsubUnboundedSource is initialised with a PubsubJsonClient, so
> the Grpc client doesn't appear to be being used. Is there a way to switch to
> the Grpc client - as perhaps that would give better performance?
>
> Also, I am running my job on Dataflow using autoscaling, which has only
> allocated one n1-standard-4 instance to the job, which is running at ~50%
> CPU. Could forcing a higher number of nodes help improve latency?
>
> Thanks for any advice,
> Josh
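For reference, below is a minimal sketch (not Josh's actual job) of a Pub/Sub to Bigtable pipeline that pins the Dataflow worker pool instead of relying on autoscaling. It assumes Beam 2.0.0-era APIs (PubsubIO.readStrings(), BigtableIO.write().withBigtableOptions(...)); the project, subscription, instance, table, column family and worker settings are placeholders to experiment with, not a recommendation.

import com.google.bigtable.v2.Mutation;
import com.google.cloud.bigtable.config.BigtableOptions;
import com.google.protobuf.ByteString;
import java.util.Collections;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;

public class PubsubToBigtableSketch {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    // Pin the worker pool rather than relying on autoscaling; equivalent to passing
    // --autoscalingAlgorithm=NONE --numWorkers=4 --workerMachineType=n1-standard-4.
    options.setAutoscalingAlgorithm(
        DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType.NONE);
    options.setNumWorkers(4);
    options.setWorkerMachineType("n1-standard-4");

    Pipeline p = Pipeline.create(options);
    p.apply("ReadFromPubsub",
            PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-subscription"))
     .apply("ToMutations", ParDo.of(new DoFn<String, KV<ByteString, Iterable<Mutation>>>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
          // Placeholder row key and cell: write each message payload into a single cell.
          Mutation setCell = Mutation.newBuilder()
              .setSetCell(Mutation.SetCell.newBuilder()
                  .setFamilyName("cf")
                  .setColumnQualifier(ByteString.copyFromUtf8("payload"))
                  .setValue(ByteString.copyFromUtf8(c.element())))
              .build();
          Iterable<Mutation> mutations = Collections.singletonList(setCell);
          c.output(KV.of(ByteString.copyFromUtf8(c.element()), mutations));
        }
      }))
     .apply("WriteToBigtable",
            BigtableIO.write()
                .withBigtableOptions(new BigtableOptions.Builder()
                    .setProjectId("my-project")
                    .setInstanceId("my-instance")
                    .build())
                .withTableId("my-table"));
    p.run();
  }
}

Whether more or larger workers actually reduce the publish-to-write latency Josh describes would need to be measured; the sketch only shows how to take autoscaling out of the equation while testing.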
