Hi Ankur, What do you mean by runner address? Would you be able to link me to the comment you're referring to?
I am using the PubsubIO.Read class from Beam 2.0.0 as found here: https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java Thanks, Josh On Wed, May 24, 2017 at 3:36 PM, Ankur Chauhan <[email protected]> wrote: > What runner address you using. Google cloud dataflow uses a closed source > version of the pubsub reader as noted in a comment on Read class. > > Ankur Chauhan > Sent from my iPhone > > On May 24, 2017, at 04:05, Josh <[email protected]> wrote: > > Hi all, > > I'm using PubsubIO.Read to consume a Pubsub stream, and my job then writes > the data out to Bigtable. I'm currently seeing a latency of 3-5 seconds > between the messages being published and being written to Bigtable. > > I want to try and decrease the latency to <1s if possible - does anyone > have any tips for doing this? > > I noticed that there is a PubsubGrpcClient https://github.com/apache/beam > /blob/release-2.0.0/sdks/java/io/google-cloud-platform/src/ > main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java however > the PubsubUnboundedSource is initialised with a PubsubJsonClient, so the > Grpc client doesn't appear to be being used. Is there a way to switch to > the Grpc client - as perhaps that would give better performance? > > Also, I am running my job on Dataflow using autoscaling, which has only > allocated one n1-standard-4 instance to the job, which is running at ~50% > CPU. Could forcing a higher number of nodes help improve latency? > > Thanks for any advice, > Josh > >
