Sorry, that was an autocorrect error. I meant to ask: which runner are you using? If you are using Google Cloud Dataflow, then the PubsubIO class is not the one doing the reading from the Pub/Sub topic. Google provides its own implementation at run time.

Ankur Chauhan
Sent from my iPhone

> On May 24, 2017, at 07:52, Josh <[email protected]> wrote:
>
> Hi Ankur,
>
> What do you mean by runner address?
> Would you be able to link me to the comment you're referring to?
>
> I am using the PubsubIO.Read class from Beam 2.0.0 as found here:
> https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java
>
> Thanks,
> Josh
>
>> On Wed, May 24, 2017 at 3:36 PM, Ankur Chauhan <[email protected]> wrote:
>> What runner address you using. Google cloud dataflow uses a closed source
>> version of the pubsub reader as noted in a comment on Read class.
>>
>> Ankur Chauhan
>> Sent from my iPhone
>>
>>> On May 24, 2017, at 04:05, Josh <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> I'm using PubsubIO.Read to consume a Pubsub stream, and my job then writes
>>> the data out to Bigtable. I'm currently seeing a latency of 3-5 seconds
>>> between the messages being published and being written to Bigtable.
>>>
>>> I want to try and decrease the latency to <1s if possible - does anyone
>>> have any tips for doing this?
>>>
>>> I noticed that there is a PubsubGrpcClient
>>> https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java
>>> however the PubsubUnboundedSource is initialised with a PubsubJsonClient,
>>> so the Grpc client doesn't appear to be being used. Is there a way to
>>> switch to the Grpc client - as perhaps that would give better performance?
>>>
>>> Also, I am running my job on Dataflow using autoscaling, which has only
>>> allocated one n1-standard-4 instance to the job, which is running at ~50%
>>> CPU. Could forcing a higher number of nodes help improve latency?
>>>
>>> Thanks for any advice,
>>> Josh
