What runner address you using. Google cloud dataflow uses a closed source 
version of the pubsub reader as noted in a comment on Read class. 

Ankur Chauhan
Sent from my iPhone

> On May 24, 2017, at 04:05, Josh <[email protected]> wrote:
> 
> Hi all,
> 
> I'm using PubsubIO.Read to consume a Pubsub stream, and my job then writes 
> the data out to Bigtable. I'm currently seeing a latency of 3-5 seconds 
> between the messages being published and being written to Bigtable.
> 
> I want to try and decrease the latency to <1s if possible - does anyone have 
> any tips for doing this? 
> 
> I noticed that there is a PubsubGrpcClient 
> https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java
>  however the PubsubUnboundedSource is initialised with a PubsubJsonClient, so 
> the Grpc client doesn't appear to be being used. Is there a way to switch to 
> the Grpc client - as perhaps that would give better performance?
> 
> Also, I am running my job on Dataflow using autoscaling, which has only 
> allocated one n1-standard-4 instance to the job, which is running at ~50% 
> CPU. Could forcing a higher number of nodes help improve latency?
> 
> Thanks for any advice,
> Josh

Reply via email to