Hi Ankur,

What do you mean by runner address?
Would you be able to link me to the comment you're referring to?

I am using the PubsubIO.Read class from Beam 2.0.0 as found here:
https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java
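
For reference, the read step in my pipeline is roughly the following
(simplified - the subscription name is a placeholder, and "options" is just
my usual pipeline options):

    Pipeline pipeline = Pipeline.create(options);

    // Read message payloads as strings from a Pub/Sub subscription.
    PCollection<String> messages = pipeline.apply(
        "ReadFromPubsub",
        PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/my-subscription"));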

Thanks,
Josh

On Wed, May 24, 2017 at 3:36 PM, Ankur Chauhan <an...@malloc64.com> wrote:

> What runner address are you using? Google Cloud Dataflow uses a closed-source
> version of the Pub/Sub reader, as noted in a comment on the Read class.
>
> Ankur Chauhan
> Sent from my iPhone
>
> On May 24, 2017, at 04:05, Josh <jof...@gmail.com> wrote:
>
> Hi all,
>
> I'm using PubsubIO.Read to consume a Pubsub stream, and my job then writes
> the data out to Bigtable. I'm currently seeing a latency of 3-5 seconds
> between the messages being published and being written to Bigtable.
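>
> To give a rough idea of the shape of the job (simplified - "messages" is the
> PCollection<String> coming out of the PubsubIO read step, the project /
> instance / table / column names are placeholders, and the DoFn below is only
> an illustrative sketch of the conversion, written against BigtableIO from the
> same google-cloud-platform IO module, not my real code):
>
>     // Turn each message into a Bigtable row key plus one SetCell mutation.
>     PCollection<KV<ByteString, Iterable<Mutation>>> mutations = messages.apply(
>         "ToBigtableMutations",
>         ParDo.of(new DoFn<String, KV<ByteString, Iterable<Mutation>>>() {
>           @ProcessElement
>           public void processElement(ProcessContext c) {
>             String value = c.element();
>             Mutation setCell = Mutation.newBuilder()
>                 .setSetCell(Mutation.SetCell.newBuilder()
>                     .setFamilyName("cf")
>                     .setColumnQualifier(ByteString.copyFromUtf8("col"))
>                     .setValue(ByteString.copyFromUtf8(value)))
>                 .build();
>             c.output(KV.<ByteString, Iterable<Mutation>>of(
>                 ByteString.copyFromUtf8(value),
>                 ImmutableList.<Mutation>of(setCell)));
>           }
>         }));
>
>     // Write the mutations to Bigtable.
>     mutations.apply(
>         "WriteToBigtable",
>         BigtableIO.write()
>             .withBigtableOptions(new BigtableOptions.Builder()
>                 .setProjectId("my-project")
>                 .setInstanceId("my-instance")
>                 .build())
>             .withTableId("my-table"));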
>
> I want to try to decrease the latency to <1s if possible - does anyone
> have any tips for doing this?
>
> I noticed that there is a PubsubGrpcClient:
> https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.java
> However, the PubsubUnboundedSource is initialised with a PubsubJsonClient,
> so the gRPC client doesn't appear to be used. Is there a way to switch to
> the gRPC client? Perhaps that would give better performance.
>
> Also, I am running my job on Dataflow using autoscaling, which has only
> allocated one n1-standard-4 instance to the job, and that instance is
> running at ~50% CPU. Could forcing a higher number of workers help improve
> latency?
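>
> In case it matters, I assume I could pin the worker count by disabling
> autoscaling via the standard Dataflow pipeline options, e.g. something like
> the following (values just illustrative):
>
>     --runner=DataflowRunner \
>     --autoscalingAlgorithm=NONE \
>     --numWorkers=4 \
>     --workerMachineType=n1-standard-4
>
> or keep autoscaling but raise the ceiling with --maxNumWorkers.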
>
> Thanks for any advice,
> Josh
>
>
