Aniruddh Sharma Thu, 23 Apr 2020 06:39:15 -0700

Hello

I want to read a BQ table that has billions of rows. I am running in streaming 
mode and using the EXPORT read method.

The read is very slow (it seems to happen in batches), and as a result the whole 
job is very slow. The intent of this query is to find out which settings can be 
applied to maximize read throughput from BQ.

a) I notice that BigQueryOptions has some options to control the concurrency of 
writes to BQ, but I can't find any equivalent options for reads. Are there any 
settings, either in DF or in BQ, that make it read more data in parallel from 
BQ?

b) I start with numWorkers=10 and maxWorkers=1000, but the job stays on 10 
workers the whole time. Dataflow does not apply autoscaling: even though billions 
of rows are still pending to be read, it never determines that it could spin up 
more machines (up to 1000) to read them.

Any guidance would help.

Thanks
Aniruddh
