Optimize Read from BQ in Streaming Mode

Aniruddh Sharma Thu, 23 Apr 2020 06:39:14 -0700

Adding the subject line.


On 2020/04/23 13:38:16, Aniruddh Sharma <[email protected]> wrote: 
> Hello
> 
> I want to read a BQ table that has billions of rows. I am running in 
> streaming mode and using the EXPORT method. 
> 
> The read is very slow (it seems to proceed in batches), which makes the 
> whole job slow. The intent of this question is to find out which settings 
> can be applied to maximize the read throughput from BQ.
> 
> a) I notice that BigQueryOptions has some options to control the 
> concurrency of writes to BQ, but I can't find any such options for reads. 
> Are there any settings, in either DF or BQ, to read more data in parallel 
> from BQ?
> 
> b) I start with numWorkers=10 and maxWorkers=1000, but the job constantly 
> runs on 10 workers. Dataflow does not autoscale: it does not seem to 
> determine that, with billions of rows pending, it could spin up more 
> machines (up to 1000 workers) to read them.
> 
> Any guidance will help.
> 
> Thanks
> Aniruddh
> 
> 
>
