Aniruddh - Using BigQueryIO.Read with the EXPORT method involves a
potentially long wait for BigQuery to complete the export.

I have experience running Dataflow batch jobs that use this read method to
ingest ~10 TB of data in a single job. The behavior I generally see is that
the job progresses through distinct stages: first it sits at the initial
number of workers for ~30 minutes, and then it quickly ramps up to
maxNumWorkers and stays there while it processes the data.
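
For reference, this is roughly how I configure the worker counts on those
batch jobs (a minimal sketch; the project and region values are
placeholders, not anything from your job):

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    DataflowPipelineOptions options =
        PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    options.setProject("my-project");     // placeholder project id
    options.setRegion("us-central1");     // placeholder region
    // Start at 10 workers; Dataflow's autoscaler can ramp up toward
    // maxNumWorkers once there is parallelizable work to distribute.
    options.setNumWorkers(10);
    options.setMaxNumWorkers(1000);

    Pipeline pipeline = Pipeline.create(options);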

That initial 30-minute stage is simply waiting on the BigQuery export job
to complete. Your Beam job has no control over that; it's entirely
BigQuery's responsibility to unload the data into Avro files. I don't think
Beam knows about partial data being available; it essentially blocks
further stages of processing until it determines that the BigQuery export
job is complete. Only then does it start reading the Avro files from GCS in
parallel and actually doing work.
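
To make that concrete, a read along these lines (a minimal sketch; the
table name is a placeholder, and "pipeline" is assumed from above) will
first kick off a BigQuery extract job to GCS, and only after that job
finishes will workers start consuming the exported Avro files:

    import com.google.api.services.bigquery.model.TableRow;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method;
    import org.apache.beam.sdk.values.PCollection;

    PCollection<TableRow> rows =
        pipeline.apply(
            "ReadFromBQ",
            BigQueryIO.readTableRows()
                .from("my-project:my_dataset.my_table")  // placeholder table
                // Spelled out for clarity; this is the export-based path
                // you're already on.
                .withMethod(Method.EXPORT));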

Reading from BigQuery seems like an awkward fit for a streaming job. Is
this for a static or slowly changing side input for some other streaming
data source?
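
If it is that kind of use case, the usual shape (a rough sketch of the
slowly-updating side input pattern, not your code; the refresh interval
and the placeholder map are made up) is to periodically re-materialize the
lookup data and feed it to the streaming pipeline as a side input, rather
than reading billions of rows inside the streaming job itself:

    import java.util.Collections;
    import java.util.Map;
    import org.apache.beam.sdk.io.GenerateSequence;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.View;
    import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
    import org.apache.beam.sdk.transforms.windowing.Repeatedly;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollectionView;
    import org.joda.time.Duration;

    // Re-build the lookup map once per hour and expose it as a side input.
    PCollectionView<Map<String, String>> lookup =
        pipeline
            .apply(GenerateSequence.from(0)
                .withRate(1, Duration.standardHours(1)))
            .apply(ParDo.of(new DoFn<Long, Map<String, String>>() {
              @ProcessElement
              public void process(ProcessContext c) {
                // In practice you would query BQ (or read a fresh extract)
                // here; a constant map keeps the sketch self-contained.
                c.output(Collections.singletonMap("key", "value"));
              }
            }))
            .apply(Window.<Map<String, String>>into(new GlobalWindows())
                .triggering(Repeatedly.forever(
                    AfterProcessingTime.pastFirstElementInPane()))
                .discardingFiredPanes())
            .apply(View.asSingleton());

The main streaming PCollection can then access the latest map via
c.sideInput(lookup) inside a ParDo declared with .withSideInputs(lookup).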

On Thu, Apr 23, 2020 at 9:38 AM Aniruddh Sharma <[email protected]>
wrote:

> Hello
>
> I want to read a BQ table which has billions of rows. I am using streaming
> mode and the EXPORT method.
>
> The read is running very slowly (it seems to proceed in batches) and my job
> is super slow. The intent of this query is to find out what settings can be
> applied to maximize the read throughput from BQ.
>
> a) I notice that BigQueryOptions has some options to control the
> concurrency of writes to BQ, but I don't find any such options for reads.
> Are there settings, either in Dataflow or BQ, to read more data in parallel
> from BQ?
>
> b) I start with numWorkers=10 and maxNumWorkers=1000, but the job
> constantly runs on 10 workers. Dataflow does not apply autoscaling; somehow
> it does not determine that, with billions of rows pending to be read, it
> could spin up to 1000 workers and use more machines to read.
>
> Any guidance will help.
>
> Thanks
> Aniruddh
>