Thank you, Israel, for the support!

@Fabian, so the solution in your case was to change the Temp location in
the Direct runner to a GCS path?

In our documentation about the runner configuration we state that this
should be a GCS location for the Dataflow runner, but for the Direct runner
we do not state this explicitly. I'll add a note to the docs that a GCS
path is required here when using non-local IOs such as BigQuery.

I have also created a ticket to expose gcpTempLocation; currently only
tempLocation is configurable via the UI.
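
In the meantime, here is a minimal sketch of setting both options
programmatically with the Beam Java SDK (the class name and bucket names
are placeholders):

    import org.apache.beam.sdk.extensions.gcp.options.GcpOptions;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class TempLocationSetup {
      public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        // Even on the Direct runner, BigQueryIO's FILE_LOADS path needs a
        // gs:// temp location.
        options.setTempLocation("gs://my-bucket/tmp");
        // Optional: keep BigQuery's staging files separate from other temp files.
        options.as(GcpOptions.class).setGcpTempLocation("gs://my-bucket/bq-tmp");
      }
    }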

Cheers,
Hans

On Wed, 31 Aug 2022 at 17:10, Israel Herraiz via users <[email protected]>
wrote:

> I have been searching on Google, and I cannot find any reference.
>
> I seem to remember that the stack trace tells you something like "temp
> location is not in GCS".
>
> In any case, whether that temp location matters depends on the method used
> to write to BigQuery. The default method in Beam is FILE_LOADS, which
> creates a BigQuery load job (https://cloud.google.com/bigquery/docs/batch-loading-data).
> Those jobs read data from GCS.
>
> For FILE_LOADS, Beam creates Avro files in the tempLocation of the
> pipeline and uses the location of those files as an input parameter for
> the BigQuery load job, so tempLocation has to be a location in GCS.
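>
> As a rough sketch in the Java SDK (the table name is a placeholder, and
> "rows" stands in for an existing PCollection<TableRow>), a FILE_LOADS
> write looks like this:
>
>     import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
>
>     // Sketch only: "rows" is an existing PCollection<TableRow>.
>     rows.apply(
>         BigQueryIO.writeTableRows()
>             .to("my-project:my_dataset.my_table")
>             // Writing to an existing table; CREATE_IF_NEEDED would also
>             // need withSchema(...).
>             .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
>             // FILE_LOADS stages Avro files under the temp location and
>             // points a BigQuery load job at them, hence the gs:// requirement.
>             .withMethod(BigQueryIO.Write.Method.FILE_LOADS));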
>
> Now, tempLocation is also used for other things. If you want to use a
> different temp location for the rest of the pipeline, you can pass the
> option --gcpTempLocation in combination with --tempLocation. BigQueryIO
> will use gcpTempLocation if it is set, and will fall back to tempLocation
> otherwise.
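>
> For example, a sketch of passing both flags together (the bucket names are
> placeholders):
>
>     import org.apache.beam.sdk.options.PipelineOptions;
>     import org.apache.beam.sdk.options.PipelineOptionsFactory;
>
>     // Sketch only: parse both flags; --gcpTempLocation is recognised as
>     // long as the Beam GCP extension jars are on the classpath.
>     String[] args = {
>         "--tempLocation=gs://my-bucket/tmp",
>         "--gcpTempLocation=gs://my-bucket/bq-tmp"
>     };
>     PipelineOptions options =
>         PipelineOptionsFactory.fromArgs(args).withValidation().create();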
>
> Also bear in mind that if you are using a different write method (e.g.
> STORAGE_WRITE_API), Beam will not generate any files, so it does not
> matter whether tempLocation is in GCS; the data will be written directly
> to BigQuery (https://cloud.google.com/bigquery/docs/write-api-batch).
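>
> Sketched the same way as the FILE_LOADS example above (again with
> placeholder names):
>
>     // Sketch only: no files are staged with STORAGE_WRITE_API, so the
>     // temp location is not involved in the write itself.
>     rows.apply(
>         BigQueryIO.writeTableRows()
>             .to("my-project:my_dataset.my_table")
>             .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
>             .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API));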
>
> These are the write methods that can be used with Beam and BigQuery:
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html
>
> Kind regards,
> Israel
>
>
> On Wed, 31 Aug 2022 at 16:50, Fabian Peters <[email protected]> wrote:
>
>> Hi Israel,
>>
>> That was it, many thanks! I had it set to "${java.io.tmpdir}". Is the
>> requirement to use a GCS location documented somewhere?
>>
>> cheers
>>
>> Fabian
>>
>> On 31.08.2022 at 11:25, Israel Herraiz via users <[email protected]>
>> wrote:
>>
>> What command-line arguments are you using for those Direct runner
>> pipelines? For instance, for BigQuery you will need to set
>> --tempLocation to a GCS location for the BigQuery load jobs to work.
>>
>>
>> On Wed, 31 Aug 2022 at 09:50, Fabian Peters <[email protected]> wrote:
>>
>>> Good morning!
>>>
>>> I'm putting together my Dataflow deployment and am running into another
>>> problem I don't know how to deal with: I'm running a pipeline via Dataflow,
>>> which contains a "Workflow executor" transform. The workflow contains a
>>> number of pipelines that have their run configuration set to Beam-Direct.
>>> In principle, this works fine. (Yeah!)
>>>
>>> However, in this setup a BigQuery Output fails with a
>>> "java.lang.RuntimeException: Failed to create job with prefix
>>> beam_bq_job_LOAD_sites_FOO_ID, reached max retries: 3, last failed job:
>>> null." I see the the same when running just the pipeline (or any other with
>>> BigQuery Output) via Beam-Direct locally, which makes me think that the GCP
>>> credentials are not being picked up? Is there something I need to configure?
>>>
>>> cheers
>>>
>>> Fabian
>>>
>>> P.S.: Logs from running locally with Beam-Direct:
>>>
>>> 2022/08/31 09:30:07 - sites - ERROR: Error starting the Beam pipeline
>>> 2022/08/31 09:30:07 - sites - ERROR:
>>> org.apache.hop.core.exception.HopException:
>>> 2022/08/31 09:30:07 - sites - Error executing pipeline with runner Direct
>>> 2022/08/31 09:30:07 - sites - java.lang.RuntimeException: Failed to
>>> create job with prefix
>>> beam_bq_job_LOAD_sites_65dba39290c04240933e3a982c0c5699_b77cb1586fc969929097729a4a6cdf2a_00001_00000,
>>> reached max retries: 3, last failed job: null.
>>> 2022/08/31 09:30:07 - sites -
>>> 2022/08/31 09:30:07 - sites -   at
>>> org.apache.hop.beam.engines.BeamPipelineEngine.executePipeline(BeamPipelineEngine.java:258)
>>> 2022/08/31 09:30:07 - sites -   at
>>> org.apache.hop.beam.engines.BeamPipelineEngine.lambda$startThreads$0(BeamPipelineEngine.java:305)
>>> 2022/08/31 09:30:07 - sites -   at
>>> java.base/java.lang.Thread.run(Thread.java:829)
>>> 2022/08/31 09:30:07 - sites - Caused by:
>>> org.apache.beam.sdk.Pipeline$PipelineExecutionException:
>>> java.lang.RuntimeException: Failed to create job with prefix
>>> beam_bq_job_LOAD_sites_65dba39290c04240933e3a982c0c5699_b77cb1586fc969929097729a4a6cdf2a_00001_00000,
>>> reached max retries: 3, last failed job: null.
>>> 2022/08/31 09:30:07 - sites -   at
>>> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:373)
>>> 2022/08/31 09:30:07 - sites -   at
>>> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:341)
>>> 2022/08/31 09:30:07 - sites -   at
>>> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:218)
>>> 2022/08/31 09:30:07 - sites -   at
>>> org.apache.hop.beam.engines.BeamPipelineEngine.executePipeline(BeamPipelineEngine.java:246)
>>> 2022/08/31 09:30:07 - sites -   ... 2 more
>>> 2022/08/31 09:30:07 - sites - Caused by: java.lang.RuntimeException:
>>> Failed to create job with prefix
>>> beam_bq_job_LOAD_sites_65dba39290c04240933e3a982c0c5699_b77cb1586fc969929097729a4a6cdf2a_00001_00000,
>>> reached max retries: 3, last failed job: null.
>>> 2022/08/31 09:30:07 - sites -   at
>>> org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:199)
>>> 2022/08/31 09:30:07 - sites -   at
>>> org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:152)
>>> 2022/08/31 09:30:07 - sites -   at
>>> org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:380)
>>>
>>>
>>
