Thank you Israel for the support! @Fabian, so the solution in your case was to change the Temp location in the Direct runner to a GCS path?
In our documentation about the runner configuration we state that it should be a GCS location for the Dataflow runner; for the Direct runner we do not state this explicitly. I'll add a note to the docs that when using non-local IOs such as BigQuery a GCS path is required here. I have also created a ticket to expose gcpTempLocation; currently only tempLocation is configurable via the UI.

Cheers,
Hans

On Wed, 31 Aug 2022 at 17:10, Israel Herraiz via users <[email protected]> wrote:

> I have searched Google, and I cannot find any reference.
>
> I seem to remember that the stack trace will tell you something like "temp location is not in GCS", or words to that effect.
>
> In any case, that temp location depends on the method used to write to BigQuery. The default method in Beam is FILE_LOADS, which creates a BigQuery load job (https://cloud.google.com/bigquery/docs/batch-loading-data). Those jobs read their data from GCS.
>
> For FILE_LOADS, Beam creates Avro files in the tempLocation of the pipeline and passes the location of those files as an input parameter to the BQ job, so it has to be a location in GCS.
>
> Now, tempLocation is used for other things as well. If you want to use a different temp location for the rest of the pipeline, you can use the option --gcpTempLocation in combination with --tempLocation. BigQueryIO will use gcpTempLocation if it is set, and fall back to tempLocation if gcpTempLocation is not set.
>
> Bear in mind also that if you are using a different write method (e.g. STORAGE_WRITE_API), Beam will not generate files, so it does not matter whether tempLocation is in GCS: the data will be written directly to BigQuery (https://cloud.google.com/bigquery/docs/write-api-batch).
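[The fallback Israel describes — BigQueryIO uses gcpTempLocation when set and falls back to tempLocation otherwise — can be sketched in plain Java. This is an illustrative helper with made-up names, not Beam's actual implementation:]

```java
// Sketch of the documented option fallback: BigQueryIO prefers
// gcpTempLocation and falls back to tempLocation when it is unset.
// Illustrative only; Beam's real resolution lives in its GcpOptions.
public class TempLocationFallback {

    /** Returns the location the BQ load jobs would read from. */
    static String resolveTempLocation(String gcpTempLocation, String tempLocation) {
        if (gcpTempLocation != null && !gcpTempLocation.isEmpty()) {
            return gcpTempLocation;
        }
        return tempLocation;
    }

    public static void main(String[] args) {
        // Only --tempLocation set: BigQueryIO falls back to it,
        // so it must be a gs:// path for FILE_LOADS to work.
        System.out.println(resolveTempLocation(null, "gs://my-bucket/tmp"));
        // Both set: --gcpTempLocation wins for the BQ load jobs, and
        // --tempLocation can stay local for the rest of the pipeline.
        System.out.println(resolveTempLocation("gs://my-bucket/bq-tmp", "/tmp/local"));
    }
}
```

[Run locally, the two calls print `gs://my-bucket/tmp` and `gs://my-bucket/bq-tmp` respectively.]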
> These are the write methods that can be used with Beam and BigQuery:
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html
>
> Kind regards,
> Israel
>
> On Wed, 31 Aug 2022 at 16:50, Fabian Peters <[email protected]> wrote:
>
>> Hi Israel,
>>
>> That was it, many thanks! I had it set to "${java.io.tmpdir}". Is the requirement to use a GCS location documented somewhere?
>>
>> cheers
>>
>> Fabian
>>
>> On 31.08.2022 at 11:25, Israel Herraiz via users <[email protected]> wrote:
>>
>> What are the command line arguments that you are using for those direct runner pipelines? For instance, for BigQuery you will need to set --tempLocation to a GCS location for the BQ jobs to work.
>>
>> On Wed, 31 Aug 2022 at 09:50, Fabian Peters <[email protected]> wrote:
>>
>>> Good morning!
>>>
>>> I'm putting together my Dataflow deployment and am running into another problem I don't know how to deal with: I'm running a pipeline via Dataflow which contains a "Workflow executor" transform. The workflow contains a number of pipelines that have their run configuration set to Beam-Direct. In principle, this works fine. (Yeah!)
>>>
>>> However, in this setup a BigQuery Output fails with a "java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_LOAD_sites_FOO_ID, reached max retries: 3, last failed job: null." I see the same when running just the pipeline (or any other with BigQuery Output) via Beam-Direct locally, which makes me think that the GCP credentials are not being picked up? Is there something I need to configure?
>>>
>>> cheers
>>>
>>> Fabian
>>>
>>> P.S.: Logs from running locally with Beam-Direct:
>>>
>>> 2022/08/31 09:30:07 - sites - ERROR: Error starting the Beam pipeline
>>> 2022/08/31 09:30:07 - sites - ERROR: org.apache.hop.core.exception.HopException:
>>> 2022/08/31 09:30:07 - sites - Error executing pipeline with runner Direct
>>> 2022/08/31 09:30:07 - sites - java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_LOAD_sites_65dba39290c04240933e3a982c0c5699_b77cb1586fc969929097729a4a6cdf2a_00001_00000, reached max retries: 3, last failed job: null.
>>> 2022/08/31 09:30:07 - sites -
>>> 2022/08/31 09:30:07 - sites -   at org.apache.hop.beam.engines.BeamPipelineEngine.executePipeline(BeamPipelineEngine.java:258)
>>> 2022/08/31 09:30:07 - sites -   at org.apache.hop.beam.engines.BeamPipelineEngine.lambda$startThreads$0(BeamPipelineEngine.java:305)
>>> 2022/08/31 09:30:07 - sites -   at java.base/java.lang.Thread.run(Thread.java:829)
>>> 2022/08/31 09:30:07 - sites - Caused by: org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_LOAD_sites_65dba39290c04240933e3a982c0c5699_b77cb1586fc969929097729a4a6cdf2a_00001_00000, reached max retries: 3, last failed job: null.
>>> 2022/08/31 09:30:07 - sites -   at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:373)
>>> 2022/08/31 09:30:07 - sites -   at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:341)
>>> 2022/08/31 09:30:07 - sites -   at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:218)
>>> 2022/08/31 09:30:07 - sites -   at org.apache.hop.beam.engines.BeamPipelineEngine.executePipeline(BeamPipelineEngine.java:246)
>>> 2022/08/31 09:30:07 - sites -   ... 2 more
>>> 2022/08/31 09:30:07 - sites - Caused by: java.lang.RuntimeException: Failed to create job with prefix beam_bq_job_LOAD_sites_65dba39290c04240933e3a982c0c5699_b77cb1586fc969929097729a4a6cdf2a_00001_00000, reached max retries: 3, last failed job: null.
>>> 2022/08/31 09:30:07 - sites -   at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJob.runJob(BigQueryHelpers.java:199)
>>> 2022/08/31 09:30:07 - sites -   at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:152)
>>> 2022/08/31 09:30:07 - sites -   at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:380)
