[ https://issues.apache.org/jira/browse/BEAM-6206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chamikara Jayalath resolved BEAM-6206.
--------------------------------------
    Resolution: Fixed
Fix Version/s: Not applicable

> Dataflow template which reads from BigQuery fails if used more than once
> -------------------------------------------------------------------------
>
>                 Key: BEAM-6206
>                 URL: https://issues.apache.org/jira/browse/BEAM-6206
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp, runner-dataflow
>    Affects Versions: 2.8.0
>            Reporter: Neil McCrossin
>            Assignee: Chamikara Jayalath
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> When a pipeline contains a BigQuery read, and that pipeline is uploaded as a
> template and the template is run in Cloud Dataflow, it runs successfully the
> first time, but every run after that fails because it cannot find a file in
> the folder BigQueryExtractTemp (see the error message below). If the template
> is uploaded again it works again +once only+, and then fails on every run
> after the first.
>
> *Error message:*
> java.io.FileNotFoundException: No files matched spec:
> gs://bigquery-bug-report-4539/temp/BigQueryExtractTemp/847a342637a64e73b126ad33f764dcc9/000000000000.avro
>
> *Steps to reproduce:*
> 1. Create the Beam Word Count sample as described
> [here|https://cloud.google.com/dataflow/docs/quickstarts/quickstart-java-maven].
> 2. Copy the command line from the section "Run WordCount on the Cloud
> Dataflow service" and substitute in your own project id and bucket name.
> Make sure you can run it successfully.
> 3. In the file WordCount.java, add the following lines below the existing
> import statements:
> {code:java}
> import org.apache.beam.sdk.coders.AvroCoder;
> import org.apache.beam.sdk.coders.DefaultCoder;
> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
> import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord;
> import org.apache.beam.sdk.transforms.SerializableFunction;
>
> @DefaultCoder(AvroCoder.class)
> class TestOutput
> {
> }
> {code}
> 4. In the same file, replace the entire runWordCount method with the
> following code:
> {code:java}
> static void runWordCount(WordCountOptions options) {
>   Pipeline p = Pipeline.create(options);
>
>   p.apply("ReadBigQuery", BigQueryIO
>       .read(new SerializableFunction<SchemaAndRecord, TestOutput>() {
>         public TestOutput apply(SchemaAndRecord record) {
>           return new TestOutput();
>         }
>       })
>       .from("bigquery-public-data:stackoverflow.tags")
>   );
>
>   p.run();
> }
> {code}
> (Note: I am using the stackoverflow.tags table for demonstration because it
> is public and not too large, but the problem seems to occur for any table.)
> 5. Add the following pipeline parameters to the command line you have been
> using:
> {code:java}
> --tempLocation=gs://<STORAGE_BUCKET>/temp/
> --templateLocation=gs://<STORAGE_BUCKET>/my-bigquery-dataflow-template
> {code}
> 6. Run the command line so that the template is created.
> 7. Launch the template through the Cloud Console by clicking "CREATE JOB
> FROM TEMPLATE". Give it the job name "test-1", choose "Custom Template" at
> the bottom of the list, browse to the template
> "my-bigquery-dataflow-template", then press "Run job".
> 8. The job should succeed. Then repeat step 7 and it will fail.
> 9. Repeat steps 6 and 7 and it will work again. Repeat step 7 and it will
> fail again.
>
> This bug may be related to BEAM-2058 (just a hunch).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
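The "works once per uploaded template" pattern above is consistent with the BigQuery extract temp path being chosen once, at template creation (graph construction) time, and then baked into the template, instead of being chosen afresh at each template execution. The sketch below is plain Java, not Beam code, and the names (TemplateSketch, buildEager, buildDeferred) are hypothetical; it only illustrates the construction-time-versus-run-time distinction that appears to be at play:

```java
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical sketch: a "template" is built once, then run many times.
public class TemplateSketch {

    // Eagerly baked: the temp path is generated once, when the template is
    // built, so every later run points at the same (already-consumed) file.
    static Supplier<String> buildEager() {
        String tempPath = "BigQueryExtractTemp/" + UUID.randomUUID() + "/000000000000.avro";
        return () -> tempPath;
    }

    // Deferred: the temp path is generated each time the template is run,
    // so repeated runs do not collide on a stale file spec.
    static Supplier<String> buildDeferred() {
        return () -> "BigQueryExtractTemp/" + UUID.randomUUID() + "/000000000000.avro";
    }

    public static void main(String[] args) {
        Supplier<String> eager = buildEager();
        // Two "runs" of the eagerly built template hit the identical path.
        System.out.println(eager.get().equals(eager.get()));

        Supplier<String> deferred = buildDeferred();
        // Two runs of the deferred template each pick a fresh path.
        System.out.println(deferred.get().equals(deferred.get()));
    }
}
```

For the real API, BigQueryIO's read transform exposes withTemplateCompatibility(), which defers this per-run work so the source can be reused across template launches; whether that is the resolution recorded here is not stated in this message.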