[ https://issues.apache.org/jira/browse/BEAM-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chamikara Jayalath resolved BEAM-5434.
--------------------------------------
       Resolution: Fixed
    Fix Version/s: Not applicable

> Issue with BigQueryIO in Template
> ---------------------------------
>
>                 Key: BEAM-5434
>                 URL: https://issues.apache.org/jira/browse/BEAM-5434
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>    Affects Versions: 2.5.0
>            Reporter: Amarendra Kumar
>            Assignee: Chamikara Jayalath
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I am trying to build a Google Cloud Dataflow template to be run from a Cloud 
> Function.
> The issue is with BigQueryIO executing a SQL query.
> The opening step of my Dataflow template is:
> {code:java}
> BigQueryIO.readTableRows()
>     .withQueryLocation("US")
>     .withoutValidation()
>     .fromQuery(options.getSql())
>     .usingStandardSql()
> {code}
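> For reference, below is a minimal, self-contained sketch of how the template 
> pipeline is wired up around that read. The class name and Options interface 
> are placeholders, not the real ones from my job; the point is that the query 
> comes in as a ValueProvider so it can be supplied when the staged template is 
> launched.
> {code:java}
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
> import org.apache.beam.sdk.options.Description;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.sdk.options.ValueProvider;
>
> public class NotificationTemplate { // placeholder name
>
>   // Placeholder options interface. Runtime parameters are ValueProviders so
>   // the same staged template can be launched with different values.
>   public interface Options extends PipelineOptions {
>     @Description("Standard SQL query to run against BigQuery")
>     ValueProvider<String> getSql();
>     void setSql(ValueProvider<String> value);
>   }
>
>   public static void main(String[] args) {
>     Options options =
>         PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
>     Pipeline p = Pipeline.create(options);
>
>     // The read under discussion: query results are exported to Avro files
>     // under a temp GCS location and then read back by the workers.
>     p.apply(
>         "ReadFromBigQuery",
>         BigQueryIO.readTableRows()
>             .fromQuery(options.getSql())
>             .usingStandardSql()
>             .withQueryLocation("US")
>             .withoutValidation());
>
>     p.run();
>   }
> }
> {code}
> (For reads that are meant to be launched repeatedly from a template, 
> BigQueryIO also exposes withTemplateCompatibility(); I am not sure whether 
> that applies to my setup, so I mention it only for context.)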
> When the template is triggered for the first time, it runs fine.
> But when it is triggered a second time, it fails with the following error:
> {code}
> java.io.FileNotFoundException: No files matched spec: gs://test-notification/temp/Notification/BigQueryExtractTemp/34d42a122600416c9ea748a6e325f87a/000000000000.avro
>       at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
>       at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
>       at org.apache.beam.sdk.io.FileBasedSource.createReader(FileBasedSource.java:329)
>       at com.google.cloud.dataflow.worker.WorkerCustomSources$1.iterator(WorkerCustomSources.java:360)
>       at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:177)
>       at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
>       at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
>       at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
>       at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
>       at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
>       at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
>       at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
>       at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> In the second run, why is the process expecting a file at that GCS location?
> The file does get created while the job is running during the first run, but 
> it is also deleted once that job completes.
> How are the two jobs related?
> Could you please let me know whether I am missing something or this is a bug?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
