Hi Chamikara, I've tried both with and without "--experiment=use_beam_bq_sink" and it doesn't seem to affect the outcome.
Regarding "--experiment=beam_fn_api", we're using the portability framework [1] so that we can ship our python runtime environment as an image via "--worker_harness_container_image" instead of maintaining a setup.py file. It's worked very well for us so far (with the exception of this latest issue). [1] Am I using the term right? https://beam.apache.org/documentation/runtime/environments/ On Mon, Jan 13, 2020 at 4:40 PM Chamikara Jayalath <[email protected]> wrote: > > > > On Mon, Jan 13, 2020 at 12:45 PM Bo Shi <[email protected]> wrote: >> >> Python 3.7.5 >> Beam 2.17 >> >> I've used both WriteToBigquery and BigQueryBatchFileLoads successfully >> using DirectRunner. I've boiled the issue down to a small >> reproducible case (attached). The following works great: >> >> $ pipenv run python repro-direct.py \ >> --project=CHANGEME \ >> --runner=DirectRunner \ >> --temp_location=gs://beam-to-bq--tmp/bshi/tmp \ >> --staging_location=gs://beam-to-bq--tmp/bshi/stg \ >> --experiment=beam_fn_api \ # <<<< NB! (I assume this is a no-op w/ Direct >> --save_main_function >> >> $ bq --project=CHANGEME query 'select * from bshi_test.direct' >> Waiting on bqjob_r53488a6b9ed6cb20_0000016fa05fb570_1 ... (0s) Current >> status: DONE >> +----------+----------+ >> | some_str | some_int | >> +----------+----------+ >> | hello | 1 | >> | world | 2 | >> +----------+----------+ >> >> When changing to --runner=DataflowRunner, I get a cryptic Java error >> with no other useful Python feedback (see class_cast_exception.txt) >> >> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: >> Dataflow pipeline failed. State: FAILED, Error: >> java.util.concurrent.ExecutionException: java.lang.ClassCastException: >> [B cannot be cast to org.apache.beam.sdk.values.KV >> >> On a hunch, I removed --experiment=beam_fn_api and DataflowRunner >> works fine. Am I doing something against the intent of the SDK? > > > Can you try adding "--experiment=use_beam_bq_sink" ? > Also, out of curiosity, why are you setting "--experiment=beam_fn_api" ? > > >> >> Happy to share access to the GCP project if that would assist in debugging.
